Many aggregated columns / using custom aggregation function for pivot

10 Jul 2020

Hi,

Today I looked into how to implement custom aggregation functions in Python.
It works pretty neatly and scales well when you have only a few columns to
aggregate; however, I have many :)

Imagine that the aggregate function below has variants which perform a
similar iteration but call a different function (not sum). The call to
numpy.unique() and the iteration over the groups would then have to be
repeated for each aggregated column, which is bad for performance.

Background: I have records in a key/value-like structure, so there is a key
column and a value column. I want to group by the primary key and then create
a column for each key. It's like a pivot, but without aggregation.
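For reference, the reshaping itself (one row per primary key, one column per key, no aggregation) can be sketched in plain NumPy outside the database. This is a minimal sketch, assuming the records are already available as three parallel arrays; the function name `pivot_key_value` is hypothetical, and missing (pk, key) combinations are filled with NaN.

```python
import numpy as np


def pivot_key_value(pk, key, value):
    """Hypothetical helper: one row per pk, one column per key, no aggregation.

    Assumes each (pk, key) pair occurs at most once; missing cells become NaN.
    """
    pk = np.asarray(pk)
    key = np.asarray(key)
    value = np.asarray(value, dtype=float)

    # return_inverse maps every record to its row index and column index.
    rows, row_idx = np.unique(pk, return_inverse=True)
    cols, col_idx = np.unique(key, return_inverse=True)

    # Guard the "no aggregation" assumption: every cell may be filled only once.
    cell = row_idx * cols.size + col_idx
    if np.unique(cell).size != cell.size:
        raise ValueError("duplicate (pk, key) pair; an aggregate would be needed")

    table = np.full((rows.size, cols.size), np.nan)
    table[row_idx, col_idx] = value  # scatter each value into its cell
    return rows, cols, table


rows, cols, table = pivot_key_value([1, 1, 2], ["a", "b", "a"], [10, 20, 30])
```

The numpy.unique() calls happen once per call, regardless of how many distinct keys (output columns) there are.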

Any ideas how to do this with SQL or custom functions in MonetDB?

Kind regards,
Daniel

CREATE AGGREGATE python_aggregate(val INTEGER) RETURNS INTEGER LANGUAGE PYTHON {
    try:
        # aggr_group holds the group id of every row (only defined with GROUP BY)
        unique = numpy.unique(aggr_group)
        x = numpy.zeros(shape=(unique.size))
        # one masked sum per group; every call of the aggregate repeats this scan
        for i in range(0, unique.size):
            x[i] = numpy.sum(val[aggr_group==unique[i]])
    except NameError:
        # aggr_group doesn't exist: no groups, aggregate over all data
        x = numpy.sum(val)
    return(x)
};
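One way to avoid repeating numpy.unique() and the per-group scan for every column is to sort by the group ids once, find the group boundaries once, and then reduce every value column over those same boundaries with `ufunc.reduceat`. This is a minimal plain-NumPy sketch, assuming an `aggr_group`-style array with one group id per row; the function `grouped_aggregates` and its column-spec dict are hypothetical, and the sketch does not establish whether a single MonetDB Python aggregate can return several result columns.

```python
import numpy as np


def grouped_aggregates(aggr_group, columns):
    """Hypothetical helper: aggregate several columns, grouping only once.

    aggr_group: one group id per row.
    columns: dict name -> (value array, NumPy ufunc such as np.add or np.maximum).
    Returns (group ids, dict name -> one aggregated value per group).
    """
    aggr_group = np.asarray(aggr_group)
    if aggr_group.size == 0:
        return aggr_group, {name: np.asarray(v)[:0] for name, (v, _) in columns.items()}

    # Shared work: one stable sort and one boundary search for all columns.
    order = np.argsort(aggr_group, kind="stable")
    sorted_groups = aggr_group[order]
    is_start = np.concatenate(([True], sorted_groups[1:] != sorted_groups[:-1]))
    starts = np.flatnonzero(is_start)

    result = {}
    for name, (val, ufunc) in columns.items():
        sorted_val = np.asarray(val)[order]
        # reduceat applies the ufunc to each slice [starts[i], starts[i + 1]).
        result[name] = ufunc.reduceat(sorted_val, starts)
    return sorted_groups[starts], result


groups, res = grouped_aggregates(
    [1, 0, 1, 0, 2],
    {"s": (np.array([1, 2, 3, 4, 5]), np.add),
     "m": (np.array([7, 1, 2, 9, 0]), np.maximum)},
)
```

A reduction that is not a ufunc could instead be applied to each piece of `np.split(sorted_val, starts[1:])`, reusing the same boundaries. For plain sums, `np.bincount(aggr_group, weights=val)` is another single-pass option when the group ids are small non-negative integers.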

Daniel Glöckner
