Greetings,
We're currently evaluating MonetDB for an analytical DW and so far we
are happy with the results.
I am trying to implement an aggregate (grouping) function that
calculates a value over a set of strings, so that my queries would
read like this:
select metric, udf_aggregate(string_column)
from table
group by metric;
For a bit of background, we're using a distinct-value sketch called
HyperLogLog (HLL)
http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/
http://blog.aggregateknowledge.com/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure/
and we're currently storing estimations for each time period (day).
HLL lets you merge/aggregate a set of estimations (each estimation
is a vector of numbers, which we currently store as a string) over
an arbitrary range and still get an accurate estimation. (I'm sure
the literature doesn't call them estimations, sorry for my English.)
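
To make the merge step concrete, here is a rough sketch in C of what
the aggregate would have to compute per group. It assumes the standard
HLL union rule (register-wise maximum) and assumes each stored string
has already been decoded into an array of byte-sized registers; the
string format itself is not covered. HLL_P, hll_merge and
hll_merge_all are made-up names for illustration, not MonetDB API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch size: 2^HLL_P registers, one byte each. */
#define HLL_P 4
#define HLL_M (1u << HLL_P)

/* Merge one sketch into another: the union of two HLL sketches is
 * the register-wise maximum. */
void hll_merge(uint8_t *dst, const uint8_t *src)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < HLL_M; i++) {
        if (src[i] > dst[i])
            dst[i] = src[i];
    }
}

/* Fold n sketches, stored back to back (n * HLL_M bytes), into out.
 * This is the per-group work an aggregate would do. */
void hll_merge_all(uint8_t *out, const uint8_t *sketches, size_t n)
{
    assert(out != NULL);
    for (size_t i = 0; i < HLL_M; i++)
        out[i] = 0;                 /* empty sketch: all registers zero */
    for (size_t k = 0; k < n; k++)
        hll_merge(out, sketches + k * HLL_M);
}
```

Because the maximum is associative and commutative, the daily sketches
can be merged in any order, which is what makes this a natural fit for
a GROUP BY aggregate.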
What I would like is a custom UDF like the one provided in the MonetDB
source (reverse), but one that would operate and behave like
an aggregate function.
For now, I'm not considering using it for types other than string
(no need for polymorphism yet).
Is this possible with a UDF? I found a way of registering aggregate
functions on the mailing list, but HLL is complex enough to
warrant its own C implementation instead of a MAL function.
Thanks,
Miguel