Greetings,

We're currently evaluating MonetDB for an analytical DW and so far we are happy with the results.

I am trying to implement a grouping function that calculates a value over a set of strings, so that my queries would read like this:

select metric, udf_aggregate(string_column) from table group by metric;

For a bit of background, we're using a distinct-value sketch called HyperLogLog:
http://metamarkets.com/2012/fast-cheap-and-98-right-cardinality-estimation-for-big-data/
http://blog.aggregateknowledge.com/2012/10/25/sketch-of-the-day-hyperloglog-cornerstone-of-a-big-data-infrastructure/

We're currently storing estimations for each time period (day). HLL lets you merge/aggregate a set of estimations (each estimation is a vector of numbers; we're currently storing it as a string) for an arbitrary range and still have an accurate estimation. (I'm sure the literature doesn't call them estimations, sorry for my English.)

What I would like is a custom UDF like the one provided in the MonetDB source (reverse), but one that would operate and behave like an aggregate function.

Right now, I'm not considering using it for types other than string (no need for polymorphism right now). Is this possible with a UDF? I found a way of registering aggregate functions on the mailing list, but the HLL is complex enough to warrant its own C implementation instead of a MAL function.

Thanks,
Miguel
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list
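To make the merge step concrete, here is a minimal sketch in plain C of combining two stored estimations. It relies on the fact that the union of two HyperLogLog sketches is the per-register maximum. Everything else is an assumption made for illustration: the comma-separated string layout, the tiny register count HLL_M, and the hll_* function names are hypothetical and are not taken from the thread or from MonetDB.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: a sketch is HLL_M small integer registers,
 * stored as a comma-separated string such as "0,3,1,2". */
#define HLL_M 4

/* Parse a serialized sketch; returns 0 on success, -1 on malformed input. */
static int hll_parse(const char *s, unsigned char regs[HLL_M])
{
    for (int i = 0; i < HLL_M; i++) {
        char *end;
        long v = strtol(s, &end, 10);
        if (end == s || v < 0 || v > 255)
            return -1;
        regs[i] = (unsigned char)v;
        if (i < HLL_M - 1) {
            if (*end != ',')
                return -1;          /* too few registers */
            s = end + 1;
        } else if (*end != '\0') {
            return -1;              /* trailing data after the last register */
        }
    }
    return 0;
}

/* Merge src into dst by taking the maximum of each register pair. */
static void hll_merge(unsigned char dst[HLL_M], const unsigned char src[HLL_M])
{
    for (int i = 0; i < HLL_M; i++)
        if (src[i] > dst[i])
            dst[i] = src[i];
}

/* Serialize registers back to the comma-separated form. */
static void hll_format(const unsigned char regs[HLL_M], char *buf, size_t len)
{
    size_t off = 0;
    for (int i = 0; i < HLL_M; i++) {
        int n;
        if (i == 0)
            n = snprintf(buf + off, len - off, "%u", (unsigned)regs[i]);
        else
            n = snprintf(buf + off, len - off, ",%u", (unsigned)regs[i]);
        assert(n > 0 && (size_t)n < len - off);  /* caller's buffer is large enough */
        off += (size_t)n;
    }
}

/* Merge two serialized sketches into out; returns 0 on success, -1 on bad input. */
static int hll_merge_strings(const char *a, const char *b, char *out, size_t len)
{
    unsigned char ra[HLL_M], rb[HLL_M];
    if (hll_parse(a, ra) != 0 || hll_parse(b, rb) != 0)
        return -1;
    hll_merge(ra, rb);
    hll_format(ra, out, len);
    return 0;
}
```

Because the per-register maximum is associative and commutative, the daily sketches can be folded in any order, which is what makes the operation suitable as an aggregate.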
On Wed, Nov 28, 2012 at 05:27:47PM +0000, Miguel Ping wrote:
Right now, I'm not considering using it for types other than string (no need for polymorphic right now). Is this possible with an UDF?

Yes.

I found a way of registering aggregate functions on the mailing list, but the HLL is complex enough to warrant its own C impl, instead of a MAL function.

Well, C functions need their own MAL signatures. So the steps needed are:

1. Write C HLL aggregate implementations for single and multiple groups.
2. Load this library into MonetDB (i.e. the MAL file and the library file).
3. Use SQL CREATE AGGREGATE to register them.

Examples are in the batxml file(s).

Niels
--
Niels Nes, Centrum Wiskunde & Informatica (CWI)
Science Park 123, 1098 XG Amsterdam, The Netherlands
room L3.14, phone ++31 20 592-4098
sip:4098@sip.cwi.nl
url: http://www.cwi.nl/~niels
e-mail: Niels.Nes@cwi.nl
Niels, thanks for your quick reply.

I'm looking at batxml.c right now. Is there a function I should look at for the single-group and multiple-group cases? I'm looking at xmlagg. I don't understand why I would need both single- and multiple-group implementations; I can only compute HLL values for single columns.

Any pointers are greatly appreciated, thanks!

Miguel
On Wed, Nov 28, 2012 at 06:32:30PM +0000, Miguel Ping wrote:
Niels, thanks for your quick reply.
I'm looking at batxml.c right now. Is there a function I should look at for the single-group and multiple-group cases? I'm looking at xmlagg. I don't understand why I would need both single- and multiple-group implementations; I can only compute HLL values for single columns.
Any pointers are greatly appreciated, thanks!
We have to update the documentation on extending MonetDB with complex functions. The UDF description only covers very simple map-like operations. But just looking at the code may get you far enough. Indeed, BATxmlagg is the multi-group version, and BATXMLgroup is the example of the single-group version.

Niels
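As a rough illustration of the two shapes mentioned above, the following plain C sketch contrasts a single-group aggregate with a multi-group one. It assumes the usual split: the single-group form folds a whole column into one value (a query without GROUP BY), while the multi-group form receives a group id per row and produces one value per group. It does not use the MonetDB API: plain arrays stand in for BATs, and the sketch type, HLL_M, agg_single and agg_grouped are hypothetical names chosen for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HLL_M 4  /* hypothetical register count, kept tiny for the example */

typedef struct { unsigned char reg[HLL_M]; } sketch;

/* Merge src into dst by taking the maximum of each register pair. */
static void sketch_merge(sketch *dst, const sketch *src)
{
    for (int i = 0; i < HLL_M; i++)
        if (src->reg[i] > dst->reg[i])
            dst->reg[i] = src->reg[i];
}

/* Single-group shape: fold every row of the column into one result. */
static sketch agg_single(const sketch *col, size_t n)
{
    sketch out;
    memset(&out, 0, sizeof out);
    for (size_t i = 0; i < n; i++)
        sketch_merge(&out, &col[i]);
    return out;
}

/* Multi-group shape: gid[i] is the group of row i; out must have room
 * for ngroups results and receives one merged sketch per group. */
static void agg_grouped(const sketch *col, const size_t *gid, size_t n,
                        sketch *out, size_t ngroups)
{
    memset(out, 0, ngroups * sizeof *out);
    for (size_t i = 0; i < n; i++) {
        assert(gid[i] < ngroups);  /* group ids must be in range */
        sketch_merge(&out[gid[i]], &col[i]);
    }
}
```

Both functions share the same merge step and differ only in where the result goes, so the HLL logic itself would need to be written once and wrapped twice.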
participants (2)
- Miguel Ping
- Niels Nes