RE: b and g must be aligned

14 Aug 2013

I ran into a similar issue when creating my own aggregate function. With the mitosis pipeline, the optimizer assumes your sub-aggregation is iterative, but the mergetable processing then doesn't actually call it iteratively, since it only recognises the aggregations sum/count/min/max. I worked around this by changing the opt_mergetable processing to call mat.pack() on b, g and e before passing them to my aggregation function. This allows the mitosis pipeline to be used where possible while still supporting non-iterative sub-aggregation functions. I'm not sure whether this could be considered a valid enhancement for MonetDB.

If you want your sub-aggregations to run iteratively, you need to change opt_mergetable to recognise your sub-aggregation name. It would be good if there were a dedicated namespace for custom aggregations in which users could define iterative and non-iterative sub-aggregations; that would avoid having to change the optimizer code.
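
To illustrate the difference between iterative and non-iterative sub-aggregations, here is a minimal sketch in plain C. It does not use any MonetDB code: all toy_* names are hypothetical, the median is only a stand-in for a non-iterative aggregate (it is not the aggregate discussed in this thread), and toy_pack is a rough analogue of what packing the partitions achieves, assuming the partitions are simply concatenated.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Sum is "iterative": partial sums per partition can be summed again. */
long toy_sum(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Lower median; sorts v in place and requires n > 0.
 * It is not "iterative": the median of per-partition medians
 * is in general not the median of all values. */
int toy_median(int *v, size_t n)
{
    qsort(v, n, sizeof *v, cmp_int);
    return v[(n - 1) / 2];
}

/* Hypothetical analogue of packing two partitions back into one
 * column before aggregating. The caller frees the result. */
int *toy_pack(const int *p1, size_t n1, const int *p2, size_t n2)
{
    int *out = malloc((n1 + n2) * sizeof *out);
    if (out == NULL)
        return NULL;
    memcpy(out, p1, n1 * sizeof *out);
    memcpy(out + n1, p2, n2 * sizeof *out);
    return out;
}
```

With partitions {1, 2, 100} and {3, 4}, adding the two partial sums gives the same total as summing the packed column, whereas combining the two partition medians does not give the median of the packed column. That is why a non-iterative aggregate needs the packed input.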

Regards,

Scott

PS: The other option for keeping your sub-aggregation out of the mitosis pipeline is to declare it in the aggr module; that way it gets recognised as an aggregation that does not support the iterative approach. This means that none of your query can use the mitosis pipeline, though.

-----Original Message-----
From: users-list [mailto:users-list-bounces+scott.mathieson=pb.com@monetdb.org] On Behalf Of Niels Nes
Sent: 14 August 2013 14:10
To: Communication channel for MonetDB users
Subject: Re: b and g must be aligned

On Wed, Aug 14, 2013 at 02:04:27PM +0100, Miguel Ping wrote:
...
I've been tinkering a little more, if I use either the minimal_pipe,
no_mitosis_pipe or the sequential_pipe optimizers the error no longer
occurs. I've specified the set of optimizers to use, and it seems that
the optimizer step that's problematic is the *mitosis* step.
Where can I learn more about this optimizer? Can someone shed some light?
Could you send us the explain outputs both with and without mitosis/mergetable? This may give us a hint of what goes wrong.
My guess is that subhllagg should probably be added (recognized).

Niels
...
Thanks!
--
Miguel
On 08/13/2013 03:59 PM, Miguel Ping wrote:
...
I'm trying to come up with a small sample, but it is hard.
- I currently have a small dataset (~600 rows) which gives me the error.
- BUT if I export that data onto a *new* table, the error doesn't
happen. This makes it hard to provide a proper sample.
Running explain on the same query for these two tables, the plan is
different: the error plan seems to use the new subgroup feature for
aggregates which was released in a recent MonetDB version (11.15.x?) .
* Does monetdb keep some sort of internal table statistics that feed
the MAL planner?
* Can I force the query to use the 'old' aggregate function?
* Can you guys point me to the part where the group BAT is calculated?
my guess is that the group ids are not being calculated correctly,
hence the difference between g->U->count and b->U->count. I'm
guessing the culprit is around here:
...
|     (X_16,r1_34,X_140) := group.subgroupdone(X_15);
|     X_18 := algebra.leftfetchjoin(r1_34,X_15);
...
|     X_27:bat[:oid,:str]  := batudf.subhllagg(X_26,X_16,r1_34,true);
#r1_34 should be the bat* gid
I have tried the same dataset with MonetDB 11.13.7 (which had the 'old'
aggregate definition) and the query works as expected.
I don't want to do a downgrade since I don't even know if the data
files are compatible.
Thanks,
Miguel
On 08/13/2013 11:35 AM, Martin Kersten wrote:
...
If it is hitting before your code, then please provide the smallest
(SQL) test case to reproduce it locally.
Thanks, Martin
On 8/13/13 11:27 AM, Miguel Ping wrote:
...
The query calls a custom aggregate function, but the error occurs
before hitting my code; the code path just starts to prepare things (it's just the boilerplate to run a custom aggr function), and it hits the error while calling BATgroupaggrinit. In fact, BATgroupaggrinit is the very first thing that the BAThllaggr function calls. Also, according to Sjoerd, it's a bug:
"If this happens when running a SQL query, it's a bug. I don't
think NULLs have anything to do with it. NULL values are stored in-line. You might want to look at b->U->count, g->U->count, b->H->seq, g->H->seq, b->H->dense, g->H->dense when the misalignment happens (either in the debugger or by using printf--but realize that count and seq are not int, so %d is not going to work). Also things like the MAL plan (prepend SQL query with EXPLAIN) and the stack trace might be useful."
Thanks.
On 08/13/2013 08:51 AM, Stefan Manegold wrote:
...
Dear Miguel,
I am not aware of any "hllaggr()" function in the MonetDB release,
so I assume the error occurs in your code. Not knowing your code at
all, I'm afraid we cannot be of much help.
Best,
Stefan
On Mon, Aug 12, 2013 at 06:59:33PM +0100, Miguel Ping wrote:
...
Some more info:
>SELECT count(*) FROM wa_sapo_pt_audience.kpi_2013_07 WHERE
ts>=1373410800000 AND ts<=1374706799000;
==> 764314
>SELECT count(distinct(ts_day)) FROM
wa_sapo_pt_audience.kpi_2013_07 WHERE ts>=1373410800000 AND
ts<=1374706799000; ==> 15
It seems that b.count is the row count, while g.count is
something like the distinct count, plus 2 more values?
On 08/12/2013 05:50 PM, Miguel Ping wrote:
>Hi, I'm resurrecting this since I've been out of town and only
>today I got a chance to investigate further. I've recompiled with
>-O0 to prevent optimizations from "hiding" the values, and in
>the debugger I got this:
>
>b->U->count    BUN             764314
>g->U->count    BUN             17
>b->H->seq      oid             0
>g->H->seq      oid             0
>b->H->dense    unsigned int    1
>g->H->dense    unsigned int    1
>
>
>The stack call is as follows:
>
>BAThllaggr() at udf.c
>AGGRsubhllaggcand() at udf.c
>AGGRsubhllagg() at udf.c
>malCommandCall() at mal_interpreter.c
>runMALsequence() at mal_interpreter.c
>DFLOWworker() at mal_dataflow.c
>start_thread() at pthread_create.c
>clone() at clone.S
>0x0
>
>-------- Original Message --------
>Subject:     Re: b and g must be aligned
>Date:     Fri, 26 Jul 2013 11:34:53 +0100
>From:     Sjoerd Mullender 
>Reply-To:     Communication channel for MonetDB users
>
>To:     Communication channel for MonetDB users 
>
>
>
>-----BEGIN PGP SIGNED MESSAGE-----
>Hash: SHA1
>
>On 2013-07-26 12:23, Miguel Ping wrote:
>>On 07/25/2013 01:51 PM, Sjoerd Mullender wrote: On 2013-07-25
>>14:04, Miguel Ping wrote:
>>>>>Hi all,
>>>>>
>>>>>We're hitting this error "b and g must be aligned". I
>>>>>tracked the src to a commit about some alignment code thing
>>>>>in
>>>>>gdk_calc:
>>>>>http://www.mail-archive.com/checkin-list@monetdb.org/msg09731.html
>>>>>
>>>>>(Fix alignment conversion in compatibility code for grouped
>>>>>aggregates.)
>>>>>
>>>>>Can you guys please explain the reason behind this
>>>>>error? I can't understand it by just looking at the src of
>>>>>gdk_calc.c
>>>>>
>>>>>Thanks!
>>When using grouped aggregates, the grouping bat must be aligned
>>with the value bat.  The value bat is b and contains the values
>>you want to aggregate.  The group bat is g and contains for
>>each value in b the group (an oid) it belongs to.  Equal group
>>ids means the same group. These bats must be aligned, because
>>we need to know for each value in b to which group it belongs.
>>Aligned means: same length, and same head column values.  The
>>head columns must be dense (a sequence of numbers starting at
>>some value, and each next value exactly one larger than the
>>previous).  Dense sequences are usually not stored explicitly
>>in MonetDB.  We only store the first value in the hseqbase
>>field.  So the hseqbase fields of b and g must be equal.  The
>>one exception to this is when the bats are both empty.  This
>>last exception is the change to gdk_calc.c in that changeset.
>>
>>-- Sjoerd Mullender
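
The alignment rule described above can be sketched as a small check in plain C. This is only an illustration under stated assumptions: toy_bat and toy_aligned are hypothetical names, the struct is a stand-in for the three fields mentioned in this thread (count, hseqbase, dense) and not the real GDK BAT structure, and treating two empty bats as aligned regardless of their other fields is my reading of the exception described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the BAT fields discussed in this thread;
 * this is NOT the real GDK struct. */
typedef struct {
    size_t count;     /* number of entries (cf. b->U->count) */
    size_t hseqbase;  /* first oid of the dense head (cf. b->H->seq) */
    bool   hdense;    /* head is a dense sequence (cf. b->H->dense) */
} toy_bat;

/* Returns true when g can serve as the group column for b. */
bool toy_aligned(const toy_bat *b, const toy_bat *g)
{
    /* Exception: two empty bats count as aligned. */
    if (b->count == 0 && g->count == 0)
        return true;
    /* Otherwise: dense heads, same length, same first head value. */
    return b->hdense && g->hdense &&
           b->count == g->count &&
           b->hseqbase == g->hseqbase;
}
```

Applied to the debugger values reported in this thread (b with count 764314, g with count 17, both with seq 0 and dense 1), the check fails on the length comparison alone.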
>>>_______________________________________________
>>>users-list mailing list
>>>users-list@monetdb.org
>>>http://mail.monetdb.org/mailman/listinfo/users-list
>>>
>>>
>>Thanks for your explanation. I still don't understand how there
>>can be a misalignment. I would expect MonetDB to feed my hll
>>aggregate functions with the correct values. Can it be that I
>>may have some NULL values and the validation is failing because
>>of that?
>If this happens when running a SQL query, it's a bug.
>I don't think NULLs have anything to do with it.  NULL values
>are stored in-line.
>You might want to look at b->U->count, g->U->count, b->H->seq,
>g->H->seq, b->H->dense, g->H->dense when the misalignment happens
>(either in the debugger or by using printf--but realize that
>count and seq are not int, so %d is not going to work).
>Also things like the MAL plan (prepend SQL query with EXPLAIN)
>and the stack trace might be useful.
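
As a sketch of the printf advice above: the example below assumes, purely for illustration, that count and seq are size_t-sized unsigned values and prints them with %zu instead of %d. The names toy_bun, toy_oid and toy_describe are hypothetical. In a real build the widths of BUN and oid depend on the platform, so if your GDK headers provide format macros such as BUNFMT and OIDFMT, those are probably the safer choice.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; the real BUN and oid types may differ in width. */
typedef size_t toy_bun;
typedef size_t toy_oid;

/* Formats the fields of interest into buf using specifiers that match
 * their types (%zu for size_t, %u for unsigned int).
 * Returns whatever snprintf returns. */
int toy_describe(char *buf, size_t len, const char *name,
                 toy_bun count, toy_oid seq, unsigned int dense)
{
    return snprintf(buf, len, "%s: count=%zu seq=%zu dense=%u",
                    name, count, seq, dense);
}
```

Calling this once for b and once for g right before the aggregate's initialisation gives the six values side by side, which makes a length or hseqbase mismatch easy to spot.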
>
>- -- Sjoerd Mullender
>-----BEGIN PGP SIGNATURE-----
>Version: GnuPG v1.4.13 (GNU/Linux)
>Comment: Using GnuPG with Thunderbird -http://www.enigmail.net/
>
>iQCVAwUBUfJQyT7g04AjvIQpAQK/JAP9HCp/aFaYWv0jodfPUnSRVgFSsdjTn/VL
>ttSsmAF+yomGMDIne2311f/D51F3/nte7Utx+01lgvArapWErhjGN1hzPSr5LQbs
>PZ6dUfNcH8Rt2AtT3uSxfkFZy9VRDCNNXPei43IgMS2HxVZ48pnAVkNcpBbW3Gms
>GwdvU7bSZtM=
>=2zSg
>-----END PGP SIGNATURE-----
--
Niels Nes, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands
room L3.14,  phone ++31 20 592-4098     sip:4098@sip.cwi.nl
url: http://www.cwi.nl/~niels   e-mail: Niels.Nes@cwi.nl

________________________________

RE: b and g must be aligned

Scott Mathieson