[MonetDB-users] wrong results for group by queries

older
[MonetDB-users] Records deleted...

Daniel Boesswetter

7 Apr 2011 7 Apr '11

5:57 p.m.

Hi, we are using MonetDB (Oct-2010) for analyzing a 250G data warehouse and have found at least one "thing" which might be a serious bug. The simplest scenario that I have found to describe it, is as follows: SELECT comp.id as compid, comp.name, affnets.name as affid FROM toc INNER JOIN app ON toc.application_id=app.id INNER JOIN comp ON app.company_id = comp.id INNER JOIN affnets ON toc.affiliate_network_id=affnets.id GROUP BY compid, comp.name, affid This query is supposed to return all unique company/affnet pairs which occur in the fact table "toc". However, it returns only the unique company-names, each with a random (the first?) affnet. (Of course, this could be accomplished with DISTINCT, but our aim to to aggregate other fields). If I change the query to select app.company_id as compid, comp.name, affnets.name as affid ... or select comp.id as compid, comp.name, affnets.name as affid ... i.e. semantically the same but only with fields from 2 directly joined tables in the GROUP BY, then the result is correct. Question: Is this a bug or are we missing something? Thank you in advance! Daniel

Show replies by date

Daniel Boesswetter

8 Apr 8 Apr

8:50 p.m.

Hi again, now I managed to isolate the problem even more. Attached you will find a SQL script which creates 4 tables, inserts a few tuples and then queries them twice: the first query produces too few rows, but after dropping a primary key on one of the tables (it matters which one), the second query returns the correct results. Looks like the optimizer optimizes a little too eagerly in the presence of primary keys? Best regards, Daniel Daniel Boesswetter wrote:

...

Hi,

we are using MonetDB (Oct-2010) for analyzing a 250G data warehouse and have found at least one "thing" which might be a serious bug. The simplest scenario that I have found to describe it, is as follows:

SELECT comp.id as compid, comp.name, affnets.name as affid FROM toc INNER JOIN app ON toc.application_id=app.id INNER JOIN comp ON app.company_id = comp.id INNER JOIN affnets ON toc.affiliate_network_id=affnets.id GROUP BY compid, comp.name, affid

This query is supposed to return all unique company/affnet pairs which occur in the fact table "toc". However, it returns only the unique company-names, each with a random (the first?) affnet. (Of course, this could be accomplished with DISTINCT, but our aim to to aggregate other fields).

If I change the query to

select app.company_id as compid, comp.name, affnets.name as affid ...

or

select comp.id as compid, comp.name, affnets.name as affid ...

i.e. semantically the same but only with fields from 2 directly joined tables in the GROUP BY, then the result is correct.

Question: Is this a bug or are we missing something?

Thank you in advance!

Daniel

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Michael Sioutis

9 Apr 9 Apr

2:57 a.m.

I am getting same results with latest development release: +------+------+------+ | name | id | name | +======+======+======+ | b1 | 1 | c1 | | b1 | 2 | c2 | | b1 | 3 | c3 | +------+------+------+ 3 tuples +------+------+------+ | name | id | name | +======+======+======+ | b1 | 1 | c1 | | b1 | 2 | c2 | | b1 | 3 | c3 | | b2 | 1 | c1 | | b2 | 2 | c2 | | b2 | 3 | c3 | | b3 | 1 | c1 | | b3 | 2 | c2 | | b3 | 3 | c3 | +------+------+------+ 9 tuples Mike On Fri, Apr 8, 2011 at 11:50 PM, Daniel Boesswetter < daniel.boesswetter@fu-berlin.de> wrote:

...

Hi again,

now I managed to isolate the problem even more. Attached you will find a SQL script which creates 4 tables, inserts a few tuples and then queries them twice: the first query produces too few rows, but after dropping a primary key on one of the tables (it matters which one), the second query returns the correct results.

Looks like the optimizer optimizes a little too eagerly in the presence of primary keys?

Best regards, Daniel

Daniel Boesswetter wrote:

...
Hi,

we are using MonetDB (Oct-2010) for analyzing a 250G data warehouse and have found at least one "thing" which might be a serious bug. The simplest scenario that I have found to describe it, is as follows:

SELECT comp.id as compid, comp.name, affnets.name as affid FROM toc INNER JOIN app ON toc.application_id=app.id INNER JOIN comp ON app.company_id = comp.id INNER JOIN affnets ON toc.affiliate_network_id=affnets.id GROUP BY compid, comp.name, affid

This query is supposed to return all unique company/affnet pairs which occur in the fact table "toc". However, it returns only the unique company-names, each with a random (the first?) affnet. (Of course, this could be accomplished with DISTINCT, but our aim to to aggregate other fields).

If I change the query to

select app.company_id as compid, comp.name, affnets.name as affid ...

or

select comp.id as compid, comp.name, affnets.name as affid ...

i.e. semantically the same but only with fields from 2 directly joined tables in the GROUP BY, then the result is correct.

Question: Is this a bug or are we missing something?

Thank you in advance!

Daniel

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Franck Routier

5 May 5 May

7:55 a.m.

Hi list, this seems particularly annoying, to say the least, and I don't see any follow up on the list. Is this confirmed and a known problem (maybe with a bug report), or is it solved, or what ? Franck Le samedi 09 avril 2011 à 05:57 +0300, Michael Sioutis a écrit :

...

I am getting same results with latest development release:

+------+------+------+ | name | id | name | +======+======+======+ | b1 | 1 | c1 | | b1 | 2 | c2 | | b1 | 3 | c3 | +------+------+------+ 3 tuples

+------+------+------+ | name | id | name | +======+======+======+ | b1 | 1 | c1 | | b1 | 2 | c2 | | b1 | 3 | c3 | | b2 | 1 | c1 | | b2 | 2 | c2 | | b2 | 3 | c3 | | b3 | 1 | c1 | | b3 | 2 | c2 | | b3 | 3 | c3 | +------+------+------+ 9 tuples

Mike

On Fri, Apr 8, 2011 at 11:50 PM, Daniel Boesswetter wrote: Hi again,

now I managed to isolate the problem even more. Attached you will find a SQL script which creates 4 tables, inserts a few tuples and then queries them twice: the first query produces too few rows, but after dropping a primary key on one of the tables (it matters which one), the second query returns the correct results.

Looks like the optimizer optimizes a little too eagerly in the presence of primary keys?

Best regards, Daniel

Daniel Boesswetter wrote: Hi,

we are using MonetDB (Oct-2010) for analyzing a 250G data warehouse and have found at least one "thing" which might be a serious bug. The simplest scenario that I have found to describe it, is as follows:

SELECT comp.id as compid, comp.name, affnets.name as affid FROM toc INNER JOIN app ON toc.application_id=app.id INNER JOIN comp ON app.company_id = comp.id INNER JOIN affnets ON toc.affiliate_network_id=affnets.id GROUP BY compid, comp.name, affid

This query is supposed to return all unique company/affnet pairs which occur in the fact table "toc". However, it returns only the unique company-names, each with a random (the first?) affnet. (Of course, this could be accomplished with DISTINCT, but our aim to to aggregate other fields).

If I change the query to

select app.company_id as compid, comp.name, affnets.name as affid ...

or

select comp.id as compid, comp.name, affnets.name as affid ...

i.e. semantically the same but only with fields from 2 directly joined tables in the GROUP BY, then the result is correct.

Question: Is this a bug or are we missing something?

Thank you in advance!

Daniel

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Stefan.Manegold＠cwi.nl

2:18 p.m.

Hi, please bear with us. We're also only humans with 24 hours to spend per day. ;-) I suppose this is a bug; looks like it might be a too eager attempt to (in this case, i.e., after the join, incorrectly) exploit the uniqueness of the primary key of table c when determining the groups. Could you --- Franck or Daniel --- please file a bug report using Daniel's detailed example? I'll try to fix it as soon as our agendas allow so ... Thanks, Stefan On Thu, May 05, 2011 at 09:55:37AM +0200, Franck Routier wrote:

...

Hi list,

this seems particularly annoying, to say the least, and I don't see any follow up on the list. Is this confirmed and a known problem (maybe with a bug report), or is it solved, or what ?

Franck

Le samedi 09 avril 2011 à 05:57 +0300, Michael Sioutis a écrit :

...
I am getting same results with latest development release:

+------+------+------+ | name | id | name | +======+======+======+ | b1 | 1 | c1 | | b1 | 2 | c2 | | b1 | 3 | c3 | +------+------+------+ 3 tuples

+------+------+------+ | name | id | name | +======+======+======+ | b1 | 1 | c1 | | b1 | 2 | c2 | | b1 | 3 | c3 | | b2 | 1 | c1 | | b2 | 2 | c2 | | b2 | 3 | c3 | | b3 | 1 | c1 | | b3 | 2 | c2 | | b3 | 3 | c3 | +------+------+------+ 9 tuples

Mike

On Fri, Apr 8, 2011 at 11:50 PM, Daniel Boesswetter wrote: Hi again,

now I managed to isolate the problem even more. Attached you will find a SQL script which creates 4 tables, inserts a few tuples and then queries them twice: the first query produces too few rows, but after dropping a primary key on one of the tables (it matters which one), the second query returns the correct results.

Looks like the optimizer optimizes a little too eagerly in the presence of primary keys?

Best regards, Daniel

Daniel Boesswetter wrote: Hi,

we are using MonetDB (Oct-2010) for analyzing a 250G data warehouse and have found at least one "thing" which might be a serious bug. The simplest scenario that I have found to describe it, is as follows:

SELECT comp.id as compid, comp.name, affnets.name as affid FROM toc INNER JOIN app ON toc.application_id=app.id INNER JOIN comp ON app.company_id = comp.id INNER JOIN affnets ON toc.affiliate_network_id=affnets.id GROUP BY compid, comp.name, affid

This query is supposed to return all unique company/affnet pairs which occur in the fact table "toc". However, it returns only the unique company-names, each with a random (the first?) affnet. (Of course, this could be accomplished with DISTINCT, but our aim to to aggregate other fields).

If I change the query to

select app.company_id as compid, comp.name, affnets.name as affid ...

or

select comp.id as compid, comp.name, affnets.name as affid ...

i.e. semantically the same but only with fields from 2 directly joined tables in the GROUP BY, then the result is correct.

Question: Is this a bug or are we missing something?

Thank you in advance!

Daniel

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Daniel Boesswetter

2:25 p.m.

Stefan: I'll file the bug. Franck: thanks for bringing up the topic again ;-) Best, Daniel Stefan.Manegold@cwi.nl wrote:

...

Hi,

please bear with us. We're also only humans with 24 hours to spend per day. ;-)

I suppose this is a bug; looks like it might be a too eager attempt to (in this case, i.e., after the join, incorrectly) exploit the uniqueness of the primary key of table c when determining the groups.

Could you --- Franck or Daniel --- please file a bug report using Daniel's detailed example?

I'll try to fix it as soon as our agendas allow so ...

Thanks, Stefan

On Thu, May 05, 2011 at 09:55:37AM +0200, Franck Routier wrote:

...
Hi list,

this seems particularly annoying, to say the least, and I don't see any follow up on the list. Is this confirmed and a known problem (maybe with a bug report), or is it solved, or what ?

Franck

Le samedi 09 avril 2011 à 05:57 +0300, Michael Sioutis a écrit :

...
I am getting same results with latest development release:

+------+------+------+ | name | id | name | +======+======+======+ | b1 | 1 | c1 | | b1 | 2 | c2 | | b1 | 3 | c3 | +------+------+------+ 3 tuples

+------+------+------+ | name | id | name | +======+======+======+ | b1 | 1 | c1 | | b1 | 2 | c2 | | b1 | 3 | c3 | | b2 | 1 | c1 | | b2 | 2 | c2 | | b2 | 3 | c3 | | b3 | 1 | c1 | | b3 | 2 | c2 | | b3 | 3 | c3 | +------+------+------+ 9 tuples

Mike

On Fri, Apr 8, 2011 at 11:50 PM, Daniel Boesswetter wrote: Hi again,

now I managed to isolate the problem even more. Attached you will find a SQL script which creates 4 tables, inserts a few tuples and then queries them twice: the first query produces too few rows, but after dropping a primary key on one of the tables (it matters which one), the second query returns the correct results.

Looks like the optimizer optimizes a little too eagerly in the presence of primary keys?

Best regards, Daniel

Daniel Boesswetter wrote: Hi,

we are using MonetDB (Oct-2010) for analyzing a 250G data warehouse and have found at least one "thing" which might be a serious bug. The simplest scenario that I have found to describe it, is as follows:

SELECT comp.id as compid, comp.name, affnets.name as affid FROM toc INNER JOIN app ON toc.application_id=app.id INNER JOIN comp ON app.company_id = comp.id INNER JOIN affnets ON toc.affiliate_network_id=affnets.id GROUP BY compid, comp.name, affid

This query is supposed to return all unique company/affnet pairs which occur in the fact table "toc". However, it returns only the unique company-names, each with a random (the first?) affnet. (Of course, this could be accomplished with DISTINCT, but our aim to to aggregate other fields).

If I change the query to

select app.company_id as compid, comp.name, affnets.name as affid ...

or

select comp.id as compid, comp.name, affnets.name as affid ...

i.e. semantically the same but only with fields from 2 directly joined tables in the GROUP BY, then the result is correct.

Question: Is this a bug or are we missing something?

Thank you in advance!

Daniel

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ Xperia(TM) PLAY It's a major breakthrough. An authentic gaming smartphone on the nation's most reliable network. And it wants your games. http://p.sf.net/sfu/verizon-sfdev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Franck Routier

2:32 p.m.

Hi, Le jeudi 05 mai 2011 à 16:18 +0200, Stefan.Manegold@cwi.nl a écrit :

...

Hi,

please bear with us. We're also only humans with 24 hours to spend per day. ;-)

Sure Stefan, there was no intent to be rude or look demanding. I'm conscious this is a FOSS project, and a very interesting one. It's just that returning unreliable results is rather embarrassing for a database :-)

...

I suppose this is a bug; looks like it might be a too eager attempt to (in this case, i.e., after the join, incorrectly) exploit the uniqueness of the primary key of table c when determining the groups.

Could you --- Franck or Daniel --- please file a bug report using Daniel's detailed example?

I'll let Daniel do this as he picked it up first ! Thanks, Franck

Daniel Boesswetter

2:42 p.m.

Hi, speaking of optimizer bugs, I'm afraid I've got at least one more (some lines removed for readability): sql>select count(distinct x) from a; 3 sql>select count(x) from a; 6 sql>select count(x), count(distinct x) from a; 6|6 The expected answer is "6|3". If everybody agrees, I'll file a bug? Daniel Franck Routier wrote:

...

Hi,

Le jeudi 05 mai 2011 à 16:18 +0200, Stefan.Manegold@cwi.nl a écrit :

...
Hi,

please bear with us. We're also only humans with 24 hours to spend per day. ;-)

Sure Stefan, there was no intent to be rude or look demanding. I'm conscious this is a FOSS project, and a very interesting one. It's just that returning unreliable results is rather embarrassing for a database :-)

...
I suppose this is a bug; looks like it might be a too eager attempt to (in this case, i.e., after the join, incorrectly) exploit the uniqueness of the primary key of table c when determining the groups.

Could you --- Franck or Daniel --- please file a bug report using Daniel's detailed example?

I'll let Daniel do this as he picked it up first !

Thanks,

Franck

------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Stefan.Manegold＠cwi.nl

2:57 p.m.

On Thu, May 05, 2011 at 04:42:48PM +0200, Daniel Boesswetter wrote:

...

Hi,

speaking of optimizer bugs, I'm afraid I've got at least one more (some lines removed for readability):

sql>select count(distinct x) from a;

3

sql>select count(x) from a;

6

sql>select count(x), count(distinct x) from a;

6|6

The expected answer is "6|3".

"nice" catch ...

...

If everybody agrees, I'll file a bug?

yes please --- see also below ... ... indeed "embarrassing" ... :-( Stefan ======== sql>select count(name) from tables; +------+ | L17 | +======+ | 37 | +------+ 1 tuple (3.446ms) sql>select count(distinct name) from tables; +------+ | L20 | +======+ | 31 | +------+ 1 tuple (4.475ms) sql>select count(name) , count(distinct name) from tables; +------+------+ | L21 | L22 | +======+======+ | 37 | 37 | +------+------+ 1 tuple (4.061ms) sql>PLAN select count(name) from tables; +------------------------------------------------------------------------------------------------------------------------------------------+ | rel | +==========================================================================================================================================+ | project ( | | | group by ( | | | | project ( | | | | | union ( | | | | | | project ( | | | | | | | select ( | | | | | | | | table(sys._tables) [ _tables.id as p.id, _tables.name as p.name, _tables.type as p.type, _tables.%TID% NOT NULL as p.%TID% ] | | | | | | | ) [ p.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ p.id, p.name, p.type ], | | | | | | project ( | | | | | | | select ( | | | | | | | | table(tmp._tables) [ _tables.id as t.id, _tables.name as t.name, _tables.type as t.type, _tables.%TID% NOT NULL as t.%TID% ] | | | | | | | ) [ t.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ t.id, t.name, t.type ] | | | | | ) [ id as tables.id, name as tables.name, type as tables.type ] | | | | ) [ tables.id, tables.name ] | | | ) [ ] [ count no nil (tables.name) as L23 ] | ^^^^^^^^^^^^ | ) [ L23 ] | +------------------------------------------------------------------------------------------------------------------------------------------+ 18 tuples (0.973ms) sql>PLAN select count(distinct name) from tables; +------------------------------------------------------------------------------------------------------------------------------------------+ | rel | +==========================================================================================================================================+ | project ( | | | group by ( | | | | project ( | | | | | union ( | | | | | | project ( | | | | | | | select ( | | | | | | | | table(sys._tables) [ _tables.id as p.id, _tables.name as p.name, _tables.type as p.type, _tables.%TID% NOT NULL as p.%TID% ] | | | | | | | ) [ p.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ p.id, p.name, p.type ], | | | | | | project ( | | | | | | | select ( | | | | | | | | table(tmp._tables) [ _tables.id as t.id, _tables.name as t.name, _tables.type as t.type, _tables.%TID% NOT NULL as t.%TID% ] | | | | | | | ) [ t.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ t.id, t.name, t.type ] | | | | | ) [ id as tables.id, name as tables.name, type as tables.type ] | | | | ) [ tables.id, tables.name ] | | | ) [ ] [ count unique no nil (tables.name) as L24 ] | ^^^^^^!!!!!!^^^^^^^^ | ) [ L24 ] | +------------------------------------------------------------------------------------------------------------------------------------------+ 18 tuples (0.874ms) sql>PLAN select count(name) , count(distinct name) from tables; +------------------------------------------------------------------------------------------------------------------------------------------+ | rel | +==========================================================================================================================================+ | project ( | | | group by ( | | | | project ( | | | | | union ( | | | | | | project ( | | | | | | | select ( | | | | | | | | table(sys._tables) [ _tables.id as p.id, _tables.name as p.name, _tables.type as p.type, _tables.%TID% NOT NULL as p.%TID% ] | | | | | | | ) [ p.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ p.id, p.name, p.type ], | | | | | | project ( | | | | | | | select ( | | | | | | | | table(tmp._tables) [ _tables.id as t.id, _tables.name as t.name, _tables.type as t.type, _tables.%TID% NOT NULL as t.%TID% ] | | | | | | | ) [ t.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ t.id, t.name, t.type ] | | | | | ) [ id as tables.id, name as tables.name, type as tables.type ] | | | | ) [ tables.id, tables.name ] | | | ) [ ] [ count no nil (tables.name) as L25, L25 as L26 ] | ^^^^^^^^^^^^ !!!!!!!!!! | ) [ L25, L26 ] | +------------------------------------------------------------------------------------------------------------------------------------------+ 18 tuples (0.351ms) ========

...

Daniel

Franck Routier wrote:

...
Hi,

Le jeudi 05 mai 2011 à 16:18 +0200, Stefan.Manegold@cwi.nl a écrit :

...
Hi,

please bear with us. We're also only humans with 24 hours to spend per day. ;-)

Sure Stefan, there was no intent to be rude or look demanding. I'm conscious this is a FOSS project, and a very interesting one. It's just that returning unreliable results is rather embarrassing for a database :-)

...
I suppose this is a bug; looks like it might be a too eager attempt to (in this case, i.e., after the join, incorrectly) exploit the uniqueness of the primary key of table c when determining the groups.

Could you --- Franck or Daniel --- please file a bug report using Daniel's detailed example?

I'll let Daniel do this as he picked it up first !

Thanks,

Franck

------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Daniel Boesswetter

3:10 p.m.

Hi Stefan, I filed that one as well ... Why didn't you tell me that there's a PLAN command? ;-) I've been struggling with EXPLAIN and so far I did not find it very useful, but PLAN looks promising .... Best regards, Daniel Stefan.Manegold@cwi.nl wrote:

...

On Thu, May 05, 2011 at 04:42:48PM +0200, Daniel Boesswetter wrote:

...
Hi,

speaking of optimizer bugs, I'm afraid I've got at least one more (some lines removed for readability):

sql>select count(distinct x) from a;

3

sql>select count(x) from a;

6

sql>select count(x), count(distinct x) from a;

6|6

The expected answer is "6|3".

"nice" catch ...

...
If everybody agrees, I'll file a bug?

yes please --- see also below ...

... indeed "embarrassing" ... :-(

Stefan

======== sql>select count(name) from tables; +------+ | L17 | +======+ | 37 | +------+ 1 tuple (3.446ms) sql>select count(distinct name) from tables; +------+ | L20 | +======+ | 31 | +------+ 1 tuple (4.475ms) sql>select count(name) , count(distinct name) from tables; +------+------+ | L21 | L22 | +======+======+ | 37 | 37 | +------+------+ 1 tuple (4.061ms) sql>PLAN select count(name) from tables; +------------------------------------------------------------------------------------------------------------------------------------------+ | rel | +==========================================================================================================================================+ | project ( | | | group by ( | | | | project ( | | | | | union ( | | | | | | project ( | | | | | | | select ( | | | | | | | | table(sys._tables) [ _tables.id as p.id, _tables.name as p.name, _tables.type as p.type, _tables.%TID% NOT NULL as p.%TID% ] | | | | | | | ) [ p.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ p.id, p.name, p.type ], | | | | | | project ( | | | | | | | select ( | | | | | | | | table(tmp._tables) [ _tables.id as t.id, _tables.name as t.name, _tables.type as t.type, _tables.%TID% NOT NULL as t.%TID% ] | | | | | | | ) [ t.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ t.id, t.name, t.type ] | | | | | ) [ id as tables.id, name as tables.name, type as tables.type ] | | | | ) [ tables.id, tables.name ] | | | ) [ ] [ count no nil (tables.name) as L23 ] | ^^^^^^^^^^^^ | ) [ L23 ] | +------------------------------------------------------------------------------------------------------------------------------------------+ 18 tuples (0.973ms) sql>PLAN select count(distinct name) from tables; +------------------------------------------------------------------------------------------------------------------------------------------+ | rel | +==========================================================================================================================================+ | project ( | | | group by ( | | | | project ( | | | | | union ( | | | | | | project ( | | | | | | | select ( | | | | | | | | table(sys._tables) [ _tables.id as p.id, _tables.name as p.name, _tables.type as p.type, _tables.%TID% NOT NULL as p.%TID% ] | | | | | | | ) [ p.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ p.id, p.name, p.type ], | | | | | | project ( | | | | | | | select ( | | | | | | | | table(tmp._tables) [ _tables.id as t.id, _tables.name as t.name, _tables.type as t.type, _tables.%TID% NOT NULL as t.%TID% ] | | | | | | | ) [ t.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ t.id, t.name, t.type ] | | | | | ) [ id as tables.id, name as tables.name, type as tables.type ] | | | | ) [ tables.id, tables.name ] | | | ) [ ] [ count unique no nil (tables.name) as L24 ] | ^^^^^^!!!!!!^^^^^^^^ | ) [ L24 ] | +------------------------------------------------------------------------------------------------------------------------------------------+ 18 tuples (0.874ms) sql>PLAN select count(name) , count(distinct name) from tables; +------------------------------------------------------------------------------------------------------------------------------------------+ | rel | +==========================================================================================================================================+ | project ( | | | group by ( | | | | project ( | | | | | union ( | | | | | | project ( | | | | | | | select ( | | | | | | | | table(sys._tables) [ _tables.id as p.id, _tables.name as p.name, _tables.type as p.type, _tables.%TID% NOT NULL as p.%TID% ] | | | | | | | ) [ p.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ p.id, p.name, p.type ], | | | | | | project ( | | | | | | | select ( | | | | | | | | table(tmp._tables) [ _tables.id as t.id, _tables.name as t.name, _tables.type as t.type, _tables.%TID% NOT NULL as t.%TID% ] | | | | | | | ) [ t.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ t.id, t.name, t.type ] | | | | | ) [ id as tables.id, name as tables.name, type as tables.type ] | | | | ) [ tables.id, tables.name ] | | | ) [ ] [ count no nil (tables.name) as L25, L25 as L26 ] | ^^^^^^^^^^^^ !!!!!!!!!! | ) [ L25, L26 ] | +------------------------------------------------------------------------------------------------------------------------------------------+ 18 tuples (0.351ms) ========

...
Daniel

Franck Routier wrote:

...
Hi,

Le jeudi 05 mai 2011 à 16:18 +0200, Stefan.Manegold@cwi.nl a écrit :

...
Hi,

please bear with us. We're also only humans with 24 hours to spend per day. ;-)

Sure Stefan, there was no intent to be rude or look demanding. I'm conscious this is a FOSS project, and a very interesting one. It's just that returning unreliable results is rather embarrassing for a database :-)

...
I suppose this is a bug; looks like it might be a too eager attempt to (in this case, i.e., after the join, incorrectly) exploit the uniqueness of the primary key of table c when determining the groups.

Could you --- Franck or Daniel --- please file a bug report using Daniel's detailed example?

I'll let Daniel do this as he picked it up first !

Thanks,

Franck

------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Stefan.Manegold＠cwi.nl

3:15 p.m.

On Thu, May 05, 2011 at 05:10:18PM +0200, Daniel Boesswetter wrote:

...

Hi Stefan,

I filed that one as well ...

Thanks!

...

Why didn't you tell me that there's a PLAN command? ;-)

don't recall you asked ;-) I admit it's not mentioned on the web site (yet?) http://monetdb.cwi.nl/SQL/Documentation/SQL-Runtime-Features.html but it has been mentioned once in a while on this list ... anyway, we'll also try to improve documentation ... Stefan

...

I've been struggling with EXPLAIN and so far I did not find it very useful, but PLAN looks promising ....

Best regards, Daniel

Stefan.Manegold@cwi.nl wrote:

...
On Thu, May 05, 2011 at 04:42:48PM +0200, Daniel Boesswetter wrote:

...
Hi,

speaking of optimizer bugs, I'm afraid I've got at least one more (some lines removed for readability):

sql>select count(distinct x) from a;

3

sql>select count(x) from a;

6

sql>select count(x), count(distinct x) from a;

6|6

The expected answer is "6|3".

"nice" catch ...

...
If everybody agrees, I'll file a bug?

yes please --- see also below ...

... indeed "embarrassing" ... :-(

Stefan

======== sql>select count(name) from tables; +------+ | L17 | +======+ | 37 | +------+ 1 tuple (3.446ms) sql>select count(distinct name) from tables; +------+ | L20 | +======+ | 31 | +------+ 1 tuple (4.475ms) sql>select count(name) , count(distinct name) from tables; +------+------+ | L21 | L22 | +======+======+ | 37 | 37 | +------+------+ 1 tuple (4.061ms) sql>PLAN select count(name) from tables; +------------------------------------------------------------------------------------------------------------------------------------------+ | rel | +==========================================================================================================================================+ | project ( | | | group by ( | | | | project ( | | | | | union ( | | | | | | project ( | | | | | | | select ( | | | | | | | | table(sys._tables) [ _tables.id as p.id, _tables.name as p.name, _tables.type as p.type, _tables.%TID% NOT NULL as p.%TID% ] | | | | | | | ) [ p.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ p.id, p.name, p.type ], | | | | | | project ( | | | | | | | select ( | | | | | | | | table(tmp._tables) [ _tables.id as t.id, _tables.name as t.name, _tables.type as t.type, _tables.%TID% NOT NULL as t.%TID% ] | | | | | | | ) [ t.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ t.id, t.name, t.type ] | | | | | ) [ id as tables.id, name as tables.name, type as tables.type ] | | | | ) [ tables.id, tables.name ] | | | ) [ ] [ count no nil (tables.name) as L23 ] | ^^^^^^^^^^^^ | ) [ L23 ] | +------------------------------------------------------------------------------------------------------------------------------------------+ 18 tuples (0.973ms) sql>PLAN select count(distinct name) from tables; +------------------------------------------------------------------------------------------------------------------------------------------+ | rel | +==========================================================================================================================================+ | project ( | | | group by ( | | | | project ( | | | | | union ( | | | | | | project ( | | | | | | | select ( | | | | | | | | table(sys._tables) [ _tables.id as p.id, _tables.name as p.name, _tables.type as p.type, _tables.%TID% NOT NULL as p.%TID% ] | | | | | | | ) [ p.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ p.id, p.name, p.type ], | | | | | | project ( | | | | | | | select ( | | | | | | | | table(tmp._tables) [ _tables.id as t.id, _tables.name as t.name, _tables.type as t.type, _tables.%TID% NOT NULL as t.%TID% ] | | | | | | | ) [ t.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ t.id, t.name, t.type ] | | | | | ) [ id as tables.id, name as tables.name, type as tables.type ] | | | | ) [ tables.id, tables.name ] | | | ) [ ] [ count unique no nil (tables.name) as L24 ] | ^^^^^^!!!!!!^^^^^^^^ | ) [ L24 ] | +------------------------------------------------------------------------------------------------------------------------------------------+ 18 tuples (0.874ms) sql>PLAN select count(name) , count(distinct name) from tables; +------------------------------------------------------------------------------------------------------------------------------------------+ | rel | +==========================================================================================================================================+ | project ( | | | group by ( | | | | project ( | | | | | union ( | | | | | | project ( | | | | | | | select ( | | | | | | | | table(sys._tables) [ _tables.id as p.id, _tables.name as p.name, _tables.type as p.type, _tables.%TID% NOT NULL as p.%TID% ] | | | | | | | ) [ p.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ p.id, p.name, p.type ], | | | | | | project ( | | | | | | | select ( | | | | | | | | table(tmp._tables) [ _tables.id as t.id, _tables.name as t.name, _tables.type as t.type, _tables.%TID% NOT NULL as t.%TID% ] | | | | | | | ) [ t.type < convert(2 from tinyint to smallint) ] | | | | | | ) [ t.id, t.name, t.type ] | | | | | ) [ id as tables.id, name as tables.name, type as tables.type ] | | | | ) [ tables.id, tables.name ] | | | ) [ ] [ count no nil (tables.name) as L25, L25 as L26 ] | ^^^^^^^^^^^^ !!!!!!!!!! | ) [ L25, L26 ] | +------------------------------------------------------------------------------------------------------------------------------------------+ 18 tuples (0.351ms) ========

...
Daniel

Franck Routier wrote:

...
Hi,

Le jeudi 05 mai 2011 à 16:18 +0200, Stefan.Manegold@cwi.nl a écrit :

...
Hi,

please bear with us. We're also only humans with 24 hours to spend per day. ;-)

Sure Stefan, there was no intent to be rude or look demanding. I'm conscious this is a FOSS project, and a very interesting one. It's just that returning unreliable results is rather embarrassing for a database :-)

...
I suppose this is a bug; looks like it might be a too eager attempt to (in this case, i.e., after the join, incorrectly) exploit the uniqueness of the primary key of table c when determining the groups.

Could you --- Franck or Daniel --- please file a bug report using Daniel's detailed example?

I'll let Daniel do this as he picked it up first !

Thanks,

Franck

------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

------------------------------------------------------------------------------ WhatsUp Gold - Download Free Network Management Software The most intuitive, comprehensive, and cost-effective network management toolset available today. Delivers lowest initial acquisition cost and overall TCO of any competing solution. http://p.sf.net/sfu/whatsupgold-sd _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

5057

Age (days ago)

5085

Last active (days ago)

List overview

Download

10 comments

4 participants

participants (4)

Daniel Boesswetter
Franck Routier
Michael Sioutis
Stefan.Manegold＠cwi.nl