Bug grouping on string column, only when results filtered
Hi, I think I've found a bug in MonetDB. It only occurs when grouping by a text column and filtering the results; with no "where" clause, the results are fine. The table is fairly large (~30 million rows). ~% ./generate_test_csv.py test.csv ~% mclient bug -s "create table test (b integer, a varchar(5))" operation successful ~% mclient bug -s "copy into test FROM '/home/pete/test.csv' DELIMITERS ','" 29508336 affected rows There are no nulls or empty strings in column "a", however: ~% mclient bug -s "select a, count(*) from test where b = 20227 group by a" +------+--------+ | a | L1 | +======+========+ | | 763088 | | 10T | 127 | | 6V | 216594 | | 9T | 104 | | 8T | 88 | +------+--------+ When the "where" clause is removed, no nulls are returned: ~% mclient bug -s "select a, count(*) from test group by a" +------+---------+ | a | L1 | +======+=========+ | 2T | 5799418 | | 1T | 6072825 | | 5T | 2605374 | | 4T | 2998157 | | 3T | 4262607 | | 15T | 2703 | | 20T | 2818 | | 30T | 497657 | | 7T | 22460 | | 6T | 686789 | | 10T | 3496 | | 6V | 6550540 | | 9T | 1853 | | 8T | 1639 | +------+---------+ 14 tuples I have tested with the latest MonetDB Ubuntu debs and also with a local build of the Hg Feb2013 branch tip. Both exhibit the same problem. I'm running Linux (Ubuntu 12.10) on a 6 core Xeon with 16Gb RAM. I've tried with minimal_pipe and the results are the same. Please let me know if any other information would be useful. Regards, Pete.
Hello Pete, thank you for your bug report and the effort you have put into making this reproducible for us. However, having followed your instructions with both the Feb2013 version as well as the current development branch, I was not able to also provoke this issue. I got the following results (Both Feb2013 and default branch): ➜ ~ mclient bug -s "select a, count(*) from test where b = 20227 group by a" +------+--------+ | a | L1 | +======+========+ | 2T | 193058 | | 1T | 202267 | | 5T | 87038 | | 4T | 100302 | | 3T | 142326 | | 15T | 95 | | 20T | 95 | | 30T | 14277 | | 7T | 767 | | 6T | 22863 | | 10T | 127 | | 6V | 216594 | | 9T | 104 | | 8T | 88 | +------+--------+ 14 tuples ➜ ~ mclient bug -s "select a, count(*) from test group by a" +------+---------+ | a | L1 | +======+=========+ | 2T | 5799418 | | 1T | 6072825 | | 5T | 2605374 | | 4T | 2998157 | | 3T | 4262607 | | 15T | 2703 | | 20T | 2818 | | 30T | 497657 | | 7T | 22460 | | 6T | 686789 | | 10T | 3496 | | 6V | 6550540 | | 9T | 1853 | | 8T | 1639 | +------+---------+ 14 tuples Can you perhaps give more information abuont your environment? For example, this a 64 or 32 bit build? You could also try the development default branch and see if the problem also occurs there. Best, Hannes On 17/04/13 10:33, Pete Hollobon wrote:
Hi,
I think I've found a bug in MonetDB. It only occurs when grouping by a text column and filtering the results; with no "where" clause, the results are fine. The table is fairly large (~30 million rows).
~% ./generate_test_csv.py test.csv ~% mclient bug -s "create table test (b integer, a varchar(5))" operation successful ~% mclient bug -s "copy into test FROM '/home/pete/test.csv' DELIMITERS ','" 29508336 affected rows
There are no nulls or empty strings in column "a", however:
~% mclient bug -s "select a, count(*) from test where b = 20227 group by a" +------+--------+ | a | L1 | +======+========+ | | 763088 | | 10T | 127 | | 6V | 216594 | | 9T | 104 | | 8T | 88 | +------+--------+
When the "where" clause is removed, no nulls are returned:
~% mclient bug -s "select a, count(*) from test group by a" +------+---------+ | a | L1 | +======+=========+ | 2T | 5799418 | | 1T | 6072825 | | 5T | 2605374 | | 4T | 2998157 | | 3T | 4262607 | | 15T | 2703 | | 20T | 2818 | | 30T | 497657 | | 7T | 22460 | | 6T | 686789 | | 10T | 3496 | | 6V | 6550540 | | 9T | 1853 | | 8T | 1639 | +------+---------+ 14 tuples
I have tested with the latest MonetDB Ubuntu debs and also with a local build of the Hg Feb2013 branch tip. Both exhibit the same problem. I'm running Linux (Ubuntu 12.10) on a 6 core Xeon with 16Gb RAM. I've tried with minimal_pipe and the results are the same.
Please let me know if any other information would be useful.
Regards, Pete.
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
Hi Hannes,
Thanks for looking into this.
I'm using a 64-bit build. Here's my uname and the versions from the
MonetDB binaries.
~% uname -a
Linux lx1 3.5.0-17-generic #28-Ubuntu SMP Tue Oct 9 19:31:23 UTC 2012
x86_64 x86_64 x86_64 GNU/Linux
~% /usr/bin/monetdbd version
MonetDB Database Server v1.7 (Feb2013-SP1)
~% /usr/bin/monetdb version
MonetDB Database Server Toolkit v1.1 (Feb2013-SP1)
~% /usr/bin/mclient --version
mclient, the MonetDB interactive terminal (Feb2013-SP1)
support for command-line editing compiled-in
character encoding: UTF-8
However, I realised that my local Feb2013 build was using the shared
libraries from the Ubuntu debs. I fixed this, and that build now seems
to be working properly.
I can still reproduce it with the distributed binaries though.
Whatever bug it was must have been fixed already - apologies for
wasting your time.
Regards,
Pete.
On 17 April 2013 12:56, Hannes Mühleisen
Hello Pete,
thank you for your bug report and the effort you have put into making this reproducible for us. However, having followed your instructions with both the Feb2013 version as well as the current development branch, I was not able to also provoke this issue.
I got the following results (Both Feb2013 and default branch):
➜ ~ mclient bug -s "select a, count(*) from test where b = 20227 group by a" +------+--------+ | a | L1 | +======+========+ | 2T | 193058 | | 1T | 202267 | | 5T | 87038 | | 4T | 100302 | | 3T | 142326 | | 15T | 95 | | 20T | 95 | | 30T | 14277 | | 7T | 767 | | 6T | 22863 |
| 10T | 127 | | 6V | 216594 | | 9T | 104 | | 8T | 88 | +------+--------+ 14 tuples
➜ ~ mclient bug -s "select a, count(*) from test group by a"
+------+---------+ | a | L1 | +======+=========+ | 2T | 5799418 | | 1T | 6072825 | | 5T | 2605374 | | 4T | 2998157 | | 3T | 4262607 | | 15T | 2703 | | 20T | 2818 | | 30T | 497657 | | 7T | 22460 | | 6T | 686789 | | 10T | 3496 | | 6V | 6550540 | | 9T | 1853 | | 8T | 1639 | +------+---------+ 14 tuples
Can you perhaps give more information abuont your environment? For example, this a 64 or 32 bit build? You could also try the development default branch and see if the problem also occurs there.
Best,
Hannes
On 17/04/13 10:33, Pete Hollobon wrote:
Hi,
I think I've found a bug in MonetDB. It only occurs when grouping by a text column and filtering the results; with no "where" clause, the results are fine. The table is fairly large (~30 million rows).
~% ./generate_test_csv.py test.csv ~% mclient bug -s "create table test (b integer, a varchar(5))" operation successful ~% mclient bug -s "copy into test FROM '/home/pete/test.csv' DELIMITERS ','" 29508336 affected rows
There are no nulls or empty strings in column "a", however:
~% mclient bug -s "select a, count(*) from test where b = 20227 group by a" +------+--------+ | a | L1 | +======+========+ | | 763088 | | 10T | 127 | | 6V | 216594 | | 9T | 104 | | 8T | 88 | +------+--------+
When the "where" clause is removed, no nulls are returned:
~% mclient bug -s "select a, count(*) from test group by a" +------+---------+ | a | L1 | +======+=========+ | 2T | 5799418 | | 1T | 6072825 | | 5T | 2605374 | | 4T | 2998157 | | 3T | 4262607 | | 15T | 2703 | | 20T | 2818 | | 30T | 497657 | | 7T | 22460 | | 6T | 686789 | | 10T | 3496 | | 6V | 6550540 | | 9T | 1853 | | 8T | 1639 | +------+---------+ 14 tuples
I have tested with the latest MonetDB Ubuntu debs and also with a local build of the Hg Feb2013 branch tip. Both exhibit the same problem. I'm running Linux (Ubuntu 12.10) on a 6 core Xeon with 16Gb RAM. I've tried with minimal_pipe and the results are the same.
Please let me know if any other information would be useful.
Regards, Pete.
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
participants (2)
-
Hannes Mühleisen
-
Pete Hollobon