Re: Bug grouping on string column, only when results filtered
as string-related bug has recently been fixed (see MonetDB bug tracker; I don't recall the bug ID right now) and will be shipped with the upcoming Feb2013-SP2 release, a preview of which is avilable from the testing repository.
Stefan
Pete Hollobon
Hello Pete,
thank you for your bug report and the effort you have put into making this reproducible for us. However, having followed your instructions with both the Feb2013 version as well as the current development branch, I was not able to also provoke this issue.
I got the following results (Both Feb2013 and default branch):
➜ ~ mclient bug -s "select a, count(*) from test where b = 20227 group by a" +------+--------+ | a | L1 | +======+========+ | 2T | 193058 | | 1T | 202267 | | 5T | 87038 | | 4T | 100302 | | 3T | 142326 | | 15T | 95 | | 20T | 95 | | 30T | 14277 | | 7T | 767 | | 6T | 22863 |
| 10T | 127 | | 6V | 216594 | | 9T | 104 | | 8T | 88 | +------+--------+ 14 tuples
➜ ~ mclient bug -s "select a, count(*) from test group by a"
+------+---------+ | a | L1 | +======+=========+ | 2T | 5799418 | | 1T | 6072825 | | 5T | 2605374 | | 4T | 2998157 | | 3T | 4262607 | | 15T | 2703 | | 20T | 2818 | | 30T | 497657 | | 7T | 22460 | | 6T | 686789 | | 10T | 3496 | | 6V | 6550540 | | 9T | 1853 | | 8T | 1639 | +------+---------+ 14 tuples
Can you perhaps give more information abuont your environment? For example, this a 64 or 32 bit build? You could also try the development default branch and see if the problem also occurs there.
Best,
Hannes
On 17/04/13 10:33, Pete Hollobon wrote:
Hi,
I think I've found a bug in MonetDB. It only occurs when grouping by a text column and filtering the results; with no "where" clause, the results are fine. The table is fairly large (~30 million rows).
~% ./generate_test_csv.py test.csv ~% mclient bug -s "create table test (b integer, a varchar(5))" operation successful ~% mclient bug -s "copy into test FROM '/home/pete/test.csv' DELIMITERS ','" 29508336 affected rows
There are no nulls or empty strings in column "a", however:
~% mclient bug -s "select a, count(*) from test where b = 20227 group by a" +------+--------+ | a | L1 | +======+========+ | | 763088 | | 10T | 127 | | 6V | 216594 | | 9T | 104 | | 8T | 88 | +------+--------+
When the "where" clause is removed, no nulls are returned:
~% mclient bug -s "select a, count(*) from test group by a" +------+---------+ | a | L1 | +======+=========+ | 2T | 5799418 | | 1T | 6072825 | | 5T | 2605374 | | 4T | 2998157 | | 3T | 4262607 | | 15T | 2703 | | 20T | 2818 | | 30T | 497657 | | 7T | 22460 | | 6T | 686789 | | 10T | 3496 | | 6V | 6550540 | | 9T | 1853 | | 8T | 1639 | +------+---------+ 14 tuples
I have tested with the latest MonetDB Ubuntu debs and also with a local build of the Hg Feb2013 branch tip. Both exhibit the same problem. I'm running Linux (Ubuntu 12.10) on a 6 core Xeon with 16Gb RAM. I've tried with minimal_pipe and the results are the same.
Please let me know if any other information would be useful.
Regards, Pete.
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list
participants (1)
-
Stefan Manegold