
On Mon, May 17, 2010 at 06:23:53PM +0200, Stefan Manegold wrote:
On Mon, May 17, 2010 at 01:07:50PM +0100, Sam Mason wrote:
I've just pulled a largish dataset into monetdb (419851473 rows, 8 columns) and it's crashing when I try to do the following:
does the server carsh, or does it "merely" issue an error on this query, but the server keep running?
Sorry for the imprecision, for this query it just gives an error back. It sometimes crashes, but haven't found a repeatable test case so was just reporting the unexpected error message.
select count(distinct a) from calldata;
"a" is an integer column and monetdb gets the right answer of 886693 when I run either:
select count(a) from (select a from calldata group by a) x; or select count(a) from (select distinct a from calldata) x;
I'm running the latest 64bit .deb packages under ubuntu 10.4. The following is what I get back from mclient:
Just for info: did you try with earlier versions before? If so, did it work? (You don't have to try it, if you didn't ...)
Nope, I investigated monetdb a while ago under 32bit linux and gave up as working with larger datasets needed too much handholding.
I've not got much swap space, does it just happen to need it for the first version but not the other versions?
It should not use swap space, but memory mapped files, i.e., disk space in your file system where your dbfarm is located.
Good to know, was mainly wondering what it did for intermediate results. For the main tables I'd expect it to use memory mapped files, but wasn't sure what it did at other times.
And indeed, if the two aboe version work, than also the count(distinct ..) should work --- possibly some yet undetected glitch in the query translation / optimization --- you can check the PLAN or EXPLAIN of all three variant to see whether/how they differ. We'll try to do the same (without your data) to see what we can find ...
Not tried to read Monetdb's plans before, they look good but slightly alien at the moment. I'm slightly surprised to see "distinct" seperate from "group by", I thought they have the same semantics. Have put the plan's at the bottom and attached the log of doing explains. -- Sam http://samason.me.uk/ plan select count(distinct a) from calldata; project ( | group by ( | | table(smason.calldata) [ calldata.a, calldata.%TID% NOT NULL ] | ) [ ] [ count unique no nil (calldata.a) as L7 ] ) [ L7 ] plan select count(a) from (select distinct a from calldata) x; project ( | group by ( | | distinct project ( | | | table(smason.calldata) [ calldata.a, calldata.%TID% NOT NULL ] | | ) [ calldata.a as x.a ] | ) [ ] [ count no nil (x.a) as L12 ] ) [ L12 ] plan select count(a) from (select a from calldata group by a) x; project ( | group by ( | | project ( | | | group by ( | | | | table(smason.calldata) [ calldata.a, calldata.%TID% NOT NULL ] | | | ) [ calldata.a ] [ calldata.a ] | | ) [ calldata.a as x.a ] | ) [ ] [ count no nil (x.a) as L13 ] ) [ L13 ]