Will, did / can you run your first (slow) query twice in a row and report the speeds of both runs? Thanks! Stefan ----- Original Message -----
On 29/11/13 11:36, Will Muldrew wrote:
Hi
I've been trying out monetdb on a 350M row table (modern linux workstation with 24GB RAM + SSD). This table has a sorted TIMESTAMP column and a second unsorted VARCHAR(6) column which is non-unique (perhaps 50M distinct values fairly randomly distributed).
This query is very slow, spends huge amount of time writing, and has very low CPU (intermediate results?):
select count(*) from t_order where orderId = 'XXXXXX'
This takes hours to complete.
However, this query is a lot faster, even though the timestamp clause is redundant (all my data is > '2012-01-01'): It decided it can use parallel processing effectively. Why it did not in the first query is unknown.
select count(*) from t_order where orderId = 'XXXXXX' and time > '1970-01-01';
The first run is slowish (10s of seconds - I guess because we're reading the orderId column fully), however subsequent runs are sub-second.
Yes, that is caching effect.
Obviously I could write an app which puts the spurious timestamp clause into the where, but it's still confusing and not ideal. Is there something I can do?
regards, Martin _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |