Hi All, I did some experiments with monetdb over the weekend and found some strange and unexpected behavior. My database has a single table which stores stock trades. The table is defined as follows: create table trades( td varchar(2), // just a dummy field date int, time int, ticker varchar(17), price double, qty int, vol int, conditioncode int, exch varchar(8)); This table has a little under 1 Billion rows and takes up around 52 gigabytes. The source data is also around 48 gigabytes. I expected monetdb to aggressively compress the columns and reduce the disk usage by 10 fold or so! In any case, that is not the main issue. [select sum(date), sum(time), count(ticker) from trades where date = 20120501;] This query took 1.1 seconds the first time and around 173 ms the second two times (count(ticker) returned 31,268,285). [select sum(date), sum(time), count(ticker) from trades where date = 20120501 and time > 34200000;] This query took about 250 ms the first, second and third times I ran it. count(ticker) returned 31,176,190. (I doing the same thing as previous query, but asking to ignore all trades before the market opened...obviously very very few trades occur before the market opens). [select sum(date), sum(time), count(ticker) from trades where date = 20120501 and time > 34200000 and time < 37800000;] This query took 1 minute and 15 seconds! The second time I ran it, it took 1 minute and 11 seconds. Notice that count(ticker) in this query only returned 5,969,337. This query is essentially the same as the previous one, but now I am only looking to process trades between 9:30 am and 10:30 am. So this query processes 1 hours' worth of trades while previous queries process the whole day. Notice that all three queries process the same columns. Between the second and third columns, all the columns used in the select clause and where clause are exactly the same. The second query just add an additional filter on the time column. Why is the third query two orders of magnitude slower than the second query? This experiment was done on an ubuntu server:
uname -a Linux abcdefg 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
cat /proc/meminfo |fgrep Mem MemTotal: 8168280 kB MemFree: 3588848 kB
hdparm -t ...disk... Timing buffered disk reads: 510 MB in 3.00 seconds = 169.80 MB/sec
Please let me know if you require more information. Thanks, Shahbaz