[MonetDB-users] Expected query to be much faster, but is actually two orders of magnitude slower!
Hi All, I did some experiments with monetdb over the weekend and found some strange and unexpected behavior. My database has a single table which stores stock trades. The table is defined as follows: create table trades( td varchar(2), // just a dummy field date int, time int, ticker varchar(17), price double, qty int, vol int, conditioncode int, exch varchar(8)); This table has a little under 1 Billion rows and takes up around 52 gigabytes. The source data is also around 48 gigabytes. I expected monetdb to aggressively compress the columns and reduce the disk usage by 10 fold or so! In any case, that is not the main issue. [select sum(date), sum(time), count(ticker) from trades where date = 20120501;] This query took 1.1 seconds the first time and around 173 ms the second two times (count(ticker) returned 31,268,285). [select sum(date), sum(time), count(ticker) from trades where date = 20120501 and time > 34200000;] This query took about 250 ms the first, second and third times I ran it. count(ticker) returned 31,176,190. (I doing the same thing as previous query, but asking to ignore all trades before the market opened...obviously very very few trades occur before the market opens). [select sum(date), sum(time), count(ticker) from trades where date = 20120501 and time > 34200000 and time < 37800000;] This query took 1 minute and 15 seconds! The second time I ran it, it took 1 minute and 11 seconds. Notice that count(ticker) in this query only returned 5,969,337. This query is essentially the same as the previous one, but now I am only looking to process trades between 9:30 am and 10:30 am. So this query processes 1 hours' worth of trades while previous queries process the whole day. Notice that all three queries process the same columns. Between the second and third columns, all the columns used in the select clause and where clause are exactly the same. The second query just add an additional filter on the time column. Why is the third query two orders of magnitude slower than the second query? This experiment was done on an ubuntu server:
uname -a Linux abcdefg 3.2.0-25-generic #40-Ubuntu SMP Wed May 23 20:30:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux
cat /proc/meminfo |fgrep Mem MemTotal: 8168280 kB MemFree: 3588848 kB
hdparm -t ...disk... Timing buffered disk reads: 510 MB in 3.00 seconds = 169.80 MB/sec
Please let me know if you require more information. Thanks, Shahbaz
Hi Shahbaz, On 01-07-2012 12:22:02 -0400, Shahbaz wrote:
[select sum(date), sum(time), count(ticker) from trades where date = 20120501 and time > 34200000 and time < 37800000;]
This query took 1 minute and 15 seconds! The second time I ran it, it took 1 minute and 11 seconds. Notice that count(ticker) in this query only returned 5,969,337. This query is essentially the same as the previous one, but now I am only looking to process trades between 9:30 am and 10:30 am. So this query processes 1 hours' worth of trades while previous queries process the whole day.
I cannot give any hints here, but this feels like a bug somehow. Could you open a bug for this on our bugtracker (bugs.monetdb.org), and attach the output of your fast and slow query when prefixed with "EXPLAIN"? Thanks, Fabian
On Thu, Jul 05, 2012 at 10:58:42AM +0200, Fabian Groffen wrote:
Hi Shahbaz,
On 01-07-2012 12:22:02 -0400, Shahbaz wrote:
[select sum(date), sum(time), count(ticker) from trades where date = 20120501 and time > 34200000 and time < 37800000;]
This query took 1 minute and 15 seconds! The second time I ran it, it took 1 minute and 11 seconds. Notice that count(ticker) in this query only returned 5,969,337. This query is essentially the same as the previous one, but now I am only looking to process trades between 9:30 am and 10:30 am. So this query processes 1 hours' worth of trades while previous queries process the whole day.
I cannot give any hints here, but this feels like a bug somehow. Could you open a bug for this on our bugtracker (bugs.monetdb.org), and attach the output of your fast and slow query when prefixed with "EXPLAIN"?
if possible, please attach next to the "EPLAIN"-prefixed output also the "TRACE"-prefixed output. Thanks! Stefan
Thanks, Fabian
------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hi Shahbaz, Of course this will not solve the issue, but will you try running select sum(date), sum(time), count(ticker) from trades where date = 20120501 and time between 34200000 and 37800000; to see whether this is too slow as well? Regards Bart On 07/05/2012 11:44 AM, Stefan Manegold wrote:
On Thu, Jul 05, 2012 at 10:58:42AM +0200, Fabian Groffen wrote:
Hi Shahbaz,
On 01-07-2012 12:22:02 -0400, Shahbaz wrote:
[select sum(date), sum(time), count(ticker) from trades where date = 20120501 and time> 34200000 and time< 37800000;]
This query took 1 minute and 15 seconds! The second time I ran it, it took 1 minute and 11 seconds. Notice that count(ticker) in this query only returned 5,969,337. This query is essentially the same as the previous one, but now I am only looking to process trades between 9:30 am and 10:30 am. So this query processes 1 hours' worth of trades while previous queries process the whole day.
I cannot give any hints here, but this feels like a bug somehow. Could you open a bug for this on our bugtracker (bugs.monetdb.org), and attach the output of your fast and slow query when prefixed with "EXPLAIN"?
if possible, please attach next to the "EPLAIN"-prefixed output also the "TRACE"-prefixed output. Thanks!
Stefan
Thanks, Fabian
------------------------------------------------------------------------------ Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
participants (4)
-
Bart Scheers
-
Fabian Groffen
-
Shahbaz
-
Stefan Manegold