[MonetDB-users] count slower than expected?
Hello mailing list! This is my first attempt to ask a question here, so please excuse my ignorance. For large tables (>1M rows), I encounter strange behavior when using the COUNT function in SQL in the mclient. When I do the following query: SELECT DISTINCT(predicate) FROM facts; I get a nice resultset of 31 tuples in 20ms. When I try the following query: SELECT COUNT(DISTINCT(predicate)) FROM facts; I get my answer in 565ms. It seems weird that the database doesn't 'optimize' this count to: SELECT COUNT(*) FROM (SELECT DISTINCT(predicate) FROM facts) AS b; I get my answer in 20ms. The facts table is defined as follows: CREATE TABLE "sys"."facts" ( "subject" CHAR(2048), "predicate" CHAR(2048), "object" CHAR(2048) ); This is my first attempt to make a knowledge store and I am aware there is lots I can do to optimize how I store the data. I was just surprised that the count took significantly longer than getting all the results out and counting them. Here is the version information of MonetDB I am using: MonetDB 5 server v11.3.7 "Apr2011-SP2" (32-bit, 32-bit oids) Copyright (c) 1993-July 2008 CWI Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved Visit http://monetdb.cwi.nl/ for further information Found 1.0GiB available memory, 2 available cpu cores Libraries: libpcre: 7.8 2008-09-05 (compiled with 7.8) openssl: OpenSSL 0.9.8k 25 Mar 2009 (compiled with OpenSSL 0.9.8k 25 Mar 2009) libxml2: 2.7.6 (compiled with 2.7.6) Compiled by: root@ottar.ins.cwi.nl (i486-pc-linux-gnu) Compilation: gcc -Wall -O6 -fomit-frame-pointer -finline-functions -falign-loops=4 -falign-jumps=4 -falign-functions=4 -fexpensive-optimizations -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -ftree-vectorize Linking : /usr/bin/ld -Wl,-Bsymbolic-functions Greetings, Joris
Hi Joris, On 08-09-2011 15:12:48 +0200, Joris Slob wrote:
For large tables (>1M rows), I encounter strange behavior when using the COUNT function in SQL in the mclient.
When I do the following query:
SELECT DISTINCT(predicate) FROM facts;
I get a nice resultset of 31 tuples in 20ms.
When I try the following query:
SELECT COUNT(DISTINCT(predicate)) FROM facts;
I get my answer in 565ms.
It seems weird that the database doesn't 'optimize' this count to:
SELECT COUNT(*) FROM (SELECT DISTINCT(predicate) FROM facts) AS b;
I get my answer in 20ms.
That is very well possible. Please file a bug and attach the outputs of your queries prefixed with 'PLAN' and 'EXPLAIN'. They should clarify what's going wrong.
This is my first attempt to make a knowledge store and I am aware there is lots I can do to optimize how I store the data. I was just surprised that the count took significantly longer than getting all the results out and counting them.
My first guess would be that the optimisers take a wrong option here, and do it the long and lengthy way. Regards, Fabian Groffen
participants (2)
-
Fabian Groffen
-
Joris Slob