Hello mailing list!
This is my first attempt to ask a question here, so please excuse my ignorance.
For large tables (>1M rows), I encounter strange behavior when using the COUNT function in SQL in the mclient.
When I do the following query:
SELECT DISTINCT(predicate) FROM facts;
I get a nice resultset of 31 tuples in 20ms.
When I try the following query:
SELECT COUNT(DISTINCT(predicate)) FROM facts;
I get my answer in 565ms.
It seems weird that the database doesn't 'optimize' this count to:
SELECT COUNT(*) FROM (SELECT DISTINCT(predicate) FROM facts) AS b;
I get my answer in 20ms.
The facts table is defined as follows:
CREATE TABLE "sys"."facts" (
"subject" CHAR(2048),
"predicate" CHAR(2048),
"object" CHAR(2048)
);
This is my first attempt to make a knowledge store and I am aware there is lots I can do to optimize how I store the data. I was just surprised that the count took significantly longer than getting all the results out and counting them.
Here is the version information of MonetDB I am using:
MonetDB 5 server v11.3.7 "Apr2011-SP2" (32-bit, 32-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
Found 1.0GiB available memory, 2 available cpu cores
Libraries:
libpcre: 7.8 2008-09-05 (compiled with 7.8)
openssl: OpenSSL 0.9.8k 25 Mar 2009 (compiled with OpenSSL 0.9.8k 25 Mar 2009)
libxml2: 2.7.6 (compiled with 2.7.6)
Compilation: gcc -Wall -O6 -fomit-frame-pointer -finline-functions -falign-loops=4 -falign-jumps=4 -falign-functions=4 -fexpensive-optimizations -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -ftree-vectorize
Linking : /usr/bin/ld -Wl,-Bsymbolic-functions
Greetings,
Joris