This is my first attempt to ask a question here, so please excuse my ignorance.

For large tables (>1M rows), I encounter strange behavior when using the COUNT function in SQL in the mclient.

When I do the following query:

SELECT DISTINCT(predicate) FROM facts;

I get a nice resultset of 31 tuples in 20ms.

When I try the following query:

SELECT COUNT(DISTINCT(predicate)) FROM facts;

I get my answer in 565ms.

It seems weird that the database doesn't 'optimize' this count to:

SELECT COUNT(*) FROM (SELECT DISTINCT(predicate) FROM facts) AS b;

I get my answer in 20ms.

The facts table is defined as follows:

CREATE TABLE "sys"."facts" (

"subject" CHAR(2048),

"predicate" CHAR(2048),

"object" CHAR(2048)

);

This is my first attempt to make a knowledge store and I am aware there is lots I can do to optimize how I store the data. I was just surprised that the count took significantly longer than getting all the results out and counting them.

Here is the version information of MonetDB I am using:

MonetDB 5 server v11.3.7 "Apr2011-SP2" (32-bit, 32-bit oids)

Visit http://monetdb.cwi.nl/ for further information

Found 1.0GiB available memory, 2 available cpu cores

Libraries:

libpcre: 7.8 2008-09-05 (compiled with 7.8)

openssl: OpenSSL 0.9.8k 25 Mar 2009 (compiled with OpenSSL 0.9.8k 25 Mar 2009)

libxml2: 2.7.6 (compiled with 2.7.6)

Compiled by: root@ottar.ins.cwi.nl (i486-pc-linux-gnu)

Compilation: gcc -Wall -O6 -fomit-frame-pointer -finline-functions -falign-loops=4 -falign-jumps=4 -falign-functions=4 -fexpensive-optimizations -funroll-loops -frerun-cse-after-loop -frerun-loop-opt -ftree-vectorize

Linking : /usr/bin/ld -Wl,-Bsymbolic-functions