[MonetDB-users] count slower than expected?

8 Sep 2011

      Hello mailing list!

This is my first attempt to ask a question here, so please excuse my
ignorance.

For large tables (>1M rows), I encounter strange behavior when using the
COUNT function in SQL in the mclient.

When I do the following query:
SELECT DISTINCT(predicate) FROM facts;
I get a nice resultset of 31 tuples in 20ms.

When I try the following query:
SELECT COUNT(DISTINCT(predicate)) FROM facts;
I get my answer in 565ms.

It seems weird that the database doesn't 'optimize' this count to:
SELECT COUNT(*) FROM (SELECT DISTINCT(predicate) FROM facts) AS b;
I get my answer in 20ms.

The facts table is defined as follows:
CREATE TABLE "sys"."facts" (
"subject"   CHAR(2048),
 "predicate" CHAR(2048),
"object"    CHAR(2048)
);

This is my first attempt to make a knowledge store and I am aware there is
lots I can do to optimize how I store the data. I was just surprised that
the count took significantly longer than getting all the results out and
counting them.

Here is the version information of MonetDB I am using:

MonetDB 5 server v11.3.7 "Apr2011-SP2" (32-bit, 32-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
Visit http://monetdb.cwi.nl/ for further information
Found 1.0GiB available memory, 2 available cpu cores
Libraries:
  libpcre: 7.8 2008-09-05 (compiled with 7.8)
  openssl: OpenSSL 0.9.8k 25 Mar 2009 (compiled with OpenSSL 0.9.8k 25 Mar
2009)
  libxml2: 2.7.6 (compiled with 2.7.6)
Compiled by: root@ottar.ins.cwi.nl (i486-pc-linux-gnu)
Compilation: gcc -Wall -O6 -fomit-frame-pointer
 -finline-functions -falign-loops=4 -falign-jumps=4 -falign-functions=4
-fexpensive-optimizations                     -funroll-loops
-frerun-cse-after-loop -frerun-loop-opt -ftree-vectorize
Linking    : /usr/bin/ld -Wl,-Bsymbolic-functions

Greetings,
 Joris

Joris Slob

Fabian Groffen

tags

participants (2)