Re: [MonetDB-users] Long running query with count(distinct ...)

2 Aug 2011

      2011/8/2 Stefan Manegold :
...
On Tue, Aug 02, 2011 at 11:00:28AM +0400, Eugene Prokopiev wrote:
...
...
...
I have table with 81 887 010 rows. Select count from this table works
fine, but I can't wait for the results of select count(distinct ...)
from this table. Why can it be?
What data type is the column that you run the count(distinct) on?
INET
...
Which version of MonetDB are you using?
What OS are you running on?
What hardware (in particular CPU type & speed and amount of main memory) are
you using?
`mserver5 --version` shoudl asnwer the latter three questions sufficiently.
$ mserver5 --version
MonetDB 5 server v11.3.7 "Apr2011-SP2" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
Visit http://monetdb.cwi.nl/ for further information
Found 2.0GiB available memory, 4 available cpu cores
Libraries:
  libpcre: 8.12 2011-01-15 (compiled with 8.12)
  openssl: OpenSSL 1.0.0d 8 Feb 2011 (compiled with )
  libxml2: 2.7.8 (compiled with 2.7.8)
Compiled by: builder@dev64 (x86_64-alt-linux-gnu)
Compilation: x86_64-alt-linux-gcc -pipe -Wall -g -O2
Linking    : /usr/bin/ld -m elf_x86_64

So, I have only 2.0GiB main memory, but table size is only 6.0GiB
...
...
...
Distinct requires the table to be sorted to find them.
Not quite. On (knownly) sorted columns, the count(distinct <column>) will be
(considerably) faster than on non-sorted columns, as determining the
distinct values can be done in a simple sequential scan over sorted data,
while it requires a more complicated (and due to its inherent very random
access pattern) significantly slower hash-based algorithm on non-sorted
data.
If you run your query with prefixed TRACE to complition, you'll see that
most of the time goes into the "algebra.kunique()" statement close to the
end.
...
Can clustered index helps? Is it possible to create two or more
clustered indexes on one table?
No, there is no such thing as (user controlable) clustered indexes in
MonetDB.
So, there is no way to sort column?