
2011/8/2 Stefan Manegold
On Tue, Aug 02, 2011 at 11:00:28AM +0400, Eugene Prokopiev wrote:
I have table with 81 887 010 rows. Select count from this table works fine, but I can't wait for the results of select count(distinct ...) from this table. Why can it be?
What data type is the column that you run the count(distinct) on?
INET
Which version of MonetDB are you using? What OS are you running on? What hardware (in particular CPU type & speed and amount of main memory) are you using?
`mserver5 --version` shoudl asnwer the latter three questions sufficiently.
$ mserver5 --version MonetDB 5 server v11.3.7 "Apr2011-SP2" (64-bit, 64-bit oids) Copyright (c) 1993-July 2008 CWI Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved Visit http://monetdb.cwi.nl/ for further information Found 2.0GiB available memory, 4 available cpu cores Libraries: libpcre: 8.12 2011-01-15 (compiled with 8.12) openssl: OpenSSL 1.0.0d 8 Feb 2011 (compiled with ) libxml2: 2.7.8 (compiled with 2.7.8) Compiled by: builder@dev64 (x86_64-alt-linux-gnu) Compilation: x86_64-alt-linux-gcc -pipe -Wall -g -O2 Linking : /usr/bin/ld -m elf_x86_64 So, I have only 2.0GiB main memory, but table size is only 6.0GiB
Distinct requires the table to be sorted to find them.
Not quite. On (knownly) sorted columns, the count(distinct <column>) will be (considerably) faster than on non-sorted columns, as determining the distinct values can be done in a simple sequential scan over sorted data, while it requires a more complicated (and due to its inherent very random access pattern) significantly slower hash-based algorithm on non-sorted data.
If you run your query with prefixed TRACE to complition, you'll see that most of the time goes into the "algebra.kunique()" statement close to the end.
Can clustered index helps? Is it possible to create two or more clustered indexes on one table?
No, there is no such thing as (user controlable) clustered indexes in MonetDB.
So, there is no way to sort column?