Re: [MonetDB-users] Long running query with count(distinct ...)

2 Aug 2011

      On Tue, Aug 02, 2011 at 11:00:28AM +0400, Eugene Prokopiev wrote:
...
...
...
I have table with 81 887 010 rows. Select count from this table works
fine, but I can't wait for the results of select count(distinct ...)
from this table. Why can it be?
What data type is the column that you run the count(distinct) on?
Which version of MonetDB are you using?
What OS are you running on?
What hardware (in particular CPU type & speed and amount of main memory) are
you using?

`mserver5 --version` shoudl asnwer the latter three questions sufficiently.
...
...
Distinct requires the table to be sorted to find them.
Not quite. On (knownly) sorted columns, the count(distinct <column>) will be
(considerably) faster than on non-sorted columns, as determining the
distinct values can be done in a simple sequential scan over sorted data,
while it requires a more complicated (and due to its inherent very random
access pattern) significantly slower hash-based algorithm on non-sorted
data.

If you run your query with prefixed TRACE to complition, you'll see that
most of the time goes into the "algebra.kunique()" statement close to the
end.
...
Can clustered index helps? Is it possible to create two or more
clustered indexes on one table?
No, there is no such thing as (user controlable) clustered indexes in
MonetDB.

Stefan
...
------------------------------------------------------------------------------
BlackBerry® DevCon Americas, Oct. 18-20, San Francisco, CA
The must-attend event for mobile developers. Connect with experts. 
Get tools for creating Super Apps. See the latest technologies.
Sessions, hands-on labs, demos & much more. Register early & save!
http://p.sf.net/sfu/rim-blackberry-1
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- 
| Stefan.Manegold @ CWI.nl | DB Architectures (INS1) |
| http://CWI.nl/~manegold/ | Science Park 123 (L321) |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam  (NL) |

Re: [MonetDB-users] Long running query with count(distinct ...)

Stefan Manegold