Re: Large query taking a long time

16 Jul 2014

      Hi Dennis,

thanks a lot!

Some observations --- sorry for technical jargon, but for now that's more for the developers;
I'll try to explain for users once I find the time ... ---:

@ Niels, Sjoerd, (and other developers):

First, the IN predicate yields more or less the same plan as a disjunctive predicate,
i.e., multiple point selects and not a join; this might be on purpose, but could be
reconsidered ...

Second, the disjunctive predicate with all point selects is apparently no recognized
as a point query; thus, mitosis is not disabled, and hashes are (cannot) by built ...

@ Dennis:

For now, if you can determine how your query is generated,
instead of sorting your base table,
you might also want to consider creating a (temporary) table
with all the key you want to select and join that with your
base table, rather than having a where clause with thousands
of or'ed equality predicates.

Best,
Stefan

----- Original Message -----
...
Hi Stefan,
I have ran both versions of the query with a very limited set of keys
(only 4) in 3 different circumstances:
- table in unsorted state, immediately after the table has been created
and populated
- table in unsorted state, 10 minutes after the queries have already
been executed once
- table in sorted state
I've attached the traces of each situation/version in a ZIP attached to
this message. The difference in performance (approx 10x) is very clear
when looking at these traces.
The table contains 258.833 rows and I'm running MonetDB on a system with
2 cores (it has the following
CPU:http://ark.intel.com/nl/products/34694/Intel-Xeon-Processor-E3110-6M-Cache-3...).
Let me know if you need anything else.
Best regards,
Dennis
On 15-7-2014 16:22, Stefan Manegold wrote:
...
Hi Dennis,
it would be very helpful for us, if you could run your query
prefixed with TRACE and share the (profiling-)output with us,
preferably both, on non-sorted and sorted data, and both your
original version with larger where clause and Sjoerd's variant
with IN predicate in the where clause.
In general, the first point predicate (should) trigger the build
of hash index that is then re-used with the following predicates.
On sorted data, MonetDB uses binary search for range-predicates,
and might also favor that over building a hash for point predicates.
Also:
How many records are in your table, and are you running
on a multi-core system (if so, how many cores?)?
Thanks!
Stefan
----- Original Message -----
...
Well, I've solved my problem, as it's extremely fast now (< 50 ms) but I
don't know why exactly.
What I've done is re-created my table and made sure that data is sorted
on the primary key (ckey). Basically the following the queries:
CREATE TABLE temp_order AS SELECT * FROM london_hav_neogeo_pa ORDER BY
ckey ASC WITH DATA;
CREATE TABLE london_hav_neogeo_pa AS SELECT * FROM temp_order WITH DATA;
Sjoerd, do you know why this is? Clearly I'm triggering some sort of
internal mechanism of MonetDB when it's sorted.
Best regards,
Dennis
On 15-7-2014 14:36, Dennis Pallett wrote:
...
Thank you for your fast reply.
Unfortunately the IN(...) variant is not any faster.
I used EXPLAIN to inspect the query and that returns hundreds of lines
of the following:
|     X_1137 := bat.mergecand(X_1133,X_1135);
|     X_1139 := algebra.subselect(X_297,A198,A198,true,true,false);
|     X_1141 := bat.mergecand(X_1137,X_1139);
|     X_1143 := algebra.subselect(X_297,A199,A199,true,true,false);
Is this an indication that for each equality test the database engine
is doing some kind of subselect?
Best regards,
Dennis
On 15-7-2014 14:26, Sjoerd Mullender wrote:
...
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
On 2014-07-15 14:13, Dennis Pallett wrote:
...
Hi all,
I'm executing a very large query with MonetDB and it takes very
long before the result is calculated (e.g. several minutes)
whereas PostgreSQL does it in less than 100 ms.
The query itself is a very simple select + sum query but the WHERE
condition has more than a thousand equality tests on the primary
key (ckey) of the table. It basically looks like this:
SELECT sum(countAggr) FROM sys.london_hav_neogeo_pa WHERE
ckey=16947611436 OR ckey=16947611437 OR ckey=16947611438 OR
ckey=16947611439 OR ckey=16947611440 OR ckey=16947617908 OR
ckey=16947617909 OR ckey=16947617910 OR ckey=16947617911 OR
ckey=16947617912 [.. several hundreds of lines removed ..] OR
ckey=289411309830 OR ckey=289411309831 OR ckey=289411309832 OR
ckey=289411309833 OR ckey=289411309834 OR ckey=289411309835 OR
ckey=289411309836 OR ckey=289411309837;
Does anyone know why it's taking so long with MonetDB and if there
is a way to significantly improve performance?
Best regards, Dennis
I don't know why it is so slow, but you can check using EXPLAIN and
TRACE (prepend one or the other to your SQL query) what's going on.
Is it any faster when you use:
SELECT sum(countAggr)
FROM sys.london_hav_neogeo_pa
WHERE ckey in (16947611436, 16947611437, 16947611438, 16947611439,
16947611440, ...);
- -- Sjoerd Mullender
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQCVAwUBU8Ud3T7g04AjvIQpAQLP/AP+MfiYffOeyNXcydj9CDOaZ9ocRnnoIxbm
2+dvK3aoIBjo10u/BXkWAfhhSO9/hxrVADDZn1uccU2lGpzDa1a+f9qxt1EJSIzc
MCMXpWr0IRhU1w3wrr2zfBLkcFcSYnqEXMkMq7PqYo/iC+UeAEcSdjjWpE3GrhZO
9tHHfklL81E=
=eVlM
-----END PGP SIGNATURE-----
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list

users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
-- 
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |