
Hi Dennis, thanks a lot! Some observations --- sorry for technical jargon, but for now that's more for the developers; I'll try to explain for users once I find the time ... ---: @ Niels, Sjoerd, (and other developers): First, the IN predicate yields more or less the same plan as a disjunctive predicate, i.e., multiple point selects and not a join; this might be on purpose, but could be reconsidered ... Second, the disjunctive predicate with all point selects is apparently no recognized as a point query; thus, mitosis is not disabled, and hashes are (cannot) by built ... @ Dennis: For now, if you can determine how your query is generated, instead of sorting your base table, you might also want to consider creating a (temporary) table with all the key you want to select and join that with your base table, rather than having a where clause with thousands of or'ed equality predicates. Best, Stefan ----- Original Message -----
Hi Stefan,
I have ran both versions of the query with a very limited set of keys (only 4) in 3 different circumstances:
- table in unsorted state, immediately after the table has been created and populated - table in unsorted state, 10 minutes after the queries have already been executed once - table in sorted state
I've attached the traces of each situation/version in a ZIP attached to this message. The difference in performance (approx 10x) is very clear when looking at these traces.
The table contains 258.833 rows and I'm running MonetDB on a system with 2 cores (it has the following CPU:http://ark.intel.com/nl/products/34694/Intel-Xeon-Processor-E3110-6M-Cache-3...).
Let me know if you need anything else.
Best regards, Dennis
On 15-7-2014 16:22, Stefan Manegold wrote:
Hi Dennis,
it would be very helpful for us, if you could run your query prefixed with TRACE and share the (profiling-)output with us, preferably both, on non-sorted and sorted data, and both your original version with larger where clause and Sjoerd's variant with IN predicate in the where clause.
In general, the first point predicate (should) trigger the build of hash index that is then re-used with the following predicates. On sorted data, MonetDB uses binary search for range-predicates, and might also favor that over building a hash for point predicates.
Also: How many records are in your table, and are you running on a multi-core system (if so, how many cores?)?
Thanks! Stefan
----- Original Message -----
Well, I've solved my problem, as it's extremely fast now (< 50 ms) but I don't know why exactly.
What I've done is re-created my table and made sure that data is sorted on the primary key (ckey). Basically the following the queries:
CREATE TABLE temp_order AS SELECT * FROM london_hav_neogeo_pa ORDER BY ckey ASC WITH DATA; CREATE TABLE london_hav_neogeo_pa AS SELECT * FROM temp_order WITH DATA;
Sjoerd, do you know why this is? Clearly I'm triggering some sort of internal mechanism of MonetDB when it's sorted.
Best regards, Dennis
On 15-7-2014 14:36, Dennis Pallett wrote:
Thank you for your fast reply.
Unfortunately the IN(...) variant is not any faster.
I used EXPLAIN to inspect the query and that returns hundreds of lines of the following:
| X_1137 := bat.mergecand(X_1133,X_1135); | X_1139 := algebra.subselect(X_297,A198,A198,true,true,false); | X_1141 := bat.mergecand(X_1137,X_1139); | X_1143 := algebra.subselect(X_297,A199,A199,true,true,false);
Is this an indication that for each equality test the database engine is doing some kind of subselect?
Best regards, Dennis
On 15-7-2014 14:26, Sjoerd Mullender wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 2014-07-15 14:13, Dennis Pallett wrote:
Hi all,
I'm executing a very large query with MonetDB and it takes very long before the result is calculated (e.g. several minutes) whereas PostgreSQL does it in less than 100 ms.
The query itself is a very simple select + sum query but the WHERE condition has more than a thousand equality tests on the primary key (ckey) of the table. It basically looks like this:
SELECT sum(countAggr) FROM sys.london_hav_neogeo_pa WHERE ckey=16947611436 OR ckey=16947611437 OR ckey=16947611438 OR ckey=16947611439 OR ckey=16947611440 OR ckey=16947617908 OR ckey=16947617909 OR ckey=16947617910 OR ckey=16947617911 OR ckey=16947617912 [.. several hundreds of lines removed ..] OR ckey=289411309830 OR ckey=289411309831 OR ckey=289411309832 OR ckey=289411309833 OR ckey=289411309834 OR ckey=289411309835 OR ckey=289411309836 OR ckey=289411309837;
Does anyone know why it's taking so long with MonetDB and if there is a way to significantly improve performance?
Best regards, Dennis I don't know why it is so slow, but you can check using EXPLAIN and TRACE (prepend one or the other to your SQL query) what's going on.
Is it any faster when you use: SELECT sum(countAggr) FROM sys.london_hav_neogeo_pa WHERE ckey in (16947611436, 16947611437, 16947611438, 16947611439, 16947611440, ...);
- -- Sjoerd Mullender -----BEGIN PGP SIGNATURE----- Version: GnuPG v1 Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
iQCVAwUBU8Ud3T7g04AjvIQpAQLP/AP+MfiYffOeyNXcydj9CDOaZ9ocRnnoIxbm 2+dvK3aoIBjo10u/BXkWAfhhSO9/hxrVADDZn1uccU2lGpzDa1a+f9qxt1EJSIzc MCMXpWr0IRhU1w3wrr2zfBLkcFcSYnqEXMkMq7PqYo/iC+UeAEcSdjjWpE3GrhZO 9tHHfklL81E= =eVlM -----END PGP SIGNATURE----- _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |