extreme performance breakdown after update
Hello List, I'm running monetdb on a Debian server and the software is managed via Debian packages. A few days ago I installed an update for the monetdb5-server package. After that I noticed a problem with queries with the function lower() in the WHERE clause:
WHERE LOWER(“fieldname”) LIKE '%lower_string%'<<
without any change to the data, the requests became much, much slower then before (from some hundred microseconds to some minutes). Not only in my application but also in the monetdb client... I could solve the problem via a switch from using LIKE to ILIKE and no lower()-function, but I'm curious what could be the cause for this dramatic performance breakdown? I searched the release history but I found nothing... Has anybody experienced the same effect? Anny ideas on this phenomenon? Greetings from Germany, Andreas (theafh)
Hi, You can check the merovingian.log file for WARNINGs In the development branch 'default' we had a temporary bug that failed to find some bulk-operations. It has been fixed already. Perhaps you hit this jackpot. regards, Martin On 24/02/16 21:38, Andreas F. Hoffmann wrote:
Hello List,
I'm running monetdb on a Debian server and the software is managed via Debian packages. A few days ago I installed an update for the monetdb5-server package. After that I noticed a problem with queries with the function lower() in the WHERE clause:
WHERE LOWER(“fieldname”) LIKE '%lower_string%'<<
without any change to the data, the requests became much, much slower then before (from some hundred microseconds to some minutes). Not only in my application but also in the monetdb client...
I could solve the problem via a switch from using LIKE to ILIKE and no lower()-function, but I'm curious what could be the cause for this dramatic performance breakdown? I searched the release history but I found nothing...
Has anybody experienced the same effect? Anny ideas on this phenomenon?
Greetings from Germany, Andreas (theafh) _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
@Martin: The issue you mention where never release; hence, I do not think they play a role, here. @Andreas: Most important question: Which version of MonetDB did work fine before, and which does no longer after the upgrade? For the new version, you could profile your quer(y|ies) by prefixing them with TRACE --- feel free to share the results for analysis --- unfortunately, you cannot (easily) downgrade to the older version to also profile that one. Would you be to share all ingredients to reproduce the problem? Also, do I understand correctly that you tested the alternative omitting LOWER() and using ILIKE, instead, and that works fine (also) with the new version of MonetDB? If so, you might also want to profile (TRACE) that one to compare the trace with that of the "slow" query. Best, Stefan ----- On Feb 24, 2016, at 9:43 PM, Martin martin.kersten@cwi.nl wrote:
Hi,
You can check the merovingian.log file for WARNINGs In the development branch 'default' we had a temporary bug that failed to find some bulk-operations. It has been fixed already. Perhaps you hit this jackpot.
regards, Martin
On 24/02/16 21:38, Andreas F. Hoffmann wrote:
Hello List,
I'm running monetdb on a Debian server and the software is managed via Debian packages. A few days ago I installed an update for the monetdb5-server package. After that I noticed a problem with queries with the function lower() in the WHERE clause:
WHERE LOWER(“fieldname”) LIKE '%lower_string%'<<
without any change to the data, the requests became much, much slower then before (from some hundred microseconds to some minutes). Not only in my application but also in the monetdb client...
I could solve the problem via a switch from using LIKE to ILIKE and no lower()-function, but I'm curious what could be the cause for this dramatic performance breakdown? I searched the release history but I found nothing...
Has anybody experienced the same effect? Anny ideas on this phenomenon?
Greetings from Germany, Andreas (theafh) _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hello Martin and Stefan, its Sunday and time for some server fun! ;-) On 25.02.2016 09:47, Stefan Manegold wrote:
@Martin: The issue you mention where never release; hence, I do not think they play a role, here. OK, but for the sake of completeness: There is nothing suspicious to spot in the merovingian.log of this DB farm. Nothing in general and especially nothing when such a slow query is running... @Andreas: Most important question: Which version of MonetDB did work fine before, and which does no longer after the upgrade? I have no idea how to find out which version was the last. Sorry! I regularly install the Debian updates, it was the last "official" packet before the one I installed a few days before... My current version of monetdb is shown as: “MonetDB Database Server Toolkit v1.1 (Jul2015-SP2)” and im running Debian GNU/Linux 7 (wheezy). To make it also clear: there was no change to the DB Farm between software updates. Probably it is a specific problem with that current MonetDB5-Debian Package? For the new version, you could profile your quer(y|ies) by prefixing them with TRACE --- feel free to share the results for analysis --- unfortunately, you cannot (easily) downgrade to the older version to also profile that one. Would you be to share all ingredients to reproduce the problem? I can give you the monetdb version, the logs of the 3 traces ("lower", "ilike" and "simple", attached to this mail) unfortunately not the data itself... By the way, there is no difference in the behavior when I search other strings. The Breakdown comes when using LOWE(). The Table is extreme simple (filld with some million strings, the longest "ua" with some 100 chars). Its structure is build as follow: CREATE TABLE "agents" ("ua_id" int, "ua_hash" varchar(32),"ua" text, "class" varchar(3), "date" date); Also, do I understand correctly that you tested the alternative omitting LOWER() and using ILIKE, instead, and that works fine (also) with the new version of MonetDB? Yes! Absolutely. Using ILIKE instead of LOWER() in the current version works fine.
If so, you might also want to profile (TRACE) that one to compare the trace with that of the "slow" query. I attached the 3 traces and I hope you gain insight from it! :-) Best, Stefan
Thank you in advance, and greetings from Germany, Andreas (theafh)
Hi Andreas,
from your traces, I can read the following:
the query with lower() + like is (significantly) more expensive than
the query with ilike, which is in turn (slightly) more expensive than
the query with like:
$ grep ' := user.s' lower.log ilike.log simple.log
lower.log:[ 84857647, "X_4=0@0:void := user.s4_2(\"%qcrawl%\":str);" ]
ilike.log:[ 394212, "X_4=0@0:void := user.s3_2(\"%qcrawl%\":str);" ]
simple.log:[ 131674, "X_4=0@0:void := user.s2_2(\"%QCrawl%\":str);" ]
(times are in micro-seconds)
For the lower() + like, almost all of the time is spent in the lower() function:
$ echo ; for i in lower.log ilike.log simple.log ; do echo $i ; grep ' := ' $i | egrep -v ' := (language.dataflow|user.s[0-9])' | sort -nrk2 | head ; echo ; done
lower.log
[ 75923265, "X_311=
Hello Martin and Stefan,
its Sunday and time for some server fun! ;-)
On 25.02.2016 09:47, Stefan Manegold wrote:
@Martin: The issue you mention where never release; hence, I do not think they play a role, here. OK, but for the sake of completeness: There is nothing suspicious to spot in the merovingian.log of this DB farm. Nothing in general and especially nothing when such a slow query is running...
@Andreas: Most important question: Which version of MonetDB did work fine before, and which does no longer after the upgrade? I have no idea how to find out which version was the last. Sorry! I regularly install the Debian updates, it was the last "official" packet before the one I installed a few days before... My current version of monetdb is shown as: “MonetDB Database Server Toolkit v1.1 (Jul2015-SP2)” and im running Debian GNU/Linux 7 (wheezy). To make it also clear: there was no change to the DB Farm between software updates. Probably it is a specific problem with that current MonetDB5-Debian Package?
For the new version, you could profile your quer(y|ies) by prefixing them with TRACE --- feel free to share the results for analysis --- unfortunately, you cannot (easily) downgrade to the older version to also profile that one. Would you be to share all ingredients to reproduce the problem? I can give you the monetdb version, the logs of the 3 traces ("lower", "ilike" and "simple", attached to this mail) unfortunately not the data itself... By the way, there is no difference in the behavior when I search other strings. The Breakdown comes when using LOWE(). The Table is extreme simple (filld with some million strings, the longest "ua" with some 100 chars). Its structure is build as follow: CREATE TABLE "agents" ("ua_id" int, "ua_hash" varchar(32),"ua" text, "class" varchar(3), "date" date);
Also, do I understand correctly that you tested the alternative omitting LOWER() and using ILIKE, instead, and that works fine (also) with the new version of MonetDB? Yes! Absolutely. Using ILIKE instead of LOWER() in the current version works fine.
If so, you might also want to profile (TRACE) that one to compare the trace with that of the "slow" query. I attached the 3 traces and I hope you gain insight from it! :-)
Best, Stefan
Thank you in advance, and greetings from Germany, Andreas (theafh)
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
In the next release, lower() (and upper()) should be significantly faster. The day before the release of SP2 I checked in a change that did that, but that change is not in SP2 since the release was built before I made the change. The changeset in question is 43fb721ed302. On 02/28/2016 04:47 PM, Stefan Manegold wrote:
Hi Andreas,
from your traces, I can read the following:
the query with lower() + like is (significantly) more expensive than the query with ilike, which is in turn (slightly) more expensive than the query with like:
$ grep ' := user.s' lower.log ilike.log simple.log lower.log:[ 84857647, "X_4=0@0:void := user.s4_2(\"%qcrawl%\":str);" ] ilike.log:[ 394212, "X_4=0@0:void := user.s3_2(\"%qcrawl%\":str);" ] simple.log:[ 131674, "X_4=0@0:void := user.s2_2(\"%QCrawl%\":str);" ] (times are in micro-seconds)
For the lower() + like, almost all of the time is spent in the lower() function:
$ echo ; for i in lower.log ilike.log simple.log ; do echo $i ; grep ' := ' $i | egrep -v ' := (language.dataflow|user.s[0-9])' | sort -nrk2 | head ; echo ; done
lower.log [ 75923265, "X_311=
[675067]:bat[:oid,:str]{horigin=6} := batstr.toLower(X_301= [675067]:bat[:oid,:str]);" ] [ 71794523, "X_308= [675067]:bat[:oid,:str]{horigin=3} := batstr.toLower(X_295= [675067]:bat[:oid,:str]);" ] [ 70768563, "X_313= [675074]:bat[:oid,:str]{horigin=8} := batstr.toLower(X_305= [675074]:bat[:oid,:str]);" ] [ 69586703, "X_310= [675067]:bat[:oid,:str]{horigin=5} := batstr.toLower(X_299= [675067]:bat[:oid,:str]);" ] [ 67852676, "X_309= [675067]:bat[:oid,:str]{horigin=4} := batstr.toLower(X_297= [675067]:bat[:oid,:str]);" ] [ 57176388, "X_312= [675067]:bat[:oid,:str]{horigin=7} := batstr.toLower(X_303= [675067]:bat[:oid,:str]);" ] [ 27191014, "X_307= [675067]:bat[:oid,:str]{horigin=2} := batstr.toLower(X_293= [675067]:bat[:oid,:str]);" ] [ 16893811, "X_306= [675067]:bat[:oid,:str]{horigin=1} := batstr.toLower(X_291= [675067]:bat[:oid,:str]);" ] [ 58652, "X_319= [0]:bat[:oid,:oid]{torigin=5, horigin=5} := algebra.likesubselect(X_310= [675067]:bat[:oid,:str],X_119= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 56453, "X_321= [0]:bat[:oid,:oid]{torigin=7, horigin=7} := algebra.likesubselect(X_312= [675067]:bat[:oid,:str],X_123= [675038]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] ilike.log [ 336293, "X_297=
[0]:bat[:oid,:oid]{torigin=6, horigin=6} := algebra.ilikesubselect(X_133= [675067]:bat[:oid,:str],X_123= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 323465, "X_296= [0]:bat[:oid,:oid]{torigin=5, horigin=5} := algebra.ilikesubselect(X_132= [675067]:bat[:oid,:str],X_121= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 320678, "X_298= [0]:bat[:oid,:oid]{torigin=7, horigin=7} := algebra.ilikesubselect(X_134= [675067]:bat[:oid,:str],X_125= [675038]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 320629, "X_293= [0]:bat[:oid,:oid]{torigin=2, horigin=2} := algebra.ilikesubselect(X_129= [675067]:bat[:oid,:str],X_115= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 319128, "X_294= [0]:bat[:oid,:oid]{torigin=3, horigin=3} := algebra.ilikesubselect(X_130= [675067]:bat[:oid,:str],X_117= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 307071, "X_299= [0]:bat[:oid,:oid]{torigin=8, horigin=8} := algebra.ilikesubselect(X_135= [675074]:bat[:oid,:str],X_127= [675073]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 303231, "X_295= [0]:bat[:oid,:oid]{torigin=4, horigin=4} := algebra.ilikesubselect(X_131= [675067]:bat[:oid,:str],X_119= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 288524, "X_292= [1]:bat[:oid,:oid]{torigin=1, horigin=1} := algebra.ilikesubselect(X_128= [675067]:bat[:oid,:str],X_113= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 42323, "X_127= [675073]:bat[:oid,:oid]{horigin=8, torigin=8} := sql.tid(X_3=0:int,\"sys\":str,\"agents\":str,7:int,8:int);" ] [ 40222, "X_125= [675038]:bat[:oid,:oid]{horigin=7, torigin=7} := sql.tid(X_3=0:int,\"sys\":str,\"agents\":str,6:int,8:int);" ] simple.log [ 65030, "X_297=
[0]:bat[:oid,:oid]{torigin=6, horigin=6} := algebra.likesubselect(X_133= [675067]:bat[:oid,:str],X_123= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 64647, "X_296= [0]:bat[:oid,:oid]{torigin=5, horigin=5} := algebra.likesubselect(X_132= [675067]:bat[:oid,:str],X_121= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 63896, "X_293= [0]:bat[:oid,:oid]{torigin=2, horigin=2} := algebra.likesubselect(X_129= [675067]:bat[:oid,:str],X_115= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 62461, "X_298= [0]:bat[:oid,:oid]{torigin=7, horigin=7} := algebra.likesubselect(X_134= [675067]:bat[:oid,:str],X_125= [675038]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 62363, "X_294= [0]:bat[:oid,:oid]{torigin=3, horigin=3} := algebra.likesubselect(X_130= [675067]:bat[:oid,:str],X_117= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 61810, "X_295= [0]:bat[:oid,:oid]{torigin=4, horigin=4} := algebra.likesubselect(X_131= [675067]:bat[:oid,:str],X_119= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 58262, "X_292= [1]:bat[:oid,:oid]{torigin=1, horigin=1} := algebra.likesubselect(X_128= [675067]:bat[:oid,:str],X_113= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 57441, "X_299= [0]:bat[:oid,:oid]{torigin=8, horigin=8} := algebra.likesubselect(X_135= [675074]:bat[:oid,:str],X_127= [675073]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 39817, "X_117= [675067]:bat[:oid,:oid]{horigin=3, torigin=3} := sql.tid(X_3=0:int,\"sys\":str,\"agents\":str,2:int,8:int);" ] [ 39441, "X_125= [675038]:bat[:oid,:oid]{horigin=7, torigin=7} := sql.tid(X_3=0:int,\"sys\":str,\"agents\":str,6:int,8:int);" ] The overall order of these three versions is to be expected. The simple like does least work. The with the ilike, the actual comparison is expected to be (slightly?) more expensive than with the simple like; basically, is also has to nomalize the case internally. The version with lower() + like --- basically re-implementing ilike "the crude way", is expected to be more expensive than the other two, mainly as it first needs to convert and materialize all data (in your case 5400543 strings of up to 100 characters each, i.e., roughly between 5.5 MB and 550 MB, or so) --- in memory, if it fits --- before it does the same like comparison as the simple version.
Whether the difference is bound to be that big is had to tell without running it myself, or at least without knowing some more details about your data and machine, e.g., how much memory does the machine have, and how big (in MB) is your data (in particular the "ua" column); you might consider using the storage() function to check the latter: select * from storage(); Also, memory performance might be influenced by Linux kernel settings that might have changed with your latest Debian upgrade, we cannot exclude that the performance changes are not only due to the MonetDB upgrade.
I cannot find any code chances in MonetDB's lower() implementation since the Oct2014 release that would trigger any performance degradation, let alone between the Jul2015-SP1 and Ju2015-SP2 releases.
All I can find is a very recent change that is supposed to improve performance, but that is not included in any release, yet. cf., http://dev.monetdb.org/hg/MonetDB/rev/43fb721ed302 and https://www.monetdb.org/pipermail/developers-list/2016-January/004563.html
I am sure you are facing the same problem. In fact a quick test on 20M strings of 2 to 147 chars (35 chars on average) long just revealed that a query similar to your lower() + like query takes 1.5 min with the Jul2015-SP2 release, but only 7 secs with the latest (not yet released) code base of the Jul2015 branch, i.e., after above fix. (all on my 16GB 8-core 3.4GHz machine)
(We are currently planning to prepare the next feature release, which will also include the above fix, but that will take a few more months. I'm not sure, whether we'll manage to issue a Jul2015-SP3 bugfix release before that ...)
The questions remains, why you pervious version of MonetDB showed so much better performance on the very same data and very same machine for the very same query, although I cannot find and lower() related code changes since the Oct2014 release. For this, it would really help us to know with which version of MonetDB your lower() + like was so much (> 100x ?) faster than it is now with Jul2015-SP2. Maybe your merovingian log(s) date back for enough?
Best, Stefan
----- On Feb 28, 2016, at 1:32 PM, Andreas F. Hoffmann me@theafh.net wrote:
Hello Martin and Stefan,
its Sunday and time for some server fun! ;-)
On 25.02.2016 09:47, Stefan Manegold wrote:
@Martin: The issue you mention where never release; hence, I do not think they play a role, here. OK, but for the sake of completeness: There is nothing suspicious to spot in the merovingian.log of this DB farm. Nothing in general and especially nothing when such a slow query is running...
@Andreas: Most important question: Which version of MonetDB did work fine before, and which does no longer after the upgrade? I have no idea how to find out which version was the last. Sorry! I regularly install the Debian updates, it was the last "official" packet before the one I installed a few days before... My current version of monetdb is shown as: “MonetDB Database Server Toolkit v1.1 (Jul2015-SP2)” and im running Debian GNU/Linux 7 (wheezy). To make it also clear: there was no change to the DB Farm between software updates. Probably it is a specific problem with that current MonetDB5-Debian Package?
For the new version, you could profile your quer(y|ies) by prefixing them with TRACE --- feel free to share the results for analysis --- unfortunately, you cannot (easily) downgrade to the older version to also profile that one. Would you be to share all ingredients to reproduce the problem? I can give you the monetdb version, the logs of the 3 traces ("lower", "ilike" and "simple", attached to this mail) unfortunately not the data itself... By the way, there is no difference in the behavior when I search other strings. The Breakdown comes when using LOWE(). The Table is extreme simple (filld with some million strings, the longest "ua" with some 100 chars). Its structure is build as follow: CREATE TABLE "agents" ("ua_id" int, "ua_hash" varchar(32),"ua" text, "class" varchar(3), "date" date);
Also, do I understand correctly that you tested the alternative omitting LOWER() and using ILIKE, instead, and that works fine (also) with the new version of MonetDB? Yes! Absolutely. Using ILIKE instead of LOWER() in the current version works fine.
If so, you might also want to profile (TRACE) that one to compare the trace with that of the "slow" query. I attached the 3 traces and I hope you gain insight from it! :-)
Best, Stefan
Thank you in advance, and greetings from Germany, Andreas (theafh)
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- Sjoerd Mullender
Hello Stefan, thank you for digging so deep in to that problem! :-) Yes I also got a new Linux kernel and some new version of system libraries with that update and it is possible that this might be the source of that problem... Luckily my merovingian.log might reveal something: before the update it states "Merovingian 1.7 (Jul2015-SP1) starting" and after the update it shows "Merovingian 1.7 (Jul2015-SP2)". Before that, also working with the lower() function in my SQL, it states "(Oct2014-SP4)"... Is that what you are locking for? Greetings from Berlin, Andreas On 28.02.2016 16:47, Stefan Manegold wrote:
Hi Andreas,
from your traces, I can read the following:
the query with lower() + like is (significantly) more expensive than the query with ilike, which is in turn (slightly) more expensive than the query with like:
$ grep ' := user.s' lower.log ilike.log simple.log lower.log:[ 84857647, "X_4=0@0:void := user.s4_2(\"%qcrawl%\":str);" ] ilike.log:[ 394212, "X_4=0@0:void := user.s3_2(\"%qcrawl%\":str);" ] simple.log:[ 131674, "X_4=0@0:void := user.s2_2(\"%QCrawl%\":str);" ] (times are in micro-seconds)
For the lower() + like, almost all of the time is spent in the lower() function:
$ echo ; for i in lower.log ilike.log simple.log ; do echo $i ; grep ' := ' $i | egrep -v ' := (language.dataflow|user.s[0-9])' | sort -nrk2 | head ; echo ; done
lower.log [ 75923265, "X_311=
[675067]:bat[:oid,:str]{horigin=6} := batstr.toLower(X_301= [675067]:bat[:oid,:str]);" ] [ 71794523, "X_308= [675067]:bat[:oid,:str]{horigin=3} := batstr.toLower(X_295= [675067]:bat[:oid,:str]);" ] [ 70768563, "X_313= [675074]:bat[:oid,:str]{horigin=8} := batstr.toLower(X_305= [675074]:bat[:oid,:str]);" ] [ 69586703, "X_310= [675067]:bat[:oid,:str]{horigin=5} := batstr.toLower(X_299= [675067]:bat[:oid,:str]);" ] [ 67852676, "X_309= [675067]:bat[:oid,:str]{horigin=4} := batstr.toLower(X_297= [675067]:bat[:oid,:str]);" ] [ 57176388, "X_312= [675067]:bat[:oid,:str]{horigin=7} := batstr.toLower(X_303= [675067]:bat[:oid,:str]);" ] [ 27191014, "X_307= [675067]:bat[:oid,:str]{horigin=2} := batstr.toLower(X_293= [675067]:bat[:oid,:str]);" ] [ 16893811, "X_306= [675067]:bat[:oid,:str]{horigin=1} := batstr.toLower(X_291= [675067]:bat[:oid,:str]);" ] [ 58652, "X_319= [0]:bat[:oid,:oid]{torigin=5, horigin=5} := algebra.likesubselect(X_310= [675067]:bat[:oid,:str],X_119= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 56453, "X_321= [0]:bat[:oid,:oid]{torigin=7, horigin=7} := algebra.likesubselect(X_312= [675067]:bat[:oid,:str],X_123= [675038]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] ilike.log [ 336293, "X_297=
[0]:bat[:oid,:oid]{torigin=6, horigin=6} := algebra.ilikesubselect(X_133= [675067]:bat[:oid,:str],X_123= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 323465, "X_296= [0]:bat[:oid,:oid]{torigin=5, horigin=5} := algebra.ilikesubselect(X_132= [675067]:bat[:oid,:str],X_121= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 320678, "X_298= [0]:bat[:oid,:oid]{torigin=7, horigin=7} := algebra.ilikesubselect(X_134= [675067]:bat[:oid,:str],X_125= [675038]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 320629, "X_293= [0]:bat[:oid,:oid]{torigin=2, horigin=2} := algebra.ilikesubselect(X_129= [675067]:bat[:oid,:str],X_115= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 319128, "X_294= [0]:bat[:oid,:oid]{torigin=3, horigin=3} := algebra.ilikesubselect(X_130= [675067]:bat[:oid,:str],X_117= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 307071, "X_299= [0]:bat[:oid,:oid]{torigin=8, horigin=8} := algebra.ilikesubselect(X_135= [675074]:bat[:oid,:str],X_127= [675073]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 303231, "X_295= [0]:bat[:oid,:oid]{torigin=4, horigin=4} := algebra.ilikesubselect(X_131= [675067]:bat[:oid,:str],X_119= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 288524, "X_292= [1]:bat[:oid,:oid]{torigin=1, horigin=1} := algebra.ilikesubselect(X_128= [675067]:bat[:oid,:str],X_113= [675067]:bat[:oid,:oid],A0=\"%qcrawl%\":str,\"\":str,false:bit);" ] [ 42323, "X_127= [675073]:bat[:oid,:oid]{horigin=8, torigin=8} := sql.tid(X_3=0:int,\"sys\":str,\"agents\":str,7:int,8:int);" ] [ 40222, "X_125= [675038]:bat[:oid,:oid]{horigin=7, torigin=7} := sql.tid(X_3=0:int,\"sys\":str,\"agents\":str,6:int,8:int);" ] simple.log [ 65030, "X_297=
[0]:bat[:oid,:oid]{torigin=6, horigin=6} := algebra.likesubselect(X_133= [675067]:bat[:oid,:str],X_123= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 64647, "X_296= [0]:bat[:oid,:oid]{torigin=5, horigin=5} := algebra.likesubselect(X_132= [675067]:bat[:oid,:str],X_121= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 63896, "X_293= [0]:bat[:oid,:oid]{torigin=2, horigin=2} := algebra.likesubselect(X_129= [675067]:bat[:oid,:str],X_115= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 62461, "X_298= [0]:bat[:oid,:oid]{torigin=7, horigin=7} := algebra.likesubselect(X_134= [675067]:bat[:oid,:str],X_125= [675038]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 62363, "X_294= [0]:bat[:oid,:oid]{torigin=3, horigin=3} := algebra.likesubselect(X_130= [675067]:bat[:oid,:str],X_117= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 61810, "X_295= [0]:bat[:oid,:oid]{torigin=4, horigin=4} := algebra.likesubselect(X_131= [675067]:bat[:oid,:str],X_119= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 58262, "X_292= [1]:bat[:oid,:oid]{torigin=1, horigin=1} := algebra.likesubselect(X_128= [675067]:bat[:oid,:str],X_113= [675067]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 57441, "X_299= [0]:bat[:oid,:oid]{torigin=8, horigin=8} := algebra.likesubselect(X_135= [675074]:bat[:oid,:str],X_127= [675073]:bat[:oid,:oid],A0=\"%QCrawl%\":str,\"\":str,false:bit);" ] [ 39817, "X_117= [675067]:bat[:oid,:oid]{horigin=3, torigin=3} := sql.tid(X_3=0:int,\"sys\":str,\"agents\":str,2:int,8:int);" ] [ 39441, "X_125= [675038]:bat[:oid,:oid]{horigin=7, torigin=7} := sql.tid(X_3=0:int,\"sys\":str,\"agents\":str,6:int,8:int);" ] The overall order of these three versions is to be expected. The simple like does least work. The with the ilike, the actual comparison is expected to be (slightly?) more expensive than with the simple like; basically, is also has to nomalize the case internally. The version with lower() + like --- basically re-implementing ilike "the crude way", is expected to be more expensive than the other two, mainly as it first needs to convert and materialize all data (in your case 5400543 strings of up to 100 characters each, i.e., roughly between 5.5 MB and 550 MB, or so) --- in memory, if it fits --- before it does the same like comparison as the simple version.
Whether the difference is bound to be that big is had to tell without running it myself, or at least without knowing some more details about your data and machine, e.g., how much memory does the machine have, and how big (in MB) is your data (in particular the "ua" column); you might consider using the storage() function to check the latter: select * from storage(); Also, memory performance might be influenced by Linux kernel settings that might have changed with your latest Debian upgrade, we cannot exclude that the performance changes are not only due to the MonetDB upgrade.
I cannot find any code chances in MonetDB's lower() implementation since the Oct2014 release that would trigger any performance degradation, let alone between the Jul2015-SP1 and Ju2015-SP2 releases.
All I can find is a very recent change that is supposed to improve performance, but that is not included in any release, yet. cf., http://dev.monetdb.org/hg/MonetDB/rev/43fb721ed302 and https://www.monetdb.org/pipermail/developers-list/2016-January/004563.html
I am sure you are facing the same problem. In fact a quick test on 20M strings of 2 to 147 chars (35 chars on average) long just revealed that a query similar to your lower() + like query takes 1.5 min with the Jul2015-SP2 release, but only 7 secs with the latest (not yet released) code base of the Jul2015 branch, i.e., after above fix. (all on my 16GB 8-core 3.4GHz machine)
(We are currently planning to prepare the next feature release, which will also include the above fix, but that will take a few more months. I'm not sure, whether we'll manage to issue a Jul2015-SP3 bugfix release before that ...)
The questions remains, why you pervious version of MonetDB showed so much better performance on the very same data and very same machine for the very same query, although I cannot find and lower() related code changes since the Oct2014 release. For this, it would really help us to know with which version of MonetDB your lower() + like was so much (> 100x ?) faster than it is now with Jul2015-SP2. Maybe your merovingian log(s) date back for enough?
Best, Stefan
----- On Feb 28, 2016, at 1:32 PM, Andreas F. Hoffmann me@theafh.net wrote:
Hello Martin and Stefan,
its Sunday and time for some server fun! ;-)
On 25.02.2016 09:47, Stefan Manegold wrote:
@Martin: The issue you mention where never release; hence, I do not think they play a role, here. OK, but for the sake of completeness: There is nothing suspicious to spot in the merovingian.log of this DB farm. Nothing in general and especially nothing when such a slow query is running...
@Andreas: Most important question: Which version of MonetDB did work fine before, and which does no longer after the upgrade? I have no idea how to find out which version was the last. Sorry! I regularly install the Debian updates, it was the last "official" packet before the one I installed a few days before... My current version of monetdb is shown as: “MonetDB Database Server Toolkit v1.1 (Jul2015-SP2)” and im running Debian GNU/Linux 7 (wheezy). To make it also clear: there was no change to the DB Farm between software updates. Probably it is a specific problem with that current MonetDB5-Debian Package?
For the new version, you could profile your quer(y|ies) by prefixing them with TRACE --- feel free to share the results for analysis --- unfortunately, you cannot (easily) downgrade to the older version to also profile that one. Would you be to share all ingredients to reproduce the problem? I can give you the monetdb version, the logs of the 3 traces ("lower", "ilike" and "simple", attached to this mail) unfortunately not the data itself... By the way, there is no difference in the behavior when I search other strings. The Breakdown comes when using LOWE(). The Table is extreme simple (filld with some million strings, the longest "ua" with some 100 chars). Its structure is build as follow: CREATE TABLE "agents" ("ua_id" int, "ua_hash" varchar(32),"ua" text, "class" varchar(3), "date" date);
Also, do I understand correctly that you tested the alternative omitting LOWER() and using ILIKE, instead, and that works fine (also) with the new version of MonetDB? Yes! Absolutely. Using ILIKE instead of LOWER() in the current version works fine.
If so, you might also want to profile (TRACE) that one to compare the trace with that of the "slow" query. I attached the 3 traces and I hope you gain insight from it! :-)
Best, Stefan
Thank you in advance, and greetings from Germany, Andreas (theafh)
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (4)
-
Andreas F. Hoffmann
-
Martin Kersten
-
Sjoerd Mullender
-
Stefan Manegold