On Mar 9, 2009, at 23:35, Martin Kersten wrote:
Please run it also against the HEAD, because most of the problems may have been resolved there.
Jan Rittinger wrote:
Hi Martin and others,
I just tested what part the Pathfinder code generation plays and generated MIL code for the Aug2008 (0.24), the Nov2008, and the Feb2009 release branches. I ran all queries using the newest stable version (Feb2009) on Mac OS X. The observations are:
* The problem with gdk_heap.mx, mmap, and Mac OS X still persists (all queries run in 10 seconds instead of 2 seconds); Peter knows what I'm talking about.
* As Nils reported, the queries are getting slower.
* The main performance decrease in my scenario is the document loading.
* The problem does not stem from Pathfinder's MIL code generation.
For more details see the attached file...
------------------------------------------------------------------------
BTW: For today's HEAD version the results are even worse...
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Attached you find a bundle with the queries and the test results. (I'm using a MacBook Pro 2.6GHz Intel Core 2 Duo with 4GB of RAM running Mac OS X 10.5.6.)
Jan
On Mar 9, 2009, at 18:08, Martin Kersten wrote:
For all interested. Indeed there are performance differences between the various releases. Some can be traced back to functional enhancements, others are a result of internal administrative activities.
Recent experiments with TPC-H at scale factor 2 on the Feb 2009 branch show a performance degradation compared to Aug 2008, as reported on the website.
It appears that some low-level actions related to the allocation of BATs and their management in memory-scarce situations are to blame for this situation.
Solutions are integrated into the HEAD, and may (depending on our resources) be backported into a bugfix release of the Feb 2009 version.
Nils Grimsmo wrote:
On Wed, Mar 04, 2009 at 11:08:40PM +0100, Jan Rittinger wrote:
Hi Nils,
I just ran your queries with the latest (not yet announced) Feb2009 release (http://monetdb.cwi.nl/downloads/sources/Feb2009/) and received an answer in 1.5 (Q1) and 2.5 (Q2) seconds. If you still have problems with the new version, then please let us know.
Thank you for your answer, Jan. Feb2009 is indeed faster than Nov2008, but on my computer it is still slower than Aug2008. I also see some strange and unfavorable performance characteristics on subsequent queries for Nov2008 and Feb2009 (see below).
Aug2008:
# MonetDB Server v4.24.0
# based on GDK v1.24.0
# PF/Tijah module v0.5.0 loaded. http://dbappl.cs.utwente.nl/pftijah
# MonetDB/XQuery module v0.24.0 loaded (default back-end is 'algebra')
Nov2008-SP2:
# MonetDB Server v4.26.4
# based on GDK v1.26.4
# PF/Tijah module v0.9.0 loaded. http://dbappl.cs.utwente.nl/pftijah
# MonetDB/XQuery module v0.26.4 loaded (default back-end is 'algebra')
Feb2009:
# MonetDB Server v4.28.0
# Based on GDK v1.28.0
# PF/Tijah module v0.9.0 loaded. http://dbappl.cs.utwente.nl/pftijah
# MonetDB/XQuery module v0.28.0 loaded (default back-end is 'algebra')
I run the queries multiple times in different scenarios.
A - Have just indexed the document, first run.
B - Second run (subsequent have similar timing).
C - Restart the server (Mserver), then first run.
D - Second run (subsequent have similar timing).
Query Q0:
     Aug2008  Nov2008  Feb2009
A       1101     3687     1760
B       1031     4510     3015
C       1350     5216     3390
D       1035    12620     9533
Query Q1:
     Aug2008  Nov2008  Feb2009
A       2161    15119     3013
B       2099    19292     4072
C       2526    18523     4567
D       2117    42555    10602
This seems very strange to me. The timings make sense for Aug2008, where the query is slightly slower right after restarting the server (C). For Nov2008 and Feb2009, the second (and subsequent) runs are slower than the first. How can this be? It can make sense for the first run after restarting the server (C) to be slower (reading stuff from disk etc.), but why is the second (D) terribly slower? If I just keep running the query, the timings are similar to D.
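To put a number on the pattern described above, the ratio of the steady-state run after a restart (D) to the first run after indexing (A) can be computed from the Q0 and Q1 tables. The following is only a minimal sketch: the dictionary layout and the function name `slowdown` are illustrative assumptions, and the numbers are copied unchanged from the tables.

```python
# Timings copied from the Q0 and Q1 tables in this mail (same unit in every cell).
# Layout (an illustrative choice): query -> release -> scenario -> time.
TIMINGS = {
    "Q0": {
        "Aug2008": {"A": 1101, "B": 1031, "C": 1350, "D": 1035},
        "Nov2008": {"A": 3687, "B": 4510, "C": 5216, "D": 12620},
        "Feb2009": {"A": 1760, "B": 3015, "C": 3390, "D": 9533},
    },
    "Q1": {
        "Aug2008": {"A": 2161, "B": 2099, "C": 2526, "D": 2117},
        "Nov2008": {"A": 15119, "B": 19292, "C": 18523, "D": 42555},
        "Feb2009": {"A": 3013, "B": 4072, "C": 4567, "D": 10602},
    },
}


def slowdown(query, release, later="D", baseline="A"):
    """Return time(later) / time(baseline) for one query and release."""
    times = TIMINGS[query][release]
    return times[later] / times[baseline]


if __name__ == "__main__":
    for query, releases in TIMINGS.items():
        for release in releases:
            print(f"{query} {release}: D/A = {slowdown(query, release):.2f}")
```

For Q0 this gives a D/A ratio of roughly 0.94 for Aug2008, 3.42 for Nov2008, and 5.42 for Feb2009, so only the two newer releases get slower on repeated runs after a restart.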
Note: If I start mixing Q0 and Q1 after step D, they are both as slow as in step D.
I hope this feedback is helpful. Is there something strange with my setup, or is this a "bug"? (My timings in step (A) seem similar to Jan's timings).
If I want to compare MonetDB/XQuery to other implementations in a scientific paper, I typically want to warm up the system, then run the query multiple times to get an average timing. It is kind of inconvenient not to be able to close down Mserver between experiments...
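The measurement procedure I have in mind (warm up, then repeat and average) could be scripted roughly as follows. This is a sketch under assumptions, not a tested harness: the function names `mclient_runner` and `benchmark` are made up for illustration, the mclient flags are the ones used elsewhere in this thread, and the script measures wall-clock time on the client side (including mclient startup), which is not the same figure that `--time` prints.

```python
import statistics
import subprocess
import time


def mclient_runner(query_file):
    """Return a callable that sends one query file to mclient.

    Assumes mclient is on PATH and a server is already running.
    """
    cmd = ["mclient", "--language=xquery", "--algebra", "--time"]

    def run_once():
        # Feed the query on stdin, as in `mclient ... < $QUERYFILE`.
        with open(query_file, "rb") as f:
            subprocess.run(cmd, stdin=f, stdout=subprocess.DEVNULL, check=True)

    return run_once


def benchmark(run_once, warmup=2, runs=5):
    """Run `run_once` untimed `warmup` times, then time `runs` executions."""
    for _ in range(warmup):
        run_once()
    samples_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "samples_ms": samples_ms,
        "mean_ms": statistics.mean(samples_ms),
        "median_ms": statistics.median(samples_ms),
        "min_ms": min(samples_ms),
        "max_ms": max(samples_ms),
    }
```

A call such as `benchmark(mclient_runner("q0.xq"), warmup=2, runs=10)` (the file name is a placeholder) would then give one set of numbers per query. Reporting the median next to the mean makes a single slow outlier easier to spot.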
P.S.: The E-Mail subject seems slightly off topic here :)
Yes, thought I'd avoid touching the mouse to copy the email address. Cut away In-Reply-To:, but forgot to change Subject:...
Thank you for your assistance!
Klem fra Nils
On Mar 4, 2009, at 16:30, Nils Grimsmo wrote:
Hi, I just upgraded from the August to the November super-ball, and the performance has worsened badly.
Example queries on dblp.xml (441 MB):
Q0: count(/dblp//author[text()="Michael Stonebraker"])
Q1: count(/dblp/*/author[text()="Michael Stonebraker"])
Query time in milliseconds:
     August  November
Q0     1100      4867
Q1     3993     17999
I have compiled with --enable-optimise both times. I query with:
mclient --language=xquery --algebra --time < $QUERYFILE
Is this performance degradation expected? If so, why?
BTW: Is there any way of finding how much disk space a collection uses?
Thank you for contributing free software!
Klem fra Nils
--
Jan Rittinger
Lehrstuhl Datenbanken und Informationssysteme
Wilhelm-Schickard-Institut für Informatik
Eberhard-Karls-Universität Tübingen
http://www-db.informatik.uni-tuebingen.de/team/rittinger