
Hi,
I will try to answer some of your questions.
On Mon, Mar 16, 2009 at 2:56 PM, Nils Grimsmo wrote:
Hi!
I have some questions as to how MonetDB/XQuery should be compared fairly to other systems.
If I re-run a query multiple times in a single call to `mclient`, is any calculation re-used? How about if I run multiple similar queries in a single call?
MonetDB/XQuery uses the MonetDB 4 server, which does not have any "recycling" of results. MonetDB5 does have these capabilities, but they are used only with the SQL front-end.
Example:
$ cat www.xq
count(doc("dblp")//www)
$ cat www_s10.xq
(count(doc("dblp")//www), count(doc("dblp")//www), count(doc("dblp")//www), count(doc("dblp")//www), count(doc("dblp")//www), count(doc("dblp")//www), count(doc("dblp")//www), count(doc("dblp")//www), count(doc("dblp")//www), count(doc("dblp")//www))
$ cat www_s100.xq
(count(doc("dblp")//www), ... count(doc("dblp")//www))
$ mclient --language=xquery --time < www.xq
11760
Timer 22.552 msec
(Assert we are running hot.)
$ mclient --language=xquery --time < www.xq
11760
Timer 21.661 msec
$ mclient --language=xquery --time < www_s10.xq
11760, [snip] 11760
Timer 33.063 msec
$ mclient --language=xquery --time < www_s100.xq
11760, [snip] 11760
Timer 252.414 msec
So the average execution times are 22, 3.3 and 2.5 milliseconds. Is the extra cost for the first query just starting up the client program, or is some calculation re-used?
The extra cost is just the start-up, since you already ensure warm runs and your data is small enough to fit in memory. We ran some tests of our own in the past and verified that it is the start-up cost.
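The arithmetic behind those averages can be sketched as follows. This is only an illustration: the timings are the ones quoted above, while the linear model (total = fixed cost per call + marginal cost per query) and the function names are assumptions made here, not something MonetDB reports. The single-query run does not sit exactly on the line fitted through the other two points, so the fitted numbers should be read as rough estimates.

```python
# Timings copied from the mclient runs quoted above: (queries per call, total msec).
# The second hot run of www.xq is used for the single-query case.
runs = [(1, 21.661), (10, 33.063), (100, 252.414)]


def per_query_ms(n, total_ms):
    """Average time per query when n queries share one mclient call."""
    return total_ms / n


def fit_fixed_and_marginal(small, large):
    """Solve total = fixed + n * marginal from two (n, total_ms) points.

    Assumption: cost grows linearly with n. This is a simplification
    chosen for illustration, not a statement about MonetDB internals.
    """
    (n1, t1), (n2, t2) = small, large
    marginal = (t2 - t1) / (n2 - n1)
    fixed = t1 - n1 * marginal
    return fixed, marginal


averages = [round(per_query_ms(n, t), 1) for n, t in runs]
fixed, marginal = fit_fixed_and_marginal(runs[1], runs[2])

if __name__ == "__main__":
    print("average msec per query:", averages)
    print(f"fixed per call: {fixed:.1f} msec, marginal per query: {marginal:.2f} msec")
```

Under this model the per-query average approaches the marginal cost as more queries share one call, which is the pattern seen in the numbers above.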
If we now look at more expensive queries:
$ cat dblp_authors.xq
count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"])
Just repeating the same:
$ cat dblp_authors_s10.xq (count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]), count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]), count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]), count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]), count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]), count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]), count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]), count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]), count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]), count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]), count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]))
Different but related queries:
$ cat dblp_authors_x10.xq
(count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Wen Gao"]),
 count(doc("dblp")/dblp//author[text()="Irith Pomeranz"]),
 count(doc("dblp")/dblp//author[text()="Hector Garcia-Molina"]),
 count(doc("dblp")/dblp//author[text()="Moshe Y. Vardi"]),
 count(doc("dblp")/dblp//author[text()="Joseph Y. Halpern"]),
 count(doc("dblp")/dblp//author[text()="Noga Alon"]),
 count(doc("dblp")/dblp//author[text()="Wei Li"]),
 count(doc("dblp")/dblp//author[text()="Ming Li"]),
 count(doc("dblp")/dblp//author[text()="Donald F. Towsley"]) )
$ mclient --language=xquery --time < dblp_authors.xq
351
Timer 1238.436 msec
$ mclient --language=xquery --time < dblp_authors.xq
351
Timer 1253.927 msec
$ mclient --language=xquery --time < dblp_authors_s10.xq
351, ... 351
Timer 1284.191 msec
$ mclient --language=xquery --time < dblp_authors_x10.xq
351, 347, 346, 341, 334, 334, 330, 320, 320, 317
Timer 2610.589 msec
Here the average times are 1238, 128 and 261 milliseconds. Here the difference is clearly not just startup of the client.
It is still the startup cost. In both cases you are sending 1 query, which has to be compiled, optimized and run. Just because you ask for 10 different things does not mean that it runs 10 separate plans (10 different XPath axis steps, for example). So if you divide the time by 10 in the second case, you are in principle just dividing by 10 the same amount of work as in your first query. Try running these queries:
$ cat dblp_authors_q10.xq
count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"])
<>
count(doc("dblp")/dblp//author[text()="Wen Gao"])
<>
count(doc("dblp")/dblp//author[text()="Irith Pomeranz"])
<>
count(doc("dblp")/dblp//author[text()="Hector Garcia-Molina"])
<>
count(doc("dblp")/dblp//author[text()="Moshe Y. Vardi"])
<>
count(doc("dblp")/dblp//author[text()="Joseph Y. Halpern"])
<>
count(doc("dblp")/dblp//author[text()="Noga Alon"])
<>
count(doc("dblp")/dblp//author[text()="Wei Li"])
<>
count(doc("dblp")/dblp//author[text()="Ming Li"])
<>
count(doc("dblp")/dblp//author[text()="Donald F. Towsley"])
These are 10 different queries.
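A file like dblp_authors_q10.xq could be generated rather than typed by hand. The sketch below is one hypothetical way to do it: the helper names `count_query` and `build_batch` are made up for illustration, the author names are assumed to contain no double quotes (no escaping is done), and placing each '<>' on its own line is an assumption about acceptable whitespace.

```python
def count_query(author):
    """XQuery counting author elements whose text equals `author`.

    Same shape as the queries in the thread. Assumption: `author`
    contains no double quotes, so no escaping is attempted.
    """
    return f'count(doc("dblp")/dblp//author[text()="{author}"])'


def build_batch(queries, separator="<>"):
    """Join standalone queries with the '<>' separator between them."""
    return f"\n{separator}\n".join(queries) + "\n"


# Three of the author names used in the thread, as a short example.
authors = ["Grzegorz Rozenberg", "Wen Gao", "Irith Pomeranz"]
batch = build_batch([count_query(a) for a in authors])

if __name__ == "__main__":
    # Redirect this output to a .xq file and feed it to mclient.
    print(batch, end="")
```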
If this were not a client-server architecture, I would guess the difference came from opening files, getting stuff into cache, etc. Are there similar reasons here?
What parts of the calculations are actually done inside the client, if any? If the answer is none, why is this behavior seen?
The client does not do any calculations.
In conclusion: When running multiple queries, what would be the most fair way to compare MonetDB/XQuery to other client/server architectures in your view? Concatenating the queries in a single call to `mclient`, or multiple calls?
If you are after multiple queries, then I would suggest writing all your queries in one file, separating each query with '<>', and then feeding that file to a single mclient call.
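A minimal measurement harness along those lines might look like the sketch below. It relies only on the `mclient --language=xquery --time` invocation and the "Timer ... msec" output shown earlier in this thread. Everything else is an assumption: the helper names are hypothetical, the Timer line is searched for in both stdout and stderr because it is not stated where it is written, and it is not known from this thread whether a '<>'-separated file yields one Timer line or several, so all matches are returned.

```python
import re
import statistics
import subprocess

# Matches lines such as "Timer 22.552 msec" from the runs quoted in the thread.
TIMER_RE = re.compile(r"Timer\s+([0-9.]+)\s+msec")


def parse_timers(output):
    """Return every 'Timer <x> msec' value found in the text, as floats."""
    return [float(value) for value in TIMER_RE.findall(output)]


def run_batch(path):
    """Feed one query file to a single mclient call and collect the timings."""
    with open(path, "rb") as query_file:
        proc = subprocess.run(
            ["mclient", "--language=xquery", "--time"],
            stdin=query_file,
            capture_output=True,
            check=True,
        )
    # Assumption: the Timer line may appear on stdout or stderr, so scan both.
    return parse_timers(proc.stdout.decode() + proc.stderr.decode())


def hot_mean(samples, warmup=1):
    """Mean of the samples after discarding the first `warmup` (possibly cold) runs."""
    hot = samples[warmup:]
    if not hot:
        raise ValueError("need more samples than warm-up runs")
    return statistics.mean(hot)
```

For example, calling `run_batch("dblp_authors_q10.xq")` several times, taking the first timing of each call, and passing those to `hot_mean` would give an average over hot runs only.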
When timing a single query, can it be repeated multiple times in a single call, and the average taken, without being unfair?
As long as you consider hot runs, I would say yes.
Hope I could help,
lefteris
If I use for example MS SQL Server 2008, there is no gain from a single invocation of the client, whether I have multiple SQL statements
SELECT x.query('$q') FROM t; ..., SELECT x.query('$q') FROM t;
Or a single SQL statement with a list of XPath queries
SELECT x.query('($q, ..., $q)') FROM t;
Hugs from Nils
-- http://www.idi.ntnu.no/~nilsgri/ Why is this thus? What is the reason of this thusness? - Artemus Ward
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users