Re: [MonetDB-users] Apples and oranges

16 Mar 2009

Hi,

I will try to answer some of your questions.

On Mon, Mar 16, 2009 at 2:56 PM, Nils Grimsmo  wrote:
...
Hi!
I have some questions as to how MonetDB/XQuery should be compared fairly
to other systems.
If I re-run a query multiple times in a single call to `mclient`, is any
calculation re-used?  How about if I run multiple similar queries in a
single call?
MonetDB/XQuery uses the MonetDB 4 server, which does not have any
"recycling" of results. MonetDB 5 does have these capabilities, but
they are used only with the SQL front-end.
...
Example:
 $ cat www.xq
 count(doc("dblp")//www)
 $ cat www_s10.xq
 (count(doc("dblp")//www),
 count(doc("dblp")//www),
 count(doc("dblp")//www),
 count(doc("dblp")//www),
 count(doc("dblp")//www),
 count(doc("dblp")//www),
 count(doc("dblp")//www),
 count(doc("dblp")//www),
 count(doc("dblp")//www),
 count(doc("dblp")//www))
 $ cat www_s100.xq
 (count(doc("dblp")//www),
 ...
 count(doc("dblp")//www))
 $ mclient --language=xquery --time < www.xq
 11760
 Timer      22.552 msec
(Assert we are running hot.)
 $ mclient --language=xquery --time < www.xq
 11760
 Timer      21.661 msec
 $ mclient --language=xquery --time < www_s10.xq
 11760,
 [snip]
 11760
 Timer      33.063 msec
 $ mclient --language=xquery --time < www_s100.xq
 11760,
 [snip]
 11760
 Timer     252.414 msec
So the average execution times are 22, 3.3 and 2.5 milliseconds.  Is the
extra cost for the first query just starting up the client program, or is
some calculation re-used?
The extra cost is just the start-up, since you already ensure warm
runs and your data is small enough to fit in memory. We ran some tests
of our own in the past and verified that it is the start-up cost.
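
For reference, the per-expression averages quoted in the question are
just the total Timer value divided by the number of count() expressions
in the file. A small awk sketch reproducing that arithmetic from the
timings in the transcript above (the function name per_expression_avg
is made up for illustration):

```shell
#!/bin/sh
# Hypothetical helper: each input line is "<label> <expressions> <total msec>";
# print the label and the total divided by the number of expressions.
per_expression_avg() {
    awk '{ printf "%s %.1f\n", $1, $3 / $2 }'
}

# Timings copied from the hot runs in the transcript above.
per_expression_avg <<'EOF'
www.xq 1 21.661
www_s10.xq 10 33.063
www_s100.xq 100 252.414
EOF
# prints:
#   www.xq 21.7
#   www_s10.xq 3.3
#   www_s100.xq 2.5
```

Dividing this way spreads the fixed per-call cost over more
expressions, which is why the average drops as the file grows.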
...
If we now look at more expensive queries:
 $ cat dblp_authors.xq
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"])
Just repeating the same:
 $ cat dblp_authors_s10.xq
 (count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]))
Different but related queries:
 $ cat dblp_authors_x10.xq
 (count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
 count(doc("dblp")/dblp//author[text()="Wen Gao"]),
 count(doc("dblp")/dblp//author[text()="Irith Pomeranz"]),
 count(doc("dblp")/dblp//author[text()="Hector Garcia-Molina"]),
 count(doc("dblp")/dblp//author[text()="Moshe Y. Vardi"]),
 count(doc("dblp")/dblp//author[text()="Joseph Y. Halpern"]),
 count(doc("dblp")/dblp//author[text()="Noga Alon"]),
 count(doc("dblp")/dblp//author[text()="Wei Li"]),
 count(doc("dblp")/dblp//author[text()="Ming Li"]),
 count(doc("dblp")/dblp//author[text()="Donald F. Towsley"])
 )
 $ mclient --language=xquery --time < dblp_authors.xq
 351
 Timer    1238.436 msec
 $ mclient --language=xquery --time < dblp_authors.xq
 351
 Timer    1253.927 msec
 $ mclient --language=xquery --time < dblp_authors_s10.xq
 351,
 ...
 351
 Timer    1284.191 msec
 $ mclient --language=xquery --time < dblp_authors_x10.xq
 351,
 347,
 346,
 341,
 334,
 334,
 330,
 320,
 320,
 317
 Timer    2610.589 msec
Here the average times are 1238, 128 and 261 milliseconds.  Here the
difference is clearly not just startup of the client.
It is still the start-up cost. In both cases you are sending one query,
which has to be compiled, optimized and run. Asking for 10 different
things does not mean that it runs 10 separate plans (10 different
XPath axis steps, for example). So if you divide the time by 10 in the
second case, you are in principle just dividing the same amount of
work as in your first query by 10. Try running these queries:

$ cat dblp_authors_q10.xq
 count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"])
<>
 count(doc("dblp")/dblp//author[text()="Wen Gao"])
<>
 count(doc("dblp")/dblp//author[text()="Irith Pomeranz"])
<>
 count(doc("dblp")/dblp//author[text()="Hector Garcia-Molina"])
<>
 count(doc("dblp")/dblp//author[text()="Moshe Y. Vardi"])
<>
 count(doc("dblp")/dblp//author[text()="Joseph Y. Halpern"])
<>
 count(doc("dblp")/dblp//author[text()="Noga Alon"])
<>
 count(doc("dblp")/dblp//author[text()="Wei Li"])
<>
 count(doc("dblp")/dblp//author[text()="Ming Li"])
<>
 count(doc("dblp")/dblp//author[text()="Donald F. Towsley"])

These are 10 different queries.
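
A file like this does not have to be written by hand. The following is
a minimal shell sketch that builds it from a list of author names. The
function name build_query_file and the input file authors.txt are
hypothetical, and the sketch assumes, as in the example above, that a
line containing only '<>' separates queries and that the names contain
no double quotes.

```shell
#!/bin/sh
# Hypothetical helper: read one author name per line on stdin and print
# one count() query per name, with a line containing only '<>' between
# consecutive queries (no separator after the last one).
build_query_file() {
    first=1
    while IFS= read -r name; do
        [ -n "$name" ] || continue          # skip empty lines
        if [ "$first" -eq 0 ]; then
            printf '%s\n' '<>'
        fi
        printf ' count(doc("dblp")/dblp//author[text()="%s"])\n' "$name"
        first=0
    done
}

# Example usage (authors.txt is an assumed input file):
#   build_query_file < authors.txt > dblp_authors_q10.xq
#   mclient --language=xquery --time < dblp_authors_q10.xq
```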
...
If this was not a client-server architecture, I would guess the difference
came from opening files, getting stuff into cache, etc..  Are there
similar reasons here?
What parts of the calculations are actually done inside the client, if
any?  If the answer is none, why is this behavior seen?
The client does not do any calculations.
...
In conclusion:  When running multiple queries, what would be the most fair
way to compare MonetDB/XQuery to other client/server architectures in your
view?  Concatenating the queries in a single call to `mclient`, or
multiple calls?
If you are after multiple queries, then I would suggest writing all
your queries in one file, separating each query with '<>', and then
feeding that file to a single mclient.
...
When timing a single query, can it be repeated multiple times in a
single call, and the average taken, without being unfair?
As long as you consider hot runs, I would say yes.
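
As a rough sketch of what considering only hot runs can look like in
practice: run the same input several times, discard the first run as a
warm-up, and average the remaining Timer values. The helper below only
parses text in the "Timer ... msec" format shown in the transcripts
above. The function name hot_average and the choice to discard exactly
one run are assumptions, and whether mclient prints the timer line on
standard output or standard error should be checked on your
installation.

```shell
#!/bin/sh
# Hypothetical helper: read mclient --time output on stdin, pick the lines
# of the form "Timer <number> msec", skip the first one (warm-up run) and
# print the mean of the rest in milliseconds.
hot_average() {
    awk '$1 == "Timer" && $3 == "msec" {
             n++
             if (n > 1) { sum += $2; hot++ }   # ignore the first timing
         }
         END {
             if (hot > 0) { printf "%.3f\n", sum / hot }
             else         { print "no hot runs" }
         }'
}

# Example usage, merging stderr into stdout in case the timer goes there:
#   for i in 1 2 3 4 5; do
#       mclient --language=xquery --time < www.xq
#   done 2>&1 | hot_average
```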

Hope I could help,
lefteris
...
If I use for example MS SQL Server 2008, there is no gain from a single
invocation of the client, whether I have multiple SQL statements
 SELECT x.query('$q') FROM t; ..., SELECT x.query('$q') FROM t;
Or a single SQL statement with a list of XPath queries
 SELECT x.query('($q, ..., $q)') FROM t;
Klem fra Nils
--
http://www.idi.ntnu.no/~nilsgri/                Why is this thus?
                            What is the reason of this thusness?
                                                 - Artemus Ward
