[MonetDB-users] Apples and oranges

16 Mar 2009

      Hi!

I have some questions as to how MonetDB/XQuery should be compared fairly
to other systems.

If I re-run a query multiple times in a single call to `mclient`, is any
calculation re-used?  How about if I run multiple similar queries in a
single call?

Example:

  $ cat www.xq 
  count(doc("dblp")//www)

  $ cat www_s10.xq 
  (count(doc("dblp")//www),
  count(doc("dblp")//www),
  count(doc("dblp")//www),
  count(doc("dblp")//www),
  count(doc("dblp")//www),
  count(doc("dblp")//www),
  count(doc("dblp")//www),
  count(doc("dblp")//www),
  count(doc("dblp")//www),
  count(doc("dblp")//www))

  $ cat www_s100.xq 
  (count(doc("dblp")//www),
  ...
  count(doc("dblp")//www))
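
The repeated-query files above all have the same shape, so they could be
generated rather than written by hand.  A minimal sketch, assuming a
hypothetical helper `repeat_query` (not part of MonetDB) that prints one
query N times as a single XQuery sequence:

```shell
#!/bin/sh
# repeat_query N 'QUERY' prints QUERY N times as one XQuery sequence,
# in the same layout as the www_s10.xq listing: (q, q, ..., q).
# repeat_query is a hypothetical helper, not part of MonetDB.
repeat_query() {
    n=$1
    query=$2
    # Refuse to build an empty sequence.
    [ "$n" -ge 1 ] || return 1
    i=1
    printf '('
    while [ "$i" -le "$n" ]; do
        if [ "$i" -lt "$n" ]; then
            # Every copy but the last ends with a comma.
            printf '%s,\n' "$query"
        else
            # The last copy closes the sequence.
            printf '%s)\n' "$query"
        fi
        i=$((i + 1))
    done
}
```

For example, `repeat_query 100 'count(doc("dblp")//www)' > www_s100.xq`
would write a 100-fold file of the kind abbreviated above.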

  $ mclient --language=xquery --time < www.xq 
  11760
  Timer      22.552 msec 

(Run it again to confirm we are running hot.)

  $ mclient --language=xquery --time < www.xq 
  11760
  Timer      21.661 msec 

  $ mclient --language=xquery --time < www_s10.xq 
  11760,
  [snip]
  11760
  Timer      33.063 msec 

  $ mclient --language=xquery --time < www_s100.xq 
  11760,
  [snip]
  11760
  Timer     252.414 msec 

So the average execution times per query are 22, 3.3 and 2.5 milliseconds.
Is the extra cost in the single-query run just the startup of the client
program, or is some calculation re-used?
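
The arithmetic behind these averages can be sketched as follows.  The two
function names are hypothetical; the input numbers are the Timer values
shown above.  The second function divides the difference between two runs
by the difference in query count, so any fixed per-invocation cost cancels
out.  This only separates fixed from per-query cost; it does not by itself
show whether the server re-uses any calculation.

```shell
#!/bin/sh
# per_query_ms TOTAL_MS N prints TOTAL_MS / N with one decimal.
per_query_ms() {
    awk -v total="$1" -v n="$2" 'BEGIN { printf "%.1f\n", total / n }'
}

# marginal_ms T1 N1 T2 N2 prints the time per additional query between
# two runs, (T2 - T1) / (N2 - N1).  A fixed per-invocation cost that is
# present in both T1 and T2 cancels out of the difference.
marginal_ms() {
    awk -v t1="$1" -v n1="$2" -v t2="$3" -v n2="$4" \
        'BEGIN { printf "%.1f\n", (t2 - t1) / (n2 - n1) }'
}

per_query_ms 33.063 10              # prints 3.3
per_query_ms 252.414 100            # prints 2.5
marginal_ms 33.063 10 252.414 100   # prints 2.4
```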

If we now look at more expensive queries:

  $ cat dblp_authors.xq 
  count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"])

Just repeating the same:

  $ cat dblp_authors_s10.xq 
  (count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
  count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
  count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
  count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
  count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
  count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
  count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
  count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
  count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
  count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]))

Different but related queries:

  $ cat dblp_authors_x10.xq 
  (count(doc("dblp")/dblp//author[text()="Grzegorz Rozenberg"]),
  count(doc("dblp")/dblp//author[text()="Wen Gao"]),
  count(doc("dblp")/dblp//author[text()="Irith Pomeranz"]),
  count(doc("dblp")/dblp//author[text()="Hector Garcia-Molina"]),
  count(doc("dblp")/dblp//author[text()="Moshe Y. Vardi"]),
  count(doc("dblp")/dblp//author[text()="Joseph Y. Halpern"]),
  count(doc("dblp")/dblp//author[text()="Noga Alon"]),
  count(doc("dblp")/dblp//author[text()="Wei Li"]),
  count(doc("dblp")/dblp//author[text()="Ming Li"]),
  count(doc("dblp")/dblp//author[text()="Donald F. Towsley"])
  )

  $ mclient --language=xquery --time < dblp_authors.xq 
  351
  Timer    1238.436 msec 

  $ mclient --language=xquery --time < dblp_authors.xq 
  351
  Timer    1253.927 msec 

  $ mclient --language=xquery --time < dblp_authors_s10.xq 
  351,
  ...
  351
  Timer    1284.191 msec 

  $ mclient --language=xquery --time < dblp_authors_x10.xq 
  351,
  347,
  346,
  341,
  334,
  334,
  330,
  320,
  320,
  317
  Timer    2610.589 msec 

Here the average times per query are 1238, 128 and 261 milliseconds, so
the difference is clearly not just startup of the client.

If this were not a client-server architecture, I would guess the difference
came from opening files, getting data into cache, etc.  Are there
similar reasons here?

What parts of the calculations are actually done inside the client, if
any?  If the answer is none, why is this behavior seen?

In conclusion:  When running multiple queries, what would in your view be
the fairest way to compare MonetDB/XQuery to other client/server
architectures?  Concatenating the queries in a single call to `mclient`,
or making multiple calls?
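
The "multiple calls" variant could be measured with a small harness like
the sketch below.  Both function names are hypothetical.  The parser
assumes the `Timer ... msec` line format shown in the transcripts above,
and that those lines arrive on the stream being piped; if the client
writes them to standard error, that stream has to be redirected first.

```shell
#!/bin/sh
# run_separately N FILE CMD [ARGS...] runs CMD once per repetition,
# feeding FILE on standard input each time, so every repetition pays
# the full client startup and connection cost.
run_separately() {
    n=$1
    file=$2
    shift 2
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" < "$file"
        i=$((i + 1))
    done
}

# sum_timers reads client output on standard input, picks out the lines
# whose first field is "Timer", and reports count, total and mean.
sum_timers() {
    awk '$1 == "Timer" { sum += $2; runs++ }
         END {
             if (runs > 0)
                 printf "%d runs, %.3f msec total, %.3f msec mean\n",
                        runs, sum, sum / runs
         }'
}
```

A possible invocation, under the assumptions above, is

  run_separately 10 www.xq mclient --language=xquery --time 2>&1 | sum_timers

whose mean can then be set against the per-query average of a single call
on the concatenated file.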

When timing a single query, can it be repeated multiple times in a
single call, and the average taken, without being unfair?

If I use for example MS SQL Server 2008, there is no gain from a single
invocation of the client, whether I have multiple SQL statements

  SELECT x.query('$q') FROM t; ...; SELECT x.query('$q') FROM t;

or a single SQL statement with a list of XPath queries

  SELECT x.query('($q, ..., $q)') FROM t;

Hugs from Nils

-- 
http://www.idi.ntnu.no/~nilsgri/                Why is this thus?
                             What is the reason of this thusness?
                                                  - Artemus Ward

Nils Grimsmo