[Monetdb-developers] join performance issue!

25 Jul 2006

      Dear all,

to prevent you from falling into the same trap as I did this afternoon when
trying to analyze, measure & compare positional (fetch-)join performance,
here's a warning and my detailed story:

WARNING:

When running 'join(BAT,BAT);' in MIL or 'algebra.join(BAT,BAT);' in MAL,
be aware that this includes not only the actual join algorithm, but also a
BATpropcheck() call to check and set properties of the join result
(ALWAYS, i.e., also with the default --debug=0 !)

NOTE:
The latter can easily be up to 85% (*eighty-five* percent!) of the time you
measure with a simple wall-clock around the 'join(BAT,BAT);' or
'algebra.join(BAT,BAT);' (see details below) !

Detailed Story:

Preparing to invent and implement another cache-aware/-friendly positional
(fetch-)join algorithm for some special cases, I was trying to assess the
performance of the current positional (fetch-)join algorithms for the
following three cases:

	1) join(ld,r);
	2) join(lo,r);
	3) join(lu,r);

In all three cases, r is a [V(o)ID, INT] BAT, i.e., has a dense
non-materialized head and a randomly distributed integer tail.
ld, lo, lu are all [INT, OID] BATs with a randomly distributed integer tail,
and an OID head that is 
	in ld: dense (and marked so in the BAT header) (but materialized)
	in lo: ordered (actually also dense, but only marked ordered, not
	       marked dense in the BAT header)
	in lu: unordered (random shuffle of an originally dense sequense)

Thus, the three case trigger the use following join implementations,
respectively:
	1) densefetchjoin
	2) orderedfetchjoin
	3) defaultfetchjoin

I used the same size (count) four BATs, and the join is a perfect 1:1 hit;
hence the result is as big as the inputs.

Here are the wall-clock times (in micro seconds) for BATs of 10000000 tuples 
	MonetDB 4.13.1 (CVS HEAD)
	default compilation (gcc -g -O2)
	64-bits and 64-bit OIDs
	2 GHz AMD Athlon(tm) 64 X2 Dual Core Processor 3800+
	2 GB RAM

        1) join(ld,r);	2065162 us
        2) join(lo,r);	2090277 us
        3) join(lu,r);	3058936 us

A close look under the hood (i.e., in BATjoin in src/gdk/gdk_relop.mx) reveals the
following split timings for the actual join (batjoin()) and the property
checking (BATpropcheck()):

			batjoin()	BATpropcheck()	join()
	1) join(ld,r);	 275331 us	1774275 us	2065162 us	(abs)
			  13.33 %	  85.91 %	 100.00 %	(rel)
	2) join(lo,r);	 303311 us	1770722 us	2090277 us	(abs)
			  14.51 %	  84.71 %	 100.00 %	(rel)
	3) join(lu,r);	1268057 us	1775788 us	3058936 us	(abs)
			  41.45 %	  58.05 %	 100.00 %	(rel)

Well, there are some lessons to learn:

- Be aware of the BATpropcheck when you measure join performance!

- The "mandatory" BATpropcheck is only used for BATjoin and BATleftjoin
  in src/gdk/gdk_relop.mx --- at least as far as I know.

- The "mandatory" BATpropcheck in BATjoin and BATleftjoin is there for a
  good reason: With joins, it's hard to guess the properties of the result
  (are head and/or tail sorted, dense, and or key/unique?) in general, but
  the performance of MonetDB highly depends on the availability of these
  properties; hence, while making the join (considerably) more expensive,
  the investment (might) pay-off with later operation on the join result.

- We should consider making this BATpropcheck in BATjoin and BATleftjoin
  configurable from the outside, e.g., by adding an extra optional argument
  to join() to request to skipping the BATpropcheck.

I hope you find this information useful...

Stefan

-- 
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |