Dear all, to prevent you from falling into the same trap as I did this afternoon when trying to analyze, measure & compare positional (fetch-)join performance, here's a warning and my detailed story: WARNING: When running 'join(BAT,BAT);' in MIL or 'algebra.join(BAT,BAT);' in MAL, be aware that this includes not only the actual join algorithm, but also a BATpropcheck() call to check and set properties of the join result (ALWAYS, i.e., also with the default --debug=0 !) NOTE: The latter can easily be up to 85% (*eighty-five* percent!) of the time you measure with a simple wall-clock around the 'join(BAT,BAT);' or 'algebra.join(BAT,BAT);' (see details below) ! Detailed Story: Preparing to invent and implement another cache-aware/-friendly positional (fetch-)join algorithm for some special cases, I was trying to assess the performance of the current positional (fetch-)join algorithms for the following three cases: 1) join(ld,r); 2) join(lo,r); 3) join(lu,r); In all three cases, r is a [V(o)ID, INT] BAT, i.e., has a dense non-materialized head and a randomly distributed integer tail. ld, lo, lu are all [INT, OID] BATs with a randomly distributed integer tail, and an OID head that is in ld: dense (and marked so in the BAT header) (but materialized) in lo: ordered (actually also dense, but only marked ordered, not marked dense in the BAT header) in lu: unordered (random shuffle of an originally dense sequense) Thus, the three case trigger the use following join implementations, respectively: 1) densefetchjoin 2) orderedfetchjoin 3) defaultfetchjoin I used the same size (count) four BATs, and the join is a perfect 1:1 hit; hence the result is as big as the inputs. Here are the wall-clock times (in micro seconds) for BATs of 10000000 tuples MonetDB 4.13.1 (CVS HEAD) default compilation (gcc -g -O2) 64-bits and 64-bit OIDs 2 GHz AMD Athlon(tm) 64 X2 Dual Core Processor 3800+ 2 GB RAM 1) join(ld,r); 2065162 us 2) join(lo,r); 2090277 us 3) join(lu,r); 3058936 us A close look under the hood (i.e., in BATjoin in src/gdk/gdk_relop.mx) reveals the following split timings for the actual join (batjoin()) and the property checking (BATpropcheck()): batjoin() BATpropcheck() join() 1) join(ld,r); 275331 us 1774275 us 2065162 us (abs) 13.33 % 85.91 % 100.00 % (rel) 2) join(lo,r); 303311 us 1770722 us 2090277 us (abs) 14.51 % 84.71 % 100.00 % (rel) 3) join(lu,r); 1268057 us 1775788 us 3058936 us (abs) 41.45 % 58.05 % 100.00 % (rel) Well, there are some lessons to learn: - Be aware of the BATpropcheck when you measure join performance! - The "mandatory" BATpropcheck is only used for BATjoin and BATleftjoin in src/gdk/gdk_relop.mx --- at least as far as I know. - The "mandatory" BATpropcheck in BATjoin and BATleftjoin is there for a good reason: With joins, it's hard to guess the properties of the result (are head and/or tail sorted, dense, and or key/unique?) in general, but the performance of MonetDB highly depends on the availability of these properties; hence, while making the join (considerably) more expensive, the investment (might) pay-off with later operation on the join result. - We should consider making this BATpropcheck in BATjoin and BATleftjoin configurable from the outside, e.g., by adding an extra optional argument to join() to request to skipping the BATpropcheck. I hope you find this information useful... Stefan -- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |