Thanks Stefan,
I'll leave the further interpretation of the above results to the interested recipient / reader.
I will give it a try, then.
S08-64:  SR       QR     SU       QU[1]    QU[2]
mh       659m34s  1m09s  81m06s   ERROR    -
mH       -        -      383m17s  ERROR    -
Mh       -        -      77m03s   344m32s  1m34s
MH       644m01s  0m49s  390m42s  342m47s  1m36s
These h/SU results are strange, because a 5x faster time due to a hash function with more collisions is... puzzling. In any case, with 64-bit OIDs on an 8 GB machine, the shredding causes very intense swapping, and Linux performance under swapping is known to be highly variable, so it is difficult to draw conclusions. I am still investigating the behavior of MT_mmap under load, and it may well be that shredding performance can be improved by better memory management (less swapping). From the result below we know that it can be done in 25m. The not-yet-checked-in improvements in index construction will strongly reduce the QU[1] time by taking less memory, bringing it down to the order of the 100m seen below.
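To put a number on the "5x", here is a small sketch that parses the S08-64 "SU" wall-clock times quoted above and computes the ratio between the new (H) and old (h) string hash for each mmap variant. The helper `to_seconds` is purely illustrative.

```python
import re

def to_seconds(t: str) -> int:
    """Convert a wall-clock time such as '383m17s' into seconds."""
    m = re.fullmatch(r"(\d+)m(\d+)s", t)
    if m is None:
        raise ValueError(f"unexpected time format: {t!r}")
    return int(m.group(1)) * 60 + int(m.group(2))

# "SU" (updatable shredding) times on S08-64, copied from the table above.
su = {"mh": "81m06s", "mH": "383m17s", "Mh": "77m03s", "MH": "390m42s"}

for mmap in ("m", "M"):
    old_hash = to_seconds(su[mmap + "h"])  # rev. 1.134 of gdk_atoms.mx
    new_hash = to_seconds(su[mmap + "H"])  # rev. 1.134.6.1 of gdk_atoms.mx
    print(f"{mmap}: H/h = {new_hash / old_hash:.2f}")
# prints:
# m: H/h = 4.73
# M: H/h = 5.07
```

So the old hash is between roughly 4.7x and 5.1x faster for updatable shredding on this machine, with or without the mmap fix.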
S16-32:  SR       QR     SU       QU[1]    QU[2]
mh       127m59s  0m17s  43m14s   ERROR    -
mH       110m33s  0m16s  26m26s   ERROR    -
Mh       128m11s  0m18s  44m00s   100m50s  1m15s
MH       191m42s  0m17s  25m43s   101m37s  1m21s
These numbers show that MonetDB prefers to operate on RAM-resident data. I am glad you reproduced my 25-minute H/SU result. It must be a coincidence that the mini hash self-join benchmark on 50M string numbers ran in 26s with the new hash and 43s with the old hash (we see the same numbers, in minutes, repeated here for shredding). In my current version, QU[2] with the ordered indices is back at 17s. The reason for the 1m15s is that you tend to lose the hash table on the index BATs due to swapping, so that time includes hash-table creation. Whether QU[1] is acceptable is highly questionable: starting the database takes 1.5 hours!! I am seriously considering switching to persistent ordered indices for the updatable case. For that to happen, the working set needs to be extended with persistent delta BATs on those indices, and changes to these deltas need to be logged (in order to make the indices recoverable).
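The idea of a persistent ordered index with a logged delta can be illustrated with a toy model. This is a hypothetical Python sketch, not MonetDB code: the class `LoggedOrderedIndex`, its JSON-lines log format, and the set-like semantics are all assumptions. The ordered base is assumed to be persisted already; only the delta (pending inserts and deletes) is modelled, and every change is appended to a log and fsync'ed before it is applied, so replaying the log recovers the delta.

```python
import bisect
import json
import os

class LoggedOrderedIndex:
    """Hypothetical toy: sorted base of (value, oid) pairs plus a logged delta."""

    def __init__(self, base, log_path):
        self.base = sorted(tuple(e) for e in base)  # assumed already persistent
        self.inserts = []      # delta: sorted pending inserts
        self.deletes = set()   # delta: base entries masked as deleted
        self.log_path = log_path
        self._replay()         # recovery: rebuild the delta from the log

    def _in_base(self, entry):
        i = bisect.bisect_left(self.base, entry)
        return i < len(self.base) and self.base[i] == entry

    def _replay(self):
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path) as f:
            for line in f:
                op, value, oid = json.loads(line)
                self._apply(op, (value, oid))

    def _log(self, op, entry):
        # Write-ahead: make the change durable before applying it.
        with open(self.log_path, "a") as f:
            f.write(json.dumps([op, *entry]) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def _apply(self, op, entry):
        if op == "ins":
            if entry in self.deletes:
                self.deletes.discard(entry)   # re-insert of a deleted base entry
            elif not self._in_base(entry) and entry not in self.inserts:
                bisect.insort(self.inserts, entry)
        elif op == "del":
            if self._in_base(entry):
                self.deletes.add(entry)       # mask the base entry
            elif entry in self.inserts:
                self.inserts.remove(entry)    # drop a pending insert
        else:
            raise ValueError(f"unknown log record: {op!r}")

    def insert(self, value, oid):
        self._log("ins", (value, oid))
        self._apply("ins", (value, oid))

    def delete(self, value, oid):
        self._log("del", (value, oid))
        self._apply("del", (value, oid))

    def range(self, lo, hi):
        """Return the oids with lo <= value < hi, merging base and delta."""
        out = []
        for part in (self.base, self.inserts):
            i = bisect.bisect_left(part, (lo,))
            while i < len(part) and part[i][0] < hi:
                if part[i] not in self.deletes:
                    out.append(part[i][1])
                i += 1
        return sorted(out)
```

A real design would also periodically merge the delta into the base and truncate the log (checkpointing); that is left out of this sketch.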
-----Original Message-----
From: Stefan Manegold [mailto:Stefan.Manegold@cwi.nl]
Sent: Wednesday, October 17, 2007 1:54 PM
To: monetdb-developers@lists.sourceforge.net
Cc: Peter Boncz; Peter Boncz
Subject: Re: [Monetdb-checkins] MonetDB/src/gdk gdk_atoms.mx, MonetDB_1-20, 1.134, 1.134.6.1 gdk_posix.mx, MonetDB_1-20, 1.143, 1.143.2.1
Just for the records:
I finally managed to finish my experiments regarding [ 1811229 ] [ADT] Adding large document, with update support (http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468) and the related code changes. For those interested, here's the detailed story:
"S08-64" System (beo-24):
- 2x 64-bit Dual-Core Opteron 270 @ 2 GHz
- 8 GB memory
- MonetDB/XQuery 0.20, 64-bit, 64-bit OIDs, --enable-optimize (gcc 4.1.2)
"S16-32" System (core-1):
- 4x 64-bit Dual-Core Opteron 870 @ 2 GHz
- 16 GB memory
- MonetDB/XQuery 0.20, 64-bit, 32-bit OIDs, --enable-optimize (gcc 4.1.2)
Document: http://mirror.openstreetmap.nl/planet/planet-071003.osm.bz2 (extracted: 19 GB XML file)
"SR" Shredding read-only: pf:add-doc(".../planet-071003.osm","planet-071003.osm")
"SU" Shredding updateable: pf:add-doc(".../planet-071003.osm","planet-071003.osm","planet-071003.osm",5)
"QR"/"QU" Count query: count(doc("planet-071003.osm")//*)
Configurations:
m: without Peter's mmap fix in gdk_posix.mx (i.e., using rev. 1.143 of gdk_posix.mx)
M: with Peter's mmap fix in gdk_posix.mx (i.e., using rev. 1.143.2.1 of gdk_posix.mx)
h: without Peter's new string hash function in gdk_atoms.mx (i.e., using rev. 1.134 of gdk_atoms.mx)
H: with Peter's new string hash function in gdk_atoms.mx (i.e., using rev. 1.134.6.1 of gdk_atoms.mx)
Results (wall-clock times):
S08-64:  SR       QR     SU       QU[1]    QU[2]
mh       659m34s  1m09s  81m06s   ERROR    -
mH       -        -      383m17s  ERROR    -
Mh       -        -      77m03s   344m32s  1m34s
MH       644m01s  0m49s  390m42s  342m47s  1m36s
S16-32:  SR       QR     SU       QU[1]    QU[2]
mh       127m59s  0m17s  43m14s   ERROR    -
mH       110m33s  0m16s  26m26s   ERROR    -
Mh       128m11s  0m18s  44m00s   100m50s  1m15s
MH       191m42s  0m17s  25m43s   101m37s  1m21s
(NB: "SR" includes building of indices, while "SU" does not; consequently, "QR" can exploit the indices built during "SR", while "QU[1]" has to build the indices first, and only "QU[2]" can exploit them.)
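Since "SU" defers index construction to the first query, one rough way to compare the two paths is to add "QU[1]" to "SU" and set the sum against "SR". The sketch below does that arithmetic for the configurations in the tables above where all three times are available; the helpers `to_seconds` and `fmt` are illustrative, and note that "QU[1]" also contains the count query itself, so the sum slightly overstates the pure index-building cost.

```python
import re

def to_seconds(t: str) -> int:
    """Convert a wall-clock time such as '342m47s' into seconds."""
    m = re.fullmatch(r"(\d+)m(\d+)s", t)
    if m is None:
        raise ValueError(f"unexpected time format: {t!r}")
    return int(m.group(1)) * 60 + int(m.group(2))

def fmt(seconds: int) -> str:
    """Format seconds back into the 'XmYYs' style used in the tables."""
    return f"{seconds // 60}m{seconds % 60:02d}s"

# (system, configuration): (SR, SU, QU[1]), copied from the tables above.
runs = {
    ("S08-64", "MH"): ("644m01s", "390m42s", "342m47s"),
    ("S16-32", "Mh"): ("128m11s", "44m00s", "100m50s"),
    ("S16-32", "MH"): ("191m42s", "25m43s", "101m37s"),
}

for (system, cfg), (sr, su, qu1) in runs.items():
    total = to_seconds(su) + to_seconds(qu1)
    print(f"{system} {cfg}: SR = {sr}, SU + QU[1] = {fmt(total)}")
# prints:
# S08-64 MH: SR = 644m01s, SU + QU[1] = 733m29s
# S16-32 Mh: SR = 128m11s, SU + QU[1] = 144m50s
# S16-32 MH: SR = 191m42s, SU + QU[1] = 127m20s
```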
Apparently, the mmap fix in gdk_posix.mx is sufficient to prevent the remap ERROR reported (for a system and configuration similar to "S08-64") in [ 1811229 ] [ADT] Adding large document, with update support (http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468).
I'll leave the further interpretation of the above results to the interested recipient / reader.
Stefan