Peter,

which part of your changes fixes the problem with updatable shredding of large XML documents as reported in
[ 1811229 ] [ADT] Adding large document, with update support
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
?

The new string hash function in gdk_atoms.mx, or the file descriptor fixes in gdk_posix.mx?

The former looks more like a performance fix to me: too many collisions should only slow the system down, but not compromise its functionality/correctness, right? Also, with the new string hash function, ("too") many collisions can still occur with certain datasets ...

Stefan

On Sun, Oct 14, 2007 at 08:31:36PM +0000, Stefan Manegold wrote:
Update of /cvsroot/monetdb/MonetDB/src/gdk
In directory sc8-pr-cvs16.sourceforge.net:/tmp/cvs-serv15103

Modified Files:
      Tag: MonetDB_1-20
	gdk_atoms.mx gdk_posix.mx
Log Message:
[checkin on behalf of Peter]
fixing XQuery bug [ 1811229 ] [ADT] Adding large document, with update support http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
gdk_atoms.mx:
- fixed hash collisions in strings that consist of digits only (a common case!); we now use a fast derivative of the Bob Jenkins function.
  The collisions were really bad: for the 20GB document of the bug report, shredding took 8 hours before this change and 1 hour after it.
NOTE: this change affects the binary format (string heaps) and all product families, as the hash function is a compiled-in macro! In particular, lookup operations and joins on SQL (Monet4/5) columns consisting of digits only, but stored in a VARCHAR, should be faster after this check-in.
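To make the digits-only problem concrete, here is a minimal sketch of the two schemes written as plain C functions instead of the GDK_STRHASH macro. The function names old_strhash and new_strhash and the use of uint32_t in place of hash_t are assumptions made for illustration; the mixing steps are copied from the removed and added macro bodies in the diff.

```c
#include <stdint.h>

/* Old scheme (removed macro body): folds two characters per iteration.
 * Function name and uint32_t are illustrative assumptions, not GDK API. */
static uint32_t old_strhash(const char *s)
{
    const unsigned char *c = (const unsigned char *) s;
    uint32_t h = 0;

    for (; c[0] && c[1]; c += 2)
        h = (h << 3) ^ (h >> 11) ^ (h >> 17) ^ ((uint32_t) c[1] << 8) ^ c[0];
    h ^= c[0];              /* trailing odd character, or the terminator (0) */
    return h;
}

/* New scheme (added macro body): one character per iteration, followed by
 * a final mixing step, in the style of Bob Jenkins' one-at-a-time hash. */
static uint32_t new_strhash(const char *s)
{
    const unsigned char *c = (const unsigned char *) s;
    uint32_t h = 0;

    for (; *c; c++) {
        h += *c;
        h += h << 10;
        h ^= h >> 6;
    }
    h += h << 3;
    h ^= h >> 11;
    h += h << 15;
    return h;
}
```

With these definitions, the digit strings "0080" and "1000" get the same value from old_strhash, because the low nibbles of neighbouring digits overlap after the 3-bit shift, while new_strhash keeps them apart.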
gdk_posix.mx:
- we lost track of the file descriptor for large heaps (the file descriptor is handed to the mmap-monitoring thread, which closes it later), so the remap function could fail (when it was given the illegal file descriptor 0)

NOTE: this change only affects XQuery, as only XQuery uses remap()
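The descriptor handling in the gdk_posix.mx patch can be read as a sign convention on the stored value. The sketch below spells that reading out; the helper names handoff_fd, usable_fd and caller_owns_fd are hypothetical and do not exist in GDK, they only mirror the expressions `fd = -fd`, `(fd < 0)?-fd:fd` and `if (fd > 0) close(fd)` from the diff.

```c
/* Hypothetical helpers, not GDK functions: they only restate the sign
 * convention that the patch appears to use for the stored descriptor. */

/* MT_mmap_new keeps the descriptor for the monitoring thread and reports
 * it back negated, so the caller still knows which descriptor it was. */
static int handoff_fd(int fd)
{
    return -fd;
}

/* MT_mmap_remap needs the real descriptor regardless of who owns it. */
static int usable_fd(int stored)
{
    return stored < 0 ? -stored : stored;
}

/* MT_mmap_close closes only descriptors that are still owned by the
 * caller, i.e. strictly positive stored values. */
static int caller_owns_fd(int stored)
{
    return stored > 0;
}
```

One consequence of this encoding is that descriptor 0 cannot be marked as handed off, since -0 equals 0; the sketch assumes, as the patch seems to, that heap files never get descriptor 0.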
Index: gdk_posix.mx
===================================================================
RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_posix.mx,v
retrieving revision 1.143
retrieving revision 1.143.2.1
diff -u -d -r1.143 -r1.143.2.1
--- gdk_posix.mx	4 Sep 2007 17:55:20 -0000	1.143
+++ gdk_posix.mx	14 Oct 2007 20:31:33 -0000	1.143.2.1
@@ -615,7 +615,7 @@
 		MT_mmap_tab[i].writable = writable;
 		MT_mmap_tab[i].fd = fd;
 		MT_mmap_tab[i].pincnt = 0;
-		fd = -1;
+		fd = -fd;
 	}
 	(void) pthread_mutex_unlock(&MT_mmap_lock);
 	return fd;
@@ -1051,9 +1051,7 @@
 	}
 	if (ret != (void *) -1L) {
 		hdl->fixed = ret;
-		fd = MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE));
-		if (fd <= 0)
-			hdl->hdl = (void *) 0; /* MT_mmap_new keeps the fd */
+		hdl->hdl = (void*) (ssize_t) MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE));
 	}
 	return ret;
 }
@@ -1061,13 +1059,12 @@
 void *
 MT_mmap_remap(MT_mmap_hdl *hdl, off_t off, size_t len)
 {
-	void *ret;
-
-	ret = mmap(hdl->fixed,
+	int fd = (int) (ssize_t) hdl->hdl;
+	void *ret = mmap(hdl->fixed,
 		   len,
 		   ((hdl->mode & MMAP_WRITABLE) ? PROT_WRITE : 0) | PROT_READ,
 		   ((hdl->mode & MMAP_COPY) ? (MAP_PRIVATE | MAP_NORESERVE) : MAP_SHARED) | (hdl->fixed ? MAP_FIXED : 0),
-		   (int) (ssize_t) hdl->hdl,
+		   (fd < 0)?-fd:fd,
 		   off);
 
 	if (ret != (void *) -1L) {
@@ -1083,9 +1080,7 @@
 MT_mmap_close(MT_mmap_hdl *hdl)
 {
 	int fd = (int) (ssize_t) hdl->hdl;
-
-	if (fd)
-		close(fd);
+	if (fd > 0) close(fd);
 	hdl->hdl = NULL;
 }
Index: gdk_atoms.mx
===================================================================
RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_atoms.mx,v
retrieving revision 1.134
retrieving revision 1.134.6.1
diff -u -d -r1.134 -r1.134.6.1
--- gdk_atoms.mx	2 May 2007 16:16:58 -0000	1.134
+++ gdk_atoms.mx	14 Oct 2007 20:31:32 -0000	1.134.6.1
@@ -1878,13 +1878,19 @@
 rotates all characters together. It is optimized to process 2 characters at a
 time (adding 16-bits to the hash value each iteration).
 @h
-#define GDK_STRHASH(x,y) { \
-	str _c = (str) (x); \
-	for((y)=0; _c[0] && _c[1]; _c+=2) { \
-		(y) = ((y) << 3) ^ ((y) >> 11) ^ ((y) >> 17) ^ (_c[1] << 8) ^ _c[0];\
-	} \
-	(y) ^= _c[0]; \
+#define GDK_STRHASH(x,y) {\
+	str _key = (str) (x);\
+	int _i;\
+	for (_i = y = 0; _key[_i]; _i++) {\
+		y += _key[_i];\
+		y += (y << 10);\
+		y ^= (y >> 6);\
+	}\
+	y += (y << 3);\
+	y ^= (y >> 11);\
+	y += (y << 15);\
 }
+
 @c
 hash_t
 strHash(str s)
_______________________________________________
Monetdb-checkins mailing list
Monetdb-checkins@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-checkins
-- 
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI,  P.O.Box 94079 | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |