Peter,
which part of your changes do fix the problem with updatedable shredding of
large XML documents as reporten in
[ 1811229 ] [ADT] Adding large document, with update support
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
?
The new has string function in gdk_atoms.mx or the file descriptor fixes in
gdk_posix.mx?
The former looks for like a performance fix to me --- too many collisions
should only slows the system down, but not copromize its
fucntionallity/correctness, right?
Also with the new string has functions ("too") many collisions can still
occur with certain datasets ...
Stefan
On Sun, Oct 14, 2007 at 08:31:36PM +0000, Stefan Manegold wrote:
Update of /cvsroot/monetdb/MonetDB/src/gdk
In directory sc8-pr-cvs16.sourceforge.net:/tmp/cvs-serv15103
Modified Files:
Tag: MonetDB_1-20
gdk_atoms.mx gdk_posix.mx
Log Message:
[checkin on behalf of Peter]
fixing XQuery bug
[ 1811229 ] [ADT] Adding large document, with update support
http://sourceforge.net/tracker/index.php?func=detail&aid=1811229&group_id=56967&atid=482468
gdk_atoms.mx:
- hash collisions in strings that consists of digits only (a common case!)
we now use a fast derivative of the Bob Jenkins function from now on
Really bad collisions, in case of the 20GB document of the bug report,
shredding took 8 hours before, 1 hour after this change.
NOTE: this change affects the binary format (string heaps) and all product
families, as the hash function is a compiled-in macro!
In particular, lookup operations and joins on SQL (Monet4/5) columns
consisting of digits only, but stored in a VARCHAR, should be faster
after this check-in.
gdk_posix.mx
- we lost track of the file descriptor for large heaps (the file desc is given
to the mmap-monitoring-thread to close later), such that the remap function
could fail (when it was given the illegal file descriptor 0)
NOTE: this change only affects xquery it only uses remap()
Index: gdk_posix.mx
===================================================================
RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_posix.mx,v
retrieving revision 1.143
retrieving revision 1.143.2.1
diff -u -d -r1.143 -r1.143.2.1
--- gdk_posix.mx 4 Sep 2007 17:55:20 -0000 1.143
+++ gdk_posix.mx 14 Oct 2007 20:31:33 -0000 1.143.2.1
@@ -615,7 +615,7 @@
MT_mmap_tab[i].writable = writable;
MT_mmap_tab[i].fd = fd;
MT_mmap_tab[i].pincnt = 0;
- fd = -1;
+ fd = -fd;
}
(void) pthread_mutex_unlock(&MT_mmap_lock);
return fd;
@@ -1051,9 +1051,7 @@
}
if (ret != (void *) -1L) {
hdl->fixed = ret;
- fd = MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE));
- if (fd <= 0)
- hdl->hdl = (void *) 0; /* MT_mmap_new keeps the fd */
+ hdl->hdl = (void*) (ssize_t) MT_mmap_new(path, ret, len, fd, (mode & MMAP_WRITABLE));
}
return ret;
}
@@ -1061,13 +1059,12 @@
void *
MT_mmap_remap(MT_mmap_hdl *hdl, off_t off, size_t len)
{
- void *ret;
-
- ret = mmap(hdl->fixed,
+ int fd = (int) (ssize_t) hdl->hdl;
+ void *ret = mmap(hdl->fixed,
len,
((hdl->mode & MMAP_WRITABLE) ? PROT_WRITE : 0) | PROT_READ,
((hdl->mode & MMAP_COPY) ? (MAP_PRIVATE | MAP_NORESERVE) : MAP_SHARED) | (hdl->fixed ? MAP_FIXED : 0),
- (int) (ssize_t) hdl->hdl,
+ (fd < 0)?-fd:fd,
off);
if (ret != (void *) -1L) {
@@ -1083,9 +1080,7 @@
MT_mmap_close(MT_mmap_hdl *hdl)
{
int fd = (int) (ssize_t) hdl->hdl;
-
- if (fd)
- close(fd);
+ if (fd > 0) close(fd);
hdl->hdl = NULL;
}
Index: gdk_atoms.mx
===================================================================
RCS file: /cvsroot/monetdb/MonetDB/src/gdk/gdk_atoms.mx,v
retrieving revision 1.134
retrieving revision 1.134.6.1
diff -u -d -r1.134 -r1.134.6.1
--- gdk_atoms.mx 2 May 2007 16:16:58 -0000 1.134
+++ gdk_atoms.mx 14 Oct 2007 20:31:32 -0000 1.134.6.1
@@ -1878,13 +1878,19 @@
rotates all characters together. It is optimized to process 2 characters
at a time (adding 16-bits to the hash value each iteration).
@h
-#define GDK_STRHASH(x,y) { \
- str _c = (str) (x); \
- for((y)=0; _c[0] && _c[1]; _c+=2) { \
- (y) = ((y) << 3) ^ ((y) >> 11) ^ ((y) >> 17) ^ (_c[1] << 8) ^ _c[0];\
- } \
- (y) ^= _c[0]; \
+#define GDK_STRHASH(x,y) {\
+ str _key = (str) (x);\
+ int _i;\
+ for (_i = y = 0; _key[_i]; _i++) {\
+ y += _key[_i];\
+ y += (y << 10);\
+ y ^= (y >> 6);\
+ }\
+ y += (y << 3);\
+ y ^= (y >> 11);\
+ y += (y << 15);\
}
+
@c
hash_t
strHash(str s)
-------------------------------------------------------------------------
This SF.net email is sponsored by: Splunk Inc.
Still grepping through log files to find problems? Stop.
Now Search log events and configuration files using AJAX and a browser.
Download your FREE copy of Splunk now >> http://get.splunk.com/
_______________________________________________
Monetdb-checkins mailing list
Monetdb-checkins@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-checkins
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4312 |