[Monetdb-developers] Fixing the update issue for Large XML documents
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512

Hello,

A short introduction: I'm Stefan de Konink, one of the students in the ADT course at the University of Amsterdam. I have a degree in Advanced Computer Science, so I know what C looks like ;)

Currently I can use MonetDB on a rather heavy machine, basically to benchmark the storage of this machine and look for bottlenecks. I am also a participant in OpenStreetMap NL, as a data collector working from already existing GIS data. OpenStreetMap runs on MySQL in the UK, and I wanted to find out if I could run it more efficiently on MonetDB.

For compatibility reasons OSM publishes an XML file daily, containing the current view of the database. My 'job' is to see if I can get the same results from this XML file (using XQuery) as the active MySQL/Ruby implementation.

Importing a 20GB XML file went well, but now that the 'lessons' have progressed, we should update the data. So I took a recent version of the document (they migrated from one format to another this week), allocated 5% slack space and ran pf:add-doc. This went OK, but the trick to count all nodes ended up in:
xquery> more> fn:count(doc("planet.osm")//*)
more>
MAPI = monetdb@localhost:50000
QUERY = fn:count(doc("planet.osm")//*)
ERROR = !ERROR: [remap]: 5 times inserted nil due to errors at tuples 1@0, 2@0, 3@0, 4@0, 5@0.
!ERROR: [remap]: first error was:
!ERROR: CMDremap: operation failed.
!ERROR: interpret_unpin: [remap] bat=492,stamp=-729 OVERWRITTEN
!ERROR: BBPdecref: 1000000001_rid_nid does not have pointer fixes.
!ERROR: interpret_params: leftfetchjoin(param 2): evaluation error.
(As posted to the bugtracker and privately to the lecturer.)

My first guess was that the 1..5 in the error might have something to do with the 'slack space', which was also five. Today I imported the same document read-only without any problems. To confirm that 'import-updating' works in general, I took the data for only The Netherlands (143MB) and imported it, which worked like a charm.

I'm running MonetDB4/5 (SR3) on a Quad Xeon with 8GB memory. Storage is a bit variable ;) but currently on local RAID5.

Are there any suggestions from the developers for catching this bug? Upgrading to CVS and trying again is an option, trying again with a smaller document too, or if someone instantly comes up with a solution, that would be nice too ;)

Yours sincerely,
Stefan de Konink

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.7 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHDXXIYH1+F2Rqwn0RCk/VAJwLpXp0Yl1ENToPMRFgIJy2No/BMgCdFv7C
dGQzJgoGmwM9ucPxrohIkFk=
=sHEr
-----END PGP SIGNATURE-----