[Monetdb-developers] Large XML instances with MonetDB/XQuery 4.38.5
Hi everyone,

is it still possible to shred large XML instances with MonetDB/XQuery, or do I need more memory for shredding? I was just trying to shred the 1GB instance of XMark with WinXP 32Bit and 2GB RAM, and this is what I got:

MonetDB>shred_doc("1gb.xml", "xmark");
#GDKmmap(536870912) fails, try to free up space [memory in use=36143780,virtual memory in use=1015480320]
#GDKmmap(536870912) result [mem=26016940,vm=1005387776]
!ERROR: [shred_url]: 1 times inserted nil due to errors at tuples 0@0.
!ERROR: [shred_url]: first error was:
!ERROR: HEAPextend: failed to extend to 536870912 for 06\632theap
!ERROR: shredBAT_append_str: APPEND-STR[_prop_text]( follow murders prov shoes p resent eager delivered defy bend viol here marquis powers complexion pillar cran ts agate ), BUNappend fails
!ERROR: CMDshred_url: operation failed.

pf:add-doc(...) produces the same output; the XML document itself is well-formed.

Thanks,
Christian
To continue… Do you have some information on the current theoretical
limits for storing XML instances in MonetDB/XQuery?
Thanks,
Christian
On Fri, Oct 1, 2010 at 7:13 PM, Christian Grün wrote:
> [...]
On Fri, Oct 01, 2010 at 09:35:26PM +0200, Christian Grün wrote:
> To continue… Do you have some information on the current theoretical limits for storing XML instances in MonetDB/XQuery?
The theoretical and practical limits are the available address space and free disk space on your system, as well as the vulnerability of your OS to address space fragmentation. In practice, not only the plain serialized document size is a factor, but also the number of nodes in your document(s).

In case of doubt, we recommend using a 64-bit version of MonetDB (on a 64-bit system); cf. http://monetdb.cwi.nl/XQuery/Documentation/Scalability.html

Stefan
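The numbers in the failed 32-bit run at the top of the thread can be read against this explanation. The following is only a rough sketch: it assumes the usual 2 GiB default user address space of a 32-bit Windows process (the figure also mentioned later in this thread), and the variable names are made up for illustration.

```python
GIB = 2 ** 30

# Assumption: default user address space of a 32-bit Windows process.
user_limit = 2 * GIB

# Values copied from the "#GDKmmap(536870912) fails" line of the shred_doc run.
vm_in_use = 1015480320   # "virtual memory in use"
requested = 536870912    # size of the mapping that could not be created

# Would the request fit if only the total mattered?
fits_in_total = vm_in_use + requested <= user_limit
headroom = user_limit - vm_in_use - requested

print(f"requested mapping: {requested // 2**20} MiB")
print(f"fits within 2 GiB in total: {fits_in_total}")
print(f"bytes that would still be left: {headroom}")
```

Under that assumption the request would still fit in total, so the failure looks more like a missing contiguous 512 MiB range than like plain exhaustion. That would be in line with the address space fragmentation mentioned above, but it is only one possible reading of the log.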
_______________________________________________
Monetdb-developers mailing list
Monetdb-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-developers
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4199 |
Thanks for the quick feedback, Stefan,
> In case of doubt, we recommend using a 64-bit version of MonetDB (on a 64-bit system); cf. http://monetdb.cwi.nl/XQuery/Documentation/Scalability.html
On this page, I found the information that "The default 64-bits MonetDB/XQuery binaries are built with 32-bits object identifiers (OIDs)". I noticed that the "--disable-oid32" option isn't recognized by monetdb-install.sh; but this is probably negligible, as my Mserver instance shows me the following information:

# MonetDB Server v4.38.5
# based on GDK v1.38.5
# release Jun2010-SP2
# Copyright (c) 1993-July 2008, CWI. All rights reserved.
# Copyright (c) August 2008-2010, MonetDB B.V.. All rights reserved.
# Compiled for x86_64-unknown-linux-gnu/64bit with 64bit OIDs; dynamically linked.

…so I guess 64bit is now default?

Next, I've now successfully created a 1GB XMark instance on a 64bit Linux machine with 32GB RAM; but when I try to shred an 11GB instance, I get the following output:
mclient -tlx pfadddoc.xq
#GDKmmap(1336803328) fails, try to free up space [memory in use=33105304,virtual memory in use=36277911552]
#GDKmmap(1336803328) result [mem=32625976,vm=35689922560]
#GDKmmap: recovery ok. Continuing..
#GDKmmap(1336803328) fails, try to free up space [memory in use=32626608,virtual memory in use=36971282432]
#GDKmmap(1336803328) result [mem=23481048,vm=36963221504]
MAPI = monetdb@localhost:50000
QUERY = pf:add-doc("11gb.xml", "xmark")
ERROR = !ERROR: HEAPalloc: Insufficient space for HEAP of 1336803328 bytes.
!ERROR: CMDleftjoin: operation failed.
!ERROR: interpret_params: reverse(param 1): evaluation error.
Timer 720853.104 msec
Do you have an idea how to tweak MonetDB, or my system, to parse documents of that size? Currently, no other processes are running on the machine, and space on hard disk should be enough as well.

Thanks again,
Christian
On Sat, Oct 02, 2010 at 01:20:50AM +0200, Christian Grün wrote:
> Thanks for the quick feedback, Stefan,
>
> > In case of doubt, we recommend using a 64-bit version of MonetDB (on a 64-bit system); cf. http://monetdb.cwi.nl/XQuery/Documentation/Scalability.html
>
> On this page, I found the information that "The default 64-bits MonetDB/XQuery binaries are built with 32-bits object identifiers (OIDs)". I noticed that the "--disable-oid32" isn't recognized by monetdb-install.sh; but this is probably negligible, as my Mserver instance shows me the following information:
monetdb-install.sh is for building from a source tarball on Unix-like systems; i.e., it is not at all related to the binary builds we provide for Windows.

The default is (and has always been) to build with 32-bit OIDs on 32-bit systems and with 64-bit OIDs on 64-bit systems. Alternatively, there is an option to build with 32-bit OIDs on 64-bit systems to get a slightly smaller footprint / memory requirement --- at the expense of scalability: BATs (tables) can then contain at most 2^31 BUNs (tuples) rather than 2^63. You can enable that option by running ("plain") configure with "--enable-oid32" (see `monetdb-install.sh --devhelp` for expert help on monetdb-install.sh).

We did once decide to build the 64-bit Windows installer with that option to cope with 64-bit Windows machines with (rather) small amounts of physical memory and their inherent vulnerability to address space fragmentation. We are currently considering also providing the 64-bit Windows build with default 64-bit OIDs in the future.
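To put the two limits side by side, here is a small arithmetic sketch. Only the 2^31 and 2^63 figures come from the text above; the helper name max_buns is made up for this illustration and is not part of MonetDB.

```python
def max_buns(oid_bits: int) -> int:
    """Upper bound on BUNs (tuples) per BAT for a given OID width.

    The text above gives 2^31 for 32-bit OIDs and 2^63 for 64-bit OIDs,
    i.e. 2^(bits - 1); this function simply encodes that pattern.
    """
    return 2 ** (oid_bits - 1)


for bits in (32, 64):
    print(f"{bits}-bit OIDs: at most {max_buns(bits):,} BUNs per BAT")

# Ratio between the two limits: 2^32.
print(max_buns(64) // max_buns(32))
```

Whether a given document hits the 32-bit limit depends on how many tuples its largest BAT needs, which this sketch does not try to estimate.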
> # MonetDB Server v4.38.5
> # based on GDK v1.38.5
> # release Jun2010-SP2
> # Copyright (c) 1993-July 2008, CWI. All rights reserved.
> # Copyright (c) August 2008-2010, MonetDB B.V.. All rights reserved.
> # Compiled for x86_64-unknown-linux-gnu/64bit with 64bit OIDs; dynamically linked.
>
> …so I guess 64bit is now default?
As said above, not only now.
> Next, I've now successfully created an 1GB XMark instance on a 64bit Linux machine with 32GB RAM; but when I try to shred an 11GB instance, I get the following output:
>
> mclient -tlx pfadddoc.xq
> #GDKmmap(1336803328) fails, try to free up space [memory in use=33105304,virtual memory in use=36277911552]
> #GDKmmap(1336803328) result [mem=32625976,vm=35689922560]
> #GDKmmap: recovery ok. Continuing..
> #GDKmmap(1336803328) fails, try to free up space [memory in use=32626608,virtual memory in use=36971282432]
> #GDKmmap(1336803328) result [mem=23481048,vm=36963221504]
> MAPI = monetdb@localhost:50000
> QUERY = pf:add-doc("11gb.xml", "xmark")
> ERROR = !ERROR: HEAPalloc: Insufficient space for HEAP of 1336803328 bytes.
> !ERROR: CMDleftjoin: operation failed.
> !ERROR: interpret_params: reverse(param 1): evaluation error.
> Timer 720853.104 msec
Your Mserver is already using ~36 GB of (virtual) address space --- I haven't recently loaded an 11 GB XMark document, but that might be reasonable --- and then (unexpectedly?) fails to allocate another ~1.3 GB. How much free disk space is there on the partition that holds your dbfarm?
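The ~36 GB and ~1.3 GB figures can be read straight off the "#GDKmmap(...) fails" lines. A minimal sketch of that reading follows; the regular expression and helper names are assumptions that fit the two log lines quoted in this thread, not a description of MonetDB's log format in general.

```python
import re

# Two lines copied from the mclient output quoted in this thread.
LOG = """\
#GDKmmap(1336803328) fails, try to free up space [memory in use=33105304,virtual memory in use=36277911552]
#GDKmmap(1336803328) result [mem=32625976,vm=35689922560]
"""

# Matches only the "fails" lines: requested size and virtual memory in use.
FAIL = re.compile(r"#GDKmmap\((\d+)\) fails.*virtual memory in use=(\d+)\]")


def failed_requests(log: str):
    """Yield (requested_bytes, vm_in_use_bytes) for each failed GDKmmap line."""
    for line in log.splitlines():
        match = FAIL.search(line)
        if match:
            yield int(match.group(1)), int(match.group(2))


def gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


for requested, vm in failed_requests(LOG):
    print(f"requested {gb(requested):.2f} GB with {gb(vm):.2f} GB of address space in use")
```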
> Do you have an idea how to tweak MonetDB, or my system, to parse documents of that size? Currently, no other processes are running on the machine, and space on hard disk should be enough as well.
What does "space on hard disk should be enough as well" mean? Is there (considerably) more than 36 GB free (on the partition that holds your dbfarm)? If so, please consider filing a bug report via http://bugs.monetdb.org/

Stefan
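One quick way to answer the disk space question is to ask the OS for the free space on the partition that holds the dbfarm. This is a sketch using Python's standard library; the DBFARM path is a placeholder (the thread never states the real location), and the 36 GB threshold is simply the figure from the question above.

```python
import shutil


def free_gb(path: str) -> float:
    """Free space, in decimal GB, on the partition that holds `path`."""
    return shutil.disk_usage(path).free / 1e9


# Placeholder: replace "." with the actual dbfarm directory.
DBFARM = "."
NEEDED_GB = 36  # figure taken from the question above

free = free_gb(DBFARM)
print(f"{free:.1f} GB free on the partition holding {DBFARM!r}")
print("more than the ~36 GB asked about" if free > NEEDED_GB else "less than ~36 GB free")
```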
> The default is (and has always been) to build with 32-bit OIDs on 32-bit systems and with 64-bit OIDs on 64-bit systems. [...] "--enable-oid32" (see `monetdb-install.sh --devhelp` for expert help on monetdb-install.sh).
Perfect, I got it.
> What does "space on hard disk should be enough as well" mean? Is there (considerably) more than 36 GB free (on the partition that holds your dbfarm)?
80 GB of free space is left; do you think that's enough?

Christian
Hi Christian,

quick and simple answer: on a 32-bit Windows system, you are running out of address space, which is in practice limited to 2GB, or at most 3GB if you configured your system accordingly (cf. http://monetdb.cwi.nl/XQuery/Documentation/Scalability.html and http://zone.ni.com/reference/en-XX/help/371361D-01/lvhowto/enable_lrg_ad_awa... ), and which is known to be very vulnerable to address space fragmentation.

I haven't tried it recently, but you might be able to load the 1 GB XMark document on a 32-bit Unix-like system. A 64-bit build of MonetDB/XQuery (on a 64-bit system) should not have any problems loading a 1 GB XMark document.

Stefan

On Fri, Oct 01, 2010 at 07:13:53PM +0200, Christian Grün wrote:
> [...]
participants (2)
- Christian Grün
- Stefan Manegold