[MonetDB-users] Slow import of large XML documents
Hi,

Today I tried to shred an XML document of 1.6 GB using mclient, and it took more than 3 hours (~150 kB/s) on a fairly fast machine. I was a bit surprised, because shredding is quite fast for smaller documents. Are these numbers expected?

I know that it can be hard to say anything about this case without the document itself, so I am currently working on anonymizing it. What is the preferred way to get the document to CWI?

System information:
- Fedora Core 10, x86_64, 32 GB RAM (24G cached/free), 5.5 GB disk free (after import)
- MonetDB4 / XQuery v4.28.4, 64bit OIDs (RPM version)

Commands:
$ time mclient -p 50000 -l xq -s"pf:add-doc('/tmp/structure.xml', 'structure3.xml')"
real 198m49.543s
user 0m0.002s
sys 0m0.003s
[user@host tmp]$ ls -lah /tmp/structure.xml
-rw-r--r-- 1 user user 1.6G 2009-06-08 16:26 /tmp/structure.xml

Regards,
John van Schie
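The throughput figure quoted above can be checked from the reported file size and wall-clock time. The following is a minimal sketch, assuming that the 1.6G printed by `ls -lah` means 1.6 GiB (GNU `ls -h` uses powers of 1024); the function name is made up for illustration.

```python
def throughput_kib_per_s(size_bytes: float, minutes: int, seconds: float) -> float:
    """Return the average throughput in KiB/s for the given size and elapsed time."""
    elapsed_s = minutes * 60 + seconds
    return size_bytes / elapsed_s / 1024


# Assumption: "1.6G" from `ls -lah` is 1.6 GiB (powers of 1024).
size_bytes = 1.6 * 1024**3

# Wall-clock time reported by `time`: real 198m49.543s
rate = throughput_kib_per_s(size_bytes, 198, 49.543)
print(f"{rate:.0f} KiB/s")
```

This gives roughly 140 KiB/s, which is in line with the ~150 kB/s estimate in the message.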
On Tue, Jun 09, 2009 at 02:01:51PM +0200, John van Schie (DT) wrote:
Hi,
Today I tried to shred an XML document of 1.6 GB using mclient, and it took more than 3 hours (~150 kB/s) on a fairly fast machine. I was a bit surprised, because shredding is quite fast for smaller documents. Are these numbers expected?
Not necessarily. Do you have any idea how many nodes there are in your document?
I know that it can be hard to say anything about this case without the document itself, so I am currently working on anonymizing it. What is the preferred way to get the document to CWI?
Preferably, put it somewhere from which we (i.e., at least one of us, say Sjoerd or myself) can download it. If that is not possible and the compressed file is of reasonable size (say < 1 GB; I don't know the exact limits of our mail server or yours), emailing it to one of us (Sjoerd or myself) would also be OK.

Stefan
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |
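Stefan's question about the number of nodes can be answered without loading the whole 1.6 GB file into memory. The sketch below streams the document with Python's standard `xml.etree.ElementTree.iterparse`; it is not part of MonetDB. It counts element nodes only, so attributes and text nodes, which a shredder also has to store, are not included. The function name is hypothetical.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source) -> int:
    """Count element nodes by streaming through the document.

    `source` may be a file name or a binary file object.
    """
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        count += 1
        # Drop the element's children and text once it is closed,
        # so memory use stays far below the size of the document.
        elem.clear()
    return count


# Small in-memory sample; for the real file one would call
# count_elements('/tmp/structure.xml') instead.
sample = io.BytesIO(b"<root><a><b/></a><a/></root>")
print(count_elements(sample))  # prints 4
```

Comparing this count for the original document and for its anonymized variant would show whether the two differ in structure or only in content.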
On Tue, Jun 09, 2009 at 02:01:51PM +0200, John van Schie (DT) wrote:
Hi,
Today I tried to shred an XML document of 1.6 GB using mclient, and it took more than 3 hours (~150 kB/s) on a fairly fast machine. I was a bit surprised, because shredding is quite fast for smaller documents. Are these numbers expected?
Hello John,

Have you maybe had a look at the system load during this time? Were any other processes heavily using resources such as disk or CPU?

Kind regards,
Jennie
Ying Zhang wrote:
Hello John,
Have you maybe had a look at the system load during this time? Were any other processes heavily using resources such as disk or CPU?
Kind regards,
Jennie
Jennie,

I have repeated the insertion twice while monitoring the load on the machine. During both insertions the load was constantly [1] 1.00, corresponding to 100% CPU load from MonetDB. Both times the insertion took ~98 minutes, half of the previously reported time but still long, especially compared with an anonymized variant of the same document, which takes about 2 minutes to insert.

Regards,
John

[1] Checked at 1-minute intervals
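The kind of load check John describes (reading the load average at fixed intervals while the import runs) can be sketched as follows. This is only an illustration, not the procedure he actually used: it relies on Python's `os.getloadavg`, which is available on Unix-like systems, and the function name and the short demo interval are assumptions.

```python
import os
import time


def sample_load(samples: int, interval_s: float) -> list:
    """Collect the 1-minute load average `samples` times, `interval_s` seconds apart."""
    readings = []
    for i in range(samples):
        # os.getloadavg() returns the (1 min, 5 min, 15 min) load averages.
        readings.append(os.getloadavg()[0])
        if i < samples - 1:
            time.sleep(interval_s)
    return readings


# Short interval for demonstration only; matching the 1-minute checks
# in the message would mean interval_s=60 for the duration of the import.
readings = sample_load(samples=3, interval_s=0.1)
print(readings)
```

A steady reading of about 1.00 on an otherwise idle machine is consistent with a single process keeping one CPU core fully busy, which suggests the import is CPU-bound rather than waiting on disk.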
participants (3)
- John van Schie (DT)
- Stefan Manegold
- Ying Zhang