Hi Jan,

Thanks for the info. Actually, I don't see the two solutions you propose as mutually exclusive. One thing is being able to cache remote DTDs; another is deciding whether they should be processed at all, regardless of where they are located. I think it would be very useful to add an option at the XQuery level that enables your second suggestion dynamically.

Cheers,
Roberto

On Wed, 2010-04-14 at 22:02 +0200, Jan Rittinger wrote:
Hi Roberto,
The shredding of multiple documents is handled by a batloop at the MIL level, which calls the XML shredder separately for each document. The DTD lookup is performed by libxml2, not by Pathfinder.
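Because the batloop calls the shredder once per document, nothing fetched during one call survives to the next, so a DTD shared by every file is downloaded again for every file. The C sketch below is a simplified model, not Pathfinder or libxml2 code: `fetch_dtd`, `cached_dtd`, `shred_one` and `shred_collection` are hypothetical names, and the network download is simulated by a counter. It shows how a process-wide cache keyed by the DTD's system identifier would turn one download per document into one download per distinct DTD.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_SLOTS 16

/* Hypothetical cache entry: DTD system identifier (URL) -> DTD text. */
struct dtd_entry {
    char system_id[256];
    char *content;
};

static struct dtd_entry cache[CACHE_SLOTS];
static int cache_used = 0;
static int downloads = 0; /* number of simulated network fetches */

/* Hypothetical stand-in for downloading the DTD over HTTP. */
static char *fetch_dtd(const char *system_id) {
    size_t n = strlen(system_id) + 32;
    char *text = malloc(n);
    assert(text != NULL);
    snprintf(text, n, "<!-- DTD from %s -->", system_id);
    downloads++;
    return text;
}

/* Return the DTD for system_id, fetching it only on the first request. */
static const char *cached_dtd(const char *system_id) {
    for (int i = 0; i < cache_used; i++)
        if (strcmp(cache[i].system_id, system_id) == 0)
            return cache[i].content;               /* hit: no download */
    assert(cache_used < CACHE_SLOTS);              /* sketch: fixed-size cache */
    assert(strlen(system_id) < sizeof cache[0].system_id);
    struct dtd_entry *e = &cache[cache_used++];
    strcpy(e->system_id, system_id);
    e->content = fetch_dtd(system_id);             /* miss: download once */
    return e->content;
}

/* Hypothetical per-document step, called once per file as the batloop does. */
static void shred_one(const char *system_id) {
    const char *dtd = cached_dtd(system_id);
    (void)dtd; /* a real shredder would hand this DTD to the XML parser */
}

/* Shred n documents that all reference the same DTD; return total downloads. */
static int shred_collection(int n, const char *system_id) {
    for (int i = 0; i < n; i++)
        shred_one(system_id);
    return downloads;
}
```

With this model, shredding 1000 documents that reference the same DTD leaves `downloads` at 1; a second, different DTD adds exactly one more download.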
As far as I can see, there is currently no real solution to your problem, but you might consider the following two workarounds:
- use a proxy or an entry in /etc/hosts to reference a local copy of the DTD, or
- disable DTD loading in the shredder (pathfinder/runtime/shredder.mx) by changing
      , .externalSubset = shred_external_subset
  to
      , .externalSubset = 0
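As we read libxml2's parser, the `externalSubset` member of the SAX handler table is only invoked when it is non-NULL, which is why setting it to 0 stops the DTD from being fetched. The sketch below models that dispatch in plain C without libxml2; `sax_table`, `parse_doctype`, `load_external_subset`, `make_table` and `skip_dtd` are hypothetical names, not Pathfinder or libxml2 identifiers. It also shows how the choice could be made at run time instead of by recompiling, in the spirit of the per-query option suggested in Roberto's reply at the top of this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback type, loosely modelled on a SAX external-subset hook. */
typedef void (*external_subset_fn)(void *user, const char *name,
                                   const char *public_id, const char *system_id);

/* Hypothetical handler table; a real SAX table has many more members. */
struct sax_table {
    external_subset_fn externalSubset;
};

static int dtd_loads = 0;

/* Hypothetical hook that would fetch and parse the external DTD. */
static void load_external_subset(void *user, const char *name,
                                 const char *public_id, const char *system_id) {
    (void)user; (void)name; (void)public_id; (void)system_id;
    dtd_loads++; /* stands in for a network download */
}

/* Model of the parser's DOCTYPE step: the hook is called only if it is set. */
static void parse_doctype(const struct sax_table *sax, const char *name,
                          const char *system_id) {
    if (sax->externalSubset != NULL)
        sax->externalSubset(NULL, name, NULL, system_id);
}

/* Build the handler table from a run-time flag instead of a compile-time edit. */
static struct sax_table make_table(bool skip_dtd) {
    struct sax_table t;
    t.externalSubset = skip_dtd ? NULL : load_external_subset;
    return t;
}
```

Parsing three documents with `make_table(false)` triggers three loads; parsing them again with `make_table(true)` triggers none, which mirrors the effect of the one-line change above.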
Jan
On Apr 14, 2010, at 17:58, Roberto Cornacchia wrote:
Hi all,
I wonder if there is an option to make the MonetDB/XQuery shredder ignore DTD files. What triggered this question is the need to shred about 20M files, each of which references *the same* online DTD. What happens is that, for each XML file, the DTD is downloaded and shredded/used(?). It is not even cached (I tried using xquery_cacherules, but apparently it has no effect on DTDs).
This slows shredding down to more than 3 seconds per document, which is of course not an option: it would take me roughly 700 days to shred the entire collection :)
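The 700-day estimate follows directly from the figures above (20M documents at 3 seconds each):

```latex
20 \times 10^{6}\ \text{documents} \times 3\ \tfrac{\text{s}}{\text{document}}
  = 6 \times 10^{7}\ \text{s}
  = \frac{6 \times 10^{7}\ \text{s}}{86\,400\ \text{s/day}}
  \approx 694\ \text{days}
```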
Removing the DTD link from the original XML files is not an option either; the collection is simply too big to be replicated.
How can the DTD be ignored?
Roberto
--
| M.Sc. Roberto Cornacchia
| CWI (Centrum voor Wiskunde en Informatica)
| Science Park 123, 1098XG Amsterdam, The Netherlands
| tel: +31 20 592 4322, http://www.cwi.nl/~roberto
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users