[MonetDB-users] [newbie] Loading a directory of XML files into a collection
Hi,
I have a directory containing XML files, all sharing a common structure. I want to load them all into a collection so I can run queries against them using the XQuery interface in MonetDB4.
The tutorials give useful examples that show how to load a single file into a collection, but I need an automated process to load all of the XML files I have. There are over 60,000 files, currently spread over 16 directories to make them more manageable.
I've been looking around on the Web, but I haven't found any XQuery examples that do what I want. Is it possible? If so, can anyone give an example to show me how to load my data?
Thanks,
Phil
--
View this message in context: http://www.nabble.com/-newbie--Loading-a-directory-of-XML-files-into-a-colle...
Sent from the monetdb-users mailing list archive at Nabble.com.
Hi Phil,
I'm not aware of any standard XQuery functionality that solves your problem.
However, a simple workaround could be to create an XML file (say
"docs.xml") that lists, for each document you need to load, its location (URL /
absolute filename) and its desired name in the collection, e.g., as follows:
<docs>
<doc loc="...URL-1..." name="...name-1..."/>
...
<doc loc="...URL-n..." name="...name-n..."/>
</docs>
and then query this XML file using XQuery to load all documents into a single
collection:
for $doc in fn:doc("[absolute_path_to]docs.xml")/docs/doc
return pf:add-doc($doc/@loc, $doc/@name, "MyCollection")
That should be it.
On your favorite Unix-like system, you could easily create docs.xml,
e.g., by executing the following shell (bash) commands in the top-level
directory of your document collection, assuming all files are called *.xml
and all filenames are unique:
echo "<docs>" > /tmp/docs.xml
for i in `find $PWD -name \*.xml` ; do
echo "
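Since the commands above break off after the opening echo, here is a minimal sketch of one way to build such a manifest. It is not necessarily what the original script did. The function name make_docs_xml is hypothetical, using each file's base name as its name in the collection is an assumption (it relies on all filenames being unique), and the sketch assumes paths contain no characters that need XML escaping (such as & or <).

```shell
# Sketch: write a docs.xml manifest listing every *.xml file below a directory.
# Usage: make_docs_xml <top-level-directory> <output-file>
# Write the output file outside the directory tree (e.g. /tmp/docs.xml),
# otherwise the manifest would list itself.
make_docs_xml() (
    # Resolve the directory to an absolute path, since the query needs
    # absolute locations.
    root=$(cd "$1" && pwd) || exit 1
    out=$2
    {
        echo "<docs>"
        # -exec ... {} + hands file names to the inner shell as arguments,
        # so names containing spaces stay intact.
        find "$root" -type f -name '*.xml' -exec sh -c '
            for f do
                # ${f##*/} strips the directory part, leaving the base name
                printf "<doc loc=\"%s\" name=\"%s\"/>\n" "$f" "${f##*/}"
            done' sh {} +
        echo "</docs>"
    } > "$out"
)
```

For example, running make_docs_xml . /tmp/docs.xml from the top-level directory of the collection would produce one <doc .../> line per XML file found.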
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4312 |
Stefan Manegold wrote:
and then query this XML file using XQuery to load all documents into a single collection:
for $doc in fn:doc("[absolute_path_to]docs.xml")/docs/doc return pf:add-doc($doc/@loc, $doc/@name, "MyCollection")
That should be it.
Hi Stefan,
I've created the docs.xml file, which now lists the unique filename of every XML file. However, when I tried the XQuery sample you provided, I got an error from MonetDB. I'm using the MAPI client in XQuery mode to enter the code from the post quoted above. The query spans two lines and is terminated with Ctrl-D (it's a Red Hat Linux server).
Here is the error:
QUERY = for $doc in fn:doc("/mnt/da/monetdb/docs.xml")/docs/doc
ERROR = !type error: no variant of function pf:add-doc accepts the given argument type(s): string?; string?; string
!type error: maybe you meant:
!type error: pf:add-doc (string, string) as docmgmt
!type error: pf:add-doc (string, string, string) as docmgmt
!type error: pf:add-doc (string, string, integer) as docmgmt
!type error: pf:add-doc (string, string, string, integer) as docmgmt
!type error: illegal arguments for function pf:add-doc
I'm not intimately familiar with XQuery - is there some sort of type conversion that needs to be performed on the values extracted from the XML file?
Regards,
Phil
On Tue, Sep 11, 2007 at 06:59:09AM -0700, Philip Webster wrote:
Here is the error:
QUERY = for $doc in fn:doc("/mnt/da/monetdb/docs.xml")/docs/doc
ERROR = !type error: no variant of function pf:add-doc accepts the given argument type(s): string?; string?; string
!type error: maybe you meant:
!type error: pf:add-doc (string, string) as docmgmt
!type error: pf:add-doc (string, string, string) as docmgmt
!type error: pf:add-doc (string, string, integer) as docmgmt
!type error: pf:add-doc (string, string, string, integer) as docmgmt
!type error: illegal arguments for function pf:add-doc
Sorry, that was a static typing issue. Try
for $doc in fn:doc("/mnt/da/monetdb/docs.xml")/docs/doc
return pf:add-doc(exactly-one($doc/@loc), exactly-one($doc/@name), "MyCollection")
instead.
Stefan
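The type error arises because $doc/@loc and $doc/@name are typed as optional (string?), while pf:add-doc expects exactly one string per argument. exactly-one() satisfies the static type check, but it raises an error at run time if an attribute is actually missing. With 60,000 entries it may be worth checking the manifest before loading. The following is a rough sketch only: check_docs_xml is a hypothetical helper name, and it assumes the manifest has one <doc .../> element per line.

```shell
# Sketch: report <doc> lines that lack a loc or a name attribute.
# Usage: check_docs_xml <manifest-file>
# Prints "file:line: content" for each offending line and returns
# non-zero if any were found.
check_docs_xml() {
    awk '/<doc / && (!/ loc="/ || !/ name="/) {
             print FILENAME ":" FNR ": " $0
             bad = 1
         }
         END { exit bad + 0 }' "$1"
}
```

For example, check_docs_xml /mnt/da/monetdb/docs.xml prints nothing and returns success when every entry carries both attributes.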
participants (2)
- Philip Webster
- Stefan Manegold