[Monetdb-developers] adding documents to Monetdb
Hello Everyone,

Is it possible to load XML documents into MonetDB using a Java application? I have been using shred_doc() from the command line to load the documents. Is there any other method, using JDBC?

Thank you,
Swati
---------------------------------------------
Swati Tata
Research Assistant
Advanced Genetic Technology Center (AGTC)
University of Kentucky
Room# 222 Plant Science Building, 1405 Veterans Drive
Lexington, KY 40546-0312
Ph: (859)257-7445 x80740
Email: swtata@uky.edu
Hi,

Unfortunately this is not possible at the moment, but, as you can understand, it is one of my biggest "feature requests" for the MonetDB/XQuery people to implement.

I'm sorry!

On 11-03-2006 10:43:36 -0500, Swati Tata wrote:
> Is it possible to load XML documents into MonetDB using a Java application? I have been using shred_doc() from the command line to load the documents. Is there any other method, using JDBC?
Small update: I have been pointed to a possibility I didn't know existed: the doc(url) function in XQuery. Called from XQuery mode, this function should do the same as shred_doc() from the command line. However, it requires a URL as its argument, so it is impossible to load a string or stream from the client.

Maybe this helps!

On 12-03-2006 00:04:10 +0100, Fabian Groffen wrote:
> Unfortunately this is not possible at the moment, but, as you can understand, it is one of my biggest "feature requests" for the MonetDB/XQuery people to implement.
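Since fn:doc() only accepts a URI, one conceivable workaround from a Java client is to write the XML text to a file and hand that file's URI to fn:doc(). The sketch below is an assumption, not something MonetDB provides: it only helps if the MonetDB server can read the client's filesystem (for example, both run on the same machine), and the names DocUriWorkaround, writeToTempFile and docQuery are hypothetical. How the query string then reaches the server (e.g. a session in XQuery mode) is deliberately left out, since there is no JDBC route for this.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical workaround sketch: fn:doc() only accepts a URI, so write the
 * XML to a file and build an fn:doc() call on that file's URI. This assumes
 * the MonetDB server can read the client's filesystem.
 */
final class DocUriWorkaround {

    private DocUriWorkaround() {}

    /** Writes the XML text to a new temporary file and returns its file: URI. */
    static String writeToTempFile(String xml) {
        try {
            Path tmp = Files.createTempFile("monetdb-doc-", ".xml");
            Files.write(tmp, xml.getBytes(StandardCharsets.UTF_8));
            return tmp.toUri().toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds an XQuery fn:doc() call whose string-literal argument is the URI. */
    static String docQuery(String uri) {
        // In an XQuery 1.0 string literal, '&' must be written as an entity
        // reference and '"' is escaped by doubling it.
        String literal = uri.replace("&", "&amp;").replace("\"", "\"\"");
        return "fn:doc(\"" + literal + "\")";
    }
}
```

Each call creates a file with a fresh name, so fn:doc() sees a new URI every time and will shred and cache the document anew rather than re-use an earlier copy.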
On Thu, Mar 16, 2006 at 09:41:35AM +0100, Fabian Groffen wrote:
> I have been pointed to a possibility I didn't know existed: the doc(url) function in XQuery. Called from XQuery mode, this function should do the same as shred_doc() from the command line. However, it requires a URL as its argument, so it is impossible to load a string or stream from the client.
The XQuery fn:doc() function does almost the same as the MIL procedure shred_doc().

The MIL procedure shred_doc() is meant to *persistently* store XML documents in the MonetDB repository and, at the same time, give them an alias for easy reference via fn:doc() in XQuery expressions.

Whenever MonetDB/XQuery encounters a URI as the argument to fn:doc() that is *not* an alias for a persistent document, it shreds that document on the fly before the actual query execution starts. Such documents are *cached* in the MonetDB repository for later re-use. So the first time you reference an XML document with fn:doc(), the system may take some time to shred it. A second call, however, re-uses the cached version and starts query execution immediately.

Both commands, shred_doc() and fn:doc(), by the way, accept only URIs as their input.

There are basically two differences between cached and persistent documents:

 o Only a limited amount of memory is reserved for cached documents, and "older" documents are removed from the cache once that memory is used up.

 o Upon each access to a cached document, MonetDB tries to determine whether the document has changed since it was cached. In particular, it verifies file size and timestamp for documents in the local filesystem. (I don't know, actually, what the caching system does for documents from the network.)

Swati, I hope this helps with your needs. As Fabian mentioned, a means to shred XML documents via XQuery and/or JDBC is on our wishlist. But we cannot make any promises on when that wish will come true...

Regards
Jens

--
Jens Teubner
Technische Universitaet Muenchen, Department of Informatics
D-85748 Garching, Germany
Tel: +49 89 289-17259     Fax: +49 89 289-17263

Things are pretty mixed up, but I think the worst is over.
    -- TeX Error Message
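The two differences between cached and persistent documents (a bounded cache that drops "older" documents, and a staleness check on file size and timestamp) can be modelled roughly as follows. This is a hypothetical Java sketch, not MonetDB's code: it assumes least-recently-used eviction as one reading of "older", uses the file size as a stand-in for the memory a shredded document occupies, and the names DocCacheSketch, CachedDoc, isFresh and put are illustrative.

```java
import java.io.File;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical model of the document cache behaviour described above; not
 * MonetDB's implementation. Entries are keyed by URI, total size is bounded,
 * and an entry is stale when the file's size or timestamp has changed.
 */
final class DocCacheSketch {

    /** What we remember about a shredded document (the payload is omitted). */
    static final class CachedDoc {
        final long fileSize;
        final long lastModified;

        CachedDoc(long fileSize, long lastModified) {
            this.fileSize = fileSize;
            this.lastModified = lastModified;
        }
    }

    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<String, CachedDoc> entries = new LinkedHashMap<>(16, 0.75f, true);

    DocCacheSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** True if a cached copy exists and size/timestamp still match (no re-shred needed). */
    boolean isFresh(String uri, long fileSize, long lastModified) {
        CachedDoc d = entries.get(uri); // get() also marks the entry as recently used
        return d != null && d.fileSize == fileSize && d.lastModified == lastModified;
    }

    /** Same check for a local file, using its current size and modification time. */
    boolean isFresh(File f) {
        return isFresh(f.toURI().toString(), f.length(), f.lastModified());
    }

    /** Records a (re-)shredded document, evicting least recently used ones if over capacity. */
    void put(String uri, long fileSize, long lastModified) {
        CachedDoc old = entries.remove(uri);
        if (old != null) {
            usedBytes -= old.fileSize;
        }
        entries.put(uri, new CachedDoc(fileSize, lastModified));
        usedBytes += fileSize;
        Iterator<Map.Entry<String, CachedDoc>> it = entries.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            Map.Entry<String, CachedDoc> eldest = it.next();
            if (eldest.getKey().equals(uri)) {
                continue; // never evict the document that was just added
            }
            usedBytes -= eldest.getValue().fileSize;
            it.remove();
        }
    }

    boolean contains(String uri) {
        return entries.containsKey(uri); // containsKey does not change access order
    }
}
```

Persistent documents stored with shred_doc() would simply live outside such a structure, which is why they are never evicted.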
Fabian, Jens, thank you very much for providing Swati (and all other readers of this list) with the detailed information.

Swati (and others), here's the "missing" bit:
>  o Upon each access to a cached document, MonetDB tries to determine whether the document has changed since it was cached. In particular, it verifies file size and timestamp for documents in the local filesystem. (I don't know, actually, what the caching system does for documents from the network.)
========
# Monet Database Server V4.10.3
# Copyright (c) 1993-2006, CWI. All rights reserved.
# Compiled for x86_64-redhat-linux-gnu/64bit with 64bit OIDs; dynamically linked.
# Visit http://monetdb.cwi.nl/ for further information.
MonetDB>module(pathfinder);
MonetDB>sigs("pathfinder");
#---------------------------------------------------------#
# signature # name
# str # type
#---------------------------------------------------------#
[ "delete_all_docs(bit) : void" ]
[ "delete_doc(str) : void" ]
[ "pfstart(bit, lng) : void" ]
[ "shred_doc(str, str) : void" ]
[ "xmlcache_add_rule(str, any) : void" ]
[ "xmlcache_del_rule(str) : void" ]
[ "xmlcache_print() : void" ]
[ "xmlcache_print_rules() : void" ]
[ "xmldb_print() : void" ]
[ "xquery(mode str, xquery str, is_url bit) : str " ]
[ "xquery_server(Stream in, Stream out) : void " ]
[ "xquery_start_query_cache(lng) : ptr" ]
MonetDB>help("xmlcache_add_rule");
PROC: xmlcache_add_rule(str, lng) : void
no text available

PROC: xmlcache_add_rule(str, any) : void
MODULE: pathfinder
COMPILED: by boncz on May 2005
DESCRIPTION:
add a new URI lifetime rule.

The XML document cache keeps indexed copies of documents that where recently used in the fn:doc(URI) xquery function. The size of the cache is controlled using the 'xquery_cacheMB' setting in the 'MonetDB.conf' file.

For file URIs, the cache looks at the last-modification-time of the file on disk to guarantee that the cached document is still up-to-date for answering queries from.

For other URIs, *lifetime rules* determine how long documents can stay in the cache. Each lifetime rule consists of a URI prefix and the registered seconds of lifetime. The rule with longest prefix that matches an URI counts. Specifying a lifetime of 'int(nil)' seconds means that the URI will *not* be cached at all. This is also the default if no prefix matches an URI.

The name of a cached document is the same as its location (URI). For explicitly shredded documents (with 'shred_doc(location,name)'), the name is an 'alias' and may differ from the URI. Explicitly shredded documents fall outside the XML document cache; documents are only removed at explicit user request (with 'delete_doc(name)').
MonetDB>
========

Cheers,
Stefan

--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |
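The lifetime-rule lookup in the help text (the longest matching URI prefix wins; a lifetime of int(nil), like a URI that matches no prefix, means "do not cache") can be sketched as below. This is a hypothetical Java model, not the pathfinder module's implementation: a null Long stands in for int(nil), and LifetimeRules, addRule, delRule and lifetimeFor are illustrative names that only loosely mirror xmlcache_add_rule and xmlcache_del_rule. Per the help text, file URIs are checked by modification time instead, which this sketch does not cover.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

/**
 * Hypothetical model of URI lifetime rules for non-file URIs: the rule with
 * the longest matching prefix wins; a nil lifetime (null here) or no match at
 * all means the document is not cached.
 */
final class LifetimeRules {
    private final Map<String, Long> rules = new HashMap<>();

    /** Loosely analogous to xmlcache_add_rule(prefix, seconds); null plays the role of int(nil). */
    void addRule(String prefix, Long seconds) {
        rules.put(prefix, seconds);
    }

    /** Loosely analogous to xmlcache_del_rule(prefix). */
    void delRule(String prefix) {
        rules.remove(prefix);
    }

    /** Cache lifetime in seconds for the URI, or empty if it must not be cached. */
    OptionalLong lifetimeFor(String uri) {
        String best = null;
        for (String prefix : rules.keySet()) {
            if (uri.startsWith(prefix) && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        Long seconds = (best == null) ? null : rules.get(best);
        return (seconds == null) ? OptionalLong.empty() : OptionalLong.of(seconds);
    }
}
```

Using the longest prefix lets a narrow "never cache" rule (for example on a frequently changing sub-path) override a broader rule covering the rest of the same site.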
participants (4)
- Fabian Groffen
- Jens Teubner
- Stefan Manegold
- Swati Tata