[Monetdb-developers] trying to use monetdb pathfinder

Hi there,

Just downloaded and installed the new release. Compiling and installing on my system went off without a hitch, my compliments.

I've been trying to use Pathfinder's XQuery support and I ran into some snags. It may be something I do wrong, but I'll ask anyway.

First, I was testing the XQuery support. I tried to shred a document like this:

    shred_doc('/home/faassen/xml/shaks/hamlet.xml', 'hamlet')

and then to refer to this document from an XQuery like this:

    doc('hamlet')//FM

I get the following error:

    MAPI = monetdb@localhost:50000
    QUERY = doc('hamlet')//FM
    ERROR = !ERROR: [shred_url]: 1 times inserted nil due to errors at tuples 0@0.
            !ERROR: [shred_url]: first error was:
            !ERROR: shred: cannot stat `hamlet': No such file or directory
            !ERROR: CMDshred_url: operation failed.
            !ERROR: interpret_params: leftfetchjoin(param 2): evaluation error.

Hinted by the 'cannot stat', I tried the following:

    doc('/home/faassen/xml/shaks/hamlet.xml')//FM

and this gets a step further (still wondering why the former didn't work, though):

    MAPI = monetdb@localhost:50000
    QUERY = doc('/home/faassen/xml/shaks/hamlet.xml')//FM
    ERROR = !ERROR: I/O warning : failed to load external entity "play.dtd"
            !ERROR: [shred_url]: 1 times inserted nil due to errors at tuples 0@0.
            !ERROR: [shred_url]: first error was:
            !ERROR: shred_external_subset: WARNING: xmlParseDTD("play.dtd") FAILED, NO ID/IDREF QUERIES
            !ERROR: shred_external_subset: NOTE : maybe using absolute filenames works, sorry!
            !ERROR: CMDshred_url: operation failed.
            !ERROR: interpret_params: leftfetchjoin(param 2): evaluation error.

The play.dtd is in the same directory as 'hamlet.xml', so I don't know why that doesn't work either.

So, I made a new hamlet2.xml, removing the DTD reference. I don't need DTD support anyway, but it'd be nice if it could parse documents with DTDs, of course. Now, it works.

Next, I was trying out the XQuery update support. Not hindered by any knowledge of how it works, I went to the referenced W3C document and adapted the first insert statement I saw there into this:

    do insert <year>2005</year>
    after fn:doc('/home/faassen/xml/shaks/hamlet2.xml')/PLAY/TITLE

This however gives me an error message I do not understand:

    MAPI = monetdb@localhost:50000
    QUERY = do insert <year>2005</year>
            after fn:doc('/home/faassen/xml/shaks/hamlet2.xml')/PLAY/TITLE
    ERROR = !type error: no variant of function upd:insertAfter accepts the given argument type(s): element TITLE { item* }*; (node* | node)*
            !type error: maybe you meant:
            !type error: upd:insertAfter (node, node*) as stmt
            !type error: illegal arguments for function upd:insertAfter

What am I doing wrong there?

Regards,
Martijn

On Mon, Feb 05, 2007 at 09:08:59PM +0100, Martijn Faassen wrote:
> shred_doc('/home/faassen/xml/shaks/hamlet.xml', 'hamlet')

shred_doc() is actually a deprecated interface to MonetDB/XQuery's document management. In the meantime we integrated document management into the query language itself, so you will no longer need to work with MIL for the document management and XQuery for the queries.

The recommended way to load documents into MonetDB/XQuery is now the (XQuery) function pf:add-doc():

    pf:add-doc (URI, alias)       -> load document from URI and store it under
                                     the name alias (similar to shred_doc())

    pf:add-doc (URI, alias, coll) -> same, but add document to the
                                     collection coll

    pf:add-doc (URI, alias, pct)  -> leave pct % free space for future
                                     updates

The latter two variants can have performance advantages depending on your scenario.

Note that if you want to use shred_doc() in MIL, you need to specify both arguments as MIL strings, i.e., in double quotes.
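As a rough sketch of the two-argument form listed above, the document from the original post could be loaded and then queried under its alias as follows. The path and alias are the ones from the original post; that the two statements are sent as two separate queries, and that the alias is visible to doc() once the first one has finished, are assumptions made for this illustration rather than something stated in the thread.

```xquery
(: Query 1: load the file and store it under the alias "hamlet" :)
pf:add-doc("/home/faassen/xml/shaks/hamlet.xml", "hamlet")

(: Query 2, sent separately: refer to the document by its alias :)
doc("hamlet")//FM
```

Staying in XQuery for both steps avoids the MIL quoting and statement-termination rules that apply to shred_doc().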
> The play.dtd is in the directory next to 'hamlet.xml', so don't know why
> that doesn't work either.

Sorry, I currently don't know what could have gone wrong here. Can anyone else on the list give an answer?

MonetDB/XQuery, by the way, uses DTDs to know about ID and IDREF attributes. The id() and idref() functions will only be supported if ID and IDREF attributes have been declared in the DTD. (Also, an additional index will be created to efficiently back id() and idref().)
> do insert <year>2005</year>
> after fn:doc('/home/faassen/xml/shaks/hamlet2.xml')/PLAY/TITLE

The problem is that the expression `fn:doc(...)/PLAY/TITLE' could evaluate to a list of nodes. The `do insert ... after ...' clause, however, is only allowed for single nodes as the target expression.

The Pathfinder XQuery compiler does static type checking, and if your query is not type-safe it will be rejected. If you are sure that your path evaluates to exactly one node, you can tell that to the compiler and it will (at least it should ;-)) accept your query:

    do insert ... after exactly-one (fn:doc (...)/PLAY/TITLE)

Another way to make sure that the target expression evaluates to exactly one node is

    for $n in fn:doc (...)/PLAY/TITLE return
      do insert ... after $n

I hope I could help you a bit. Don't hesitate to ask again.

Jens

-- 
Jens Teubner
Technische Universitaet Muenchen, Department of Informatics
D-85748 Garching, Germany
Tel: +49 89 289-17259     Fax: +49 89 289-17263

XQuery processing at the speed of light: MonetDB/XQuery
http://www.monetdb-xquery.org/ http://www.pathfinder-xquery.org/
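Spelled out with the query from the original post, the two workarounds might look like the sketch below. Each variant is meant as a separate query. Whether the update is then actually applied also depends on the document having been loaded in read-write mode, which is discussed later in this thread, so this only illustrates how the static type error is avoided.

```xquery
(: Variant 1: assert that the path yields exactly one node.
   fn:exactly-one() raises an error at run time if it does not. :)
do insert <year>2005</year>
  after exactly-one(fn:doc('/home/faassen/xml/shaks/hamlet2.xml')/PLAY/TITLE)

(: Variant 2, a separate query: iterate, so that $n is bound to a
   single node in each iteration. Inserts once per TITLE found. :)
for $n in fn:doc('/home/faassen/xml/shaks/hamlet2.xml')/PLAY/TITLE
return do insert <year>2005</year> after $n
```

The variants differ when the path matches zero or several nodes: the first fails, the second performs zero or several inserts.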

Hi all,

My real question is how to remove the added document from the system. pf:delete-doc does not seem to exist (?)

My second question is how one could have known that the shred_doc interface is deprecated? It is not mentioned on the web, and I cannot even find a test using pf:add-doc... maybe I am looking in the wrong place for documentation?

Cheers,
Arjen

On Thu, Feb 08, 2007 at 07:14:07PM +0100, Arjen P. de Vries wrote:
pf:del-doc()

> My second question is how one could have known that the shred_doc interface is deprecated?
... in fact, though deprecated, MIL shred_doc() should still work (as well as MIL delete_doc() & MIL delete_all_docs()):

=======
$ Mserver --dbinit='module(pathfinder);'
# Monet Database Server V4.16.1
# Copyright (c) 1993-2007, CWI. All rights reserved.
# Compiled for x86_64-redhat-linux-gnu/64bit with 64bit OIDs; dynamically linked.
# Visit http://monetdb.cwi.nl/ for further information.
MonetDB>sigs("pathfinder");
#---------------------------------------------------------------------------------#
# signature                                                                       # name
# str                                                                             # type
#---------------------------------------------------------------------------------#
[ "delete_all_docs(bit) : void" ]
[ "delete_doc(BAT[void,str]) : void" ]
[ "shred_doc(BAT[void,str], BAT[void,str], BAT[void,str], BAT[void,lng]) : void" ]
[ "shred_doc(str, str) : void" ]
[ "shred_doc(str, str, str, lng) : void" ]
[ "xmlcache_add_rule(str, any) : void" ]
[ "xmlcache_del_rule(str) : void" ]
[ "xmlcache_print() : void" ]
[ "xmlcache_print_rules() : void" ]
[ "xmldb_print() : void" ]
[ "xquery(mode str, xquery str, is_url bit) : str " ]
[ "xquery_frontend() : ptr" ]
[ "xquery_start_query_cache(lng) : void" ]
MonetDB>help("shred_doc");
PROC:        shred_doc(str, str) : void
MODULE:      pathfinder
COMPILED:    by rittinge on Oct 2006
PARAMETERS:
- str location: URI containing the xml document to be shredded)
- str name: document name ('alias') in database
DESCRIPTION:
Shred single xml document to the internal Pathfinder format.
(Leave no free pages and do not relate it to a collection.)

PROC:        shred_doc(str, str, str, lng) : void
MODULE:      pathfinder
COMPILED:    by rittinge on Oct 2006
PARAMETERS:
- str location: URI refering to the xml documents to be shredded)
- str name: document name ('alias') in the database
- str colname: collection name ('alias') in the database
- lng pageFree: percentage of pages left free in the database
DESCRIPTION:
Shred single xml documents to the internal Pathfinder format.

PROC:        shred_doc(BAT[void,str], BAT[void,str], BAT[void,str], BAT[void,lng]) : void
MODULE:      pathfinder
COMPILED:    by rittinge on Oct 2006
PARAMETERS:
- BAT[void,str] locations: URIs refering to the xml documents to be shredded)
- BAT[void,str] names: document names ('alias') in the database
- BAT[void,str] colnames: collection names ('alias') in the database
- BAT[void,lng] pageFrees: percentage of pages left free in the database
DESCRIPTION:
Shred multiple xml documents to the internal Pathfinder format.
MonetDB>
========
> It is not mentioned on the web, and I cannot even find a test using pf:add-doc... maybe I am looking in the wrong place for documentation?

more documentation is pending --- please bear with us, we also have only 24h per day (25 if we skip lunch) ...

Stefan
-- 
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |

Another question - regarding the read-write vs. read-only mode:

* Can I only turn on read-write mode by giving a % of free pages to pf:add-doc?
* What is the recommended %; and, does it hurt to give 0% if I do append-only?

I know I know, but still... maybe documenting first and deprecating second would be a better order of processing things over (no matter how limited) time!

A.

====================================================================
CWI, room C1.16           Centre for Mathematics and Computer Science
Kruislaan 413             Email: Arjen.de.Vries@cwi.nl
1098 SJ Amsterdam         tel: +31-(0)20-5924306
The Netherlands           fax: +31-(0)20-5924312
===================== http://www.cwi.nl/~arjen/ ====================

On Thu, Feb 08, 2007 at 10:07:13PM +0100, Arjen P. de Vries wrote:
AFAIK, read-write mode is only activated by loading/shredding documents (via pf:add-doc() in XQuery or shred_doc() in MIL) with a free % > 0.

> * What is the recommended %; and, does it hurt to give 0% if I do append-only?

I cannot give any qualified recommendation about the "optimal" %, yet(?); it most probably depends on your expected workload; 10% seems a "nice" value to start with... 0% means read-only; hence (AFAIK) also no appends are allowed/possible.
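A minimal sketch of loading a document for updates, using the three-argument form with a free-space percentage mentioned earlier in the thread and the 10% starting value suggested above. The alias "hamlet2" is made up for this example, and passing the percentage as a plain integer literal is an assumption; the thread only gives the parameter as "pct".

```xquery
(: Load hamlet2.xml under the hypothetical alias "hamlet2",
   leaving 10% free space so that later updates are possible :)
pf:add-doc("/home/faassen/xml/shaks/hamlet2.xml", "hamlet2", 10)
```

With 0 instead of 10 the document would, as far as described above, end up read-only.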
As said, shred_doc() in MIL does still work (and should be functionally equivalent to pf:add-doc() in XQuery --- AFAIK, they use the very same implementation). shred_doc() is "deprecated" as we plan to encourage users to stay in the XQuery domain only, also for document management (instead of falling back to MIL for that) --- since W3C has not (yet?) specified any "real" document management functionality in XQuery (except for, say, fn:doc() and fn:collection()) we invented our own pf:add-doc() and pf:del-doc(). (shred_doc() is, btw., documented; see `help("shred_doc");` and/or my previous posting on this thread.)

I agree that we should add the documentation for our XQuery document management extensions ASAP ... (but documenting first and implementing afterwards is not a good option either...)

Stefan

Stefan Manegold wrote:
> > ... per day (25 if we skip lunch) ...
> I know I know, but still...

please, respect Stefan's supper.
The responsibility for the XQuery documentation lies with Peter, and those directly responsible for the XQuery toolset development. Peter announced several weeks ago that he would take care of the current XQuery documentation on the web-site, but as his checkins show, stability comes before documentation, let alone giving the product to users living at the edge. Perhaps Peter can give you better advice (and concurrently document ;-))

On Mon, Feb 05, 2007 at 09:08:59PM +0100, Martijn Faassen wrote:
> shred_doc('/home/faassen/xml/shaks/hamlet.xml', 'hamlet')

Hi Martijn,

It seems that the "shred_doc" operation did not succeed. Is the above line with "shred_doc" a literal copy of what you entered in Mserver? Then you need to add the ';' sign at the end of the line:

    shred_doc('/home/faassen/xml/shaks/hamlet.xml', 'hamlet');

Note that a MIL statement needs to be terminated by a ';' sign. Otherwise, Mserver will wait for more input. If "shred_doc" succeeded, you should get some information like the following:

    MonetDB>shred_doc("/ufs/zhang/xrpcdemo/hello.xml", "hello.xml");
    # Elapsed time = 010ms 671us [002ms 667us/node]
    # Shredded 1 XML document (hello.xml), total time after commit=0.065s

Regards,
Jennie
participants (7)
- Arjen P. de Vries
- Arjen P. de Vries
- Jens Teubner
- Martijn Faassen
- Martin Kersten
- Stefan Manegold
- Ying Zhang