[Monetdb-developers] thoughts on XQJ
Hi Fabian,

You reported the following JDBC XQuery error on an example from the XQJ spec:
  monetdb-> import schema namespace foo='http://www.foo.com'
  monetdb=> at 'http://www.fooschema.com';
  monetdb=> for $i in (foo:hatsize(1),foo:hatsize(4)) return $i
  Error: protocol violation, unexpected line: xquery_prepare: missing ';' after module import.
  monetdb->
This error message is indeed wrong: for some reason, the query cache mechanism incorrectly classifies this query as importing a module. Instead, you should have received the following error:

  error in function application: at (3,12-3,25): reference to undefined function `foo:hatsize'
  # halted in /data/home/boncz/pathfinder/compiler/semantics/functions.c (check_fun_usage), line 291

I think this example is wrong. Apparently the XQuery system the XQJ developers are using (DB2?) either allows modules to be imported by importing a schema, or allows schemas to define functions. 'hatsize()' is not a built-in function, so it must be defined either in the query itself or in an imported module. Most likely, 'schema' should be replaced by 'module'.

The other question is how to handle prepared queries. Supporting them will take some work, but it can be done. MonetDB/XQuery has a query cache that can accelerate queries consisting of a single function call, provided that function is defined in a module. Suppose an XQJ client has the query:

  declare variable $x as xs:integer external;
  for $n in fn:doc('catalog.xml')//item
  where $n/price <= $x
  return fn:data($n/name)

In the prepare phase (either through an explicit call to the PrepareQuery method, or implicitly in the ExecuteQuery method), this should be translated to:

  module namespace tmpXXX_1 = 'http://monetdb.cwi.nl/XQuery';
  declare function tmpXXX_1:preparedQuery($x as xs:integer) as item()* {
    for $n in fn:doc('catalog.xml')//item
    where $n/price <= $x
    return fn:data($n/name)
  };

Here XXX is a session ID that must be known to the client. Notice that the external variable declarations were "eaten away", and the query was wrapped in a module function whose parameters are the externally declared variables. The default return type is a sequence of items (item()*). This module query can be executed immediately. On the server side, we should have a local directory available, e.g. 'preparedXQuery/', and executing the module query simply stores it in the server-side file 'preparedXQuery/tmpXXX_1.xq'.

Now suppose variable $x is bound to 42 and the query is executed. XQJ should then generate the following query:

  import module namespace tmpXXX_1 = 'http://monetdb.cwi.nl/XQuery' at 'file://preparedXQuery/tmpXXX_1.xq';
  tmpXXX_1:preparedQuery(42)

Note that to get the benefits of prepared queries, we do not need a prepare-bind-execute mechanism: you just need to define your query as a function in a module. The benefits can be very substantial; a speedup of a factor 10 is often observed on small queries. But as you see, we can support prepare-bind-execute with it as well.

The question is whether we should, because this scenario is still suboptimal when many sessions use the same prepared query. Since each session generates its own module (the module name contains the session ID), every session pays the prepare cost once, instead of only the first one. In that sense, defining the query once as a function in a shared module is actually better. So maybe we should just take a simplistic approach to variable substitution and generate the stand-alone query:

  let $x := 42
  return
    for $n in fn:doc('catalog.xml')//item
    where $n/price <= $x
    return fn:data($n/name)

I advise against trying to substitute $x in place, because lexical analysis of XQueries (which may contain XML and comments) is difficult, and you do not want to replicate all of that in the client-side XQJ layer. Thus, in this approach, all external variable declarations are transformed into let clauses that define and bind the variables, and a 'return' keyword is inserted between the lets and the query.

A final question is what should be done in sessions with multiple queries. Supporting this would not be difficult if each query is executed in its own transaction (autocommit on). Autocommit off is more work, because then we have to preserve the "working set" (database state) between queries and postpone the commit until the last query in the session.
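Since the XQJ layer is Java, here is a minimal sketch of the three client-side rewrites described above: wrapping a query in a session-specific module function, generating the query that imports and calls that module, and the simplistic let-based substitution. The class and method names (PreparedQueryRewriter, split, toModule, toInvocation, toLetQuery) are hypothetical. It assumes the external declarations appear back-to-back at the very start of the query, always carry an "as <type>" part, contain no comments, and that no other prolog declarations are present; a regular expression is not an XQuery lexer, so the sketch only touches those leading declarations and never looks inside the query body. It also assumes the standard XQuery "import module namespace ... at ..." form, item()* as the default return type, and that bound values are passed in as already valid XQuery literals.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical client-side helper for an XQJ driver on top of MonetDB/XQuery.
public class PreparedQueryRewriter {

    // Namespace used for all generated modules (from the example in the text).
    private static final String MODULE_NS = "http://monetdb.cwi.nl/XQuery";

    // One external declaration such as "declare variable $x as xs:integer external;".
    // Assumption: no comments or string literals inside the declaration, and the
    // "as <type>" part is always present.
    private static final Pattern EXTERNAL_DECL = Pattern.compile(
        "\\s*declare\\s+variable\\s+\\$([\\w.-]+)\\s+as\\s+([^;]+?)\\s+external\\s*;");

    /** External variables (name -> declared type, in declaration order) plus the remaining query body. */
    public static final class Split {
        public final Map<String, String> externals;
        public final String body;

        Split(Map<String, String> externals, String body) {
            this.externals = externals;
            this.body = body;
        }
    }

    /** Strips the external declarations that appear back-to-back at the start of the query. */
    public static Split split(String query) {
        Map<String, String> externals = new LinkedHashMap<>();
        Matcher m = EXTERNAL_DECL.matcher(query);
        int pos = 0;
        while (true) {
            m.region(pos, query.length());
            if (!m.lookingAt()) {
                break; // first non-declaration text: the query body starts here
            }
            externals.put(m.group(1), m.group(2));
            pos = m.end();
        }
        return new Split(externals, query.substring(pos).trim());
    }

    private static String prefix(String sessionId, int queryNo) {
        return "tmp" + sessionId + "_" + queryNo;
    }

    /** Prepare phase: wrap the body in a module function whose parameters are the external variables. */
    public static String toModule(String sessionId, int queryNo, String query) {
        Split s = split(query);
        String p = prefix(sessionId, queryNo);
        List<String> params = new ArrayList<>();
        for (Map.Entry<String, String> e : s.externals.entrySet()) {
            params.add("$" + e.getKey() + " as " + e.getValue());
        }
        return "module namespace " + p + " = '" + MODULE_NS + "';\n"
            + "declare function " + p + ":preparedQuery(" + String.join(", ", params) + ") as item()* {\n"
            + "  " + s.body + "\n"
            + "};\n";
    }

    /** Execute phase: import the stored module and call its function with literal arguments. */
    public static String toInvocation(String sessionId, int queryNo, List<String> literalArgs) {
        String p = prefix(sessionId, queryNo);
        return "import module namespace " + p + " = '" + MODULE_NS + "'"
            + " at 'file://preparedXQuery/" + p + ".xq';\n"
            + p + ":preparedQuery(" + String.join(", ", literalArgs) + ")";
    }

    /** Simplistic alternative: turn each external declaration into a let clause, then add "return". */
    public static String toLetQuery(String query, Map<String, String> literalValues) {
        Split s = split(query);
        StringBuilder sb = new StringBuilder();
        for (String name : s.externals.keySet()) {
            String value = literalValues.get(name);
            if (value == null) {
                throw new IllegalArgumentException("unbound external variable $" + name);
            }
            // Values must already be valid XQuery literals, e.g. "42" or "'abc'".
            sb.append("let $").append(name).append(" := ").append(value).append(' ');
        }
        if (sb.length() > 0) {
            sb.append("return ");
        }
        return sb.append(s.body).toString();
    }

    public static void main(String[] args) {
        String q = "declare variable $x as xs:integer external; "
            + "for $n in fn:doc('catalog.xml')//item where $n/price <= $x return fn:data($n/name)";
        System.out.println(toModule("XXX", 1, q));
        System.out.println(toInvocation("XXX", 1, List.of("42")));
        System.out.println(toLetQuery(q, Map.of("x", "42")));
    }
}
```

For the catalog query with $x bound to 42, toLetQuery produces the same stand-alone let query as in the text (on a single line). Note that toModule only needs to run once per session, whereas toLetQuery is applied on every execution and leaves all compilation work to the server.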
Multi-query scenarios do raise the question of what to do when a query returns XML nodes. If these values are fed as parameters into a subsequent query, should we pass them by reference or by value? Currently we serialize nodes, so only pass-by-value is possible: nodes are returned in serialized form, and passing them back in comes down to sequence construction. Passing by reference adds the implicit semantics that documents opened with fn:doc() stay "open" across queries within a single transaction. Like autocommit off, this also requires the working set to be preserved.

Peter
p.a.boncz