[Monetdb-developers] thoughts on XQJ

15 Aug 2006

Hi Fabian,

You reported the following JDBC XQuery error on an example from the XQJ spec:
...
monetdb-> import schema namespace foo='http://www.foo.com'
monetdb=> at 'http://www.fooschema.com';
monetdb=> for $i in (foo:hatsize(1),foo:hatsize(4)) return $i
Error: protocol violation, unexpected line: xquery_prepare: missing ';'
after module import.
monetdb->

This error message is indeed wrong: for some reason, the query cache
mechanism incorrectly classifies this query as importing a module.

Instead, you should have received the following error:

error in function application: at (3,12-3,25): reference to undefined
function `foo:hatsize'
# halted in /data/home/boncz/pathfinder/compiler/semantics/functions.c
(check_fun_usage), line 291

I think this example is wrong. Apparently the XQuery system the XQJ
developers are using (DB2?) allows modules to be imported by just importing a
schema, or allows schemas to define functions. 'hatsize()' is not a
built-in function, so it must be defined either in the XQuery itself or in
a module. Most likely 'schema' should be replaced by 'module'.

The other question is how to handle prepared queries. It will be some work
to support this, but it can be done.

MonetDB/XQuery has a query cache that can accelerate queries if they consist
of a single function call. That function must be defined in a module.

Suppose an XQJ client has the query:

"declare variable $x as xs:integer external;
 for $n in fn:doc('catalog.xml')//item
 where $n/price <= $x
 return fn:data($n/name)"

At the prepare phase (either by a call to the PrepareQuery method, or
implicitly in the ExecuteQuery method), this should be translated to:

"module namespace tmpXXX_1 = 'http://monetdb.cwi.nl/XQuery';

 declare function tmpXXX_1:preparedQuery($x as xs:integer) as xs:anyNode*
 {
   for $n in fn:doc('catalog.xml')//item
   where $n/price <= $x
   return fn:data($n/name)
 };"

Here XXX is some session ID that should be known to the client. Notice that
the external variable declarations were "eaten away", and the query was
wrapped in a module function whose signature lists the externally declared
variables. The default return type is a sequence of items (xs:anyNode*).
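
A rough sketch of this prepare step, in Python for brevity (an XQJ driver
would do the same in Java). The helper names split_externals and
wrap_as_module are made up for illustration. It assumes that the external
declarations are the very first thing in the query and that each one carries
an explicit type, so no real lexical analysis is needed. It spells the
function as 'declare function' with a braced body.

```python
import re

# Matches one leading declaration such as:
#   declare variable $x as xs:integer external;
# Assumption: every external variable is declared with an explicit type.
EXTERNAL_DECL = re.compile(
    r"declare\s+variable\s+\$([\w.:-]+)\s+as\s+([^;]+?)\s+external\s*;\s*"
)


def split_externals(query):
    """Strip leading external declarations; return ([(name, type)], body)."""
    params = []
    rest = query.lstrip()
    while True:
        m = EXTERNAL_DECL.match(rest)
        if m is None:
            break
        params.append((m.group(1), m.group(2)))
        rest = rest[m.end():]
    return params, rest.strip()


def wrap_as_module(query, session_id, seq):
    """Return (module_name, module_text) wrapping the query in a function."""
    params, body = split_externals(query)
    name = "tmp%s_%d" % (session_id, seq)
    signature = ", ".join("$%s as %s" % (var, typ) for var, typ in params)
    module = (
        "module namespace %s = 'http://monetdb.cwi.nl/XQuery';\n\n" % name
        + "declare function %s:preparedQuery(%s) as xs:anyNode*\n" % (name, signature)
        + "{\n"
        + body + "\n"
        + "};"
    )
    return name, module
```

The returned module text is what would be sent to the server at prepare
time; the module name is what the client remembers for later calls.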

This query can be executed immediately. On the server side, we should have a
local directory available, e.g. 'preparedXQuery/', and executing the module
query will just store it in the server-side file 'preparedXQuery/tmpXXX_1.xq'.

Now suppose variable $x is bound to 42 and the query is executed. XQJ should
generate the following query:

"import module namespace tmpXXX_1 = 'http://monetdb.cwi.nl/XQuery'
   at 'file://preparedXQuery/tmpXXX_1.xq';
 tmpXXX_1:preparedQuery(42)"
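
A rough sketch of how the client could generate this call query from the
bound values, again in Python for brevity. The names xquery_literal and
call_query are hypothetical, only a few value types are handled, and the
import is written with the module's declared namespace URI plus an 'at'
location pointing into the 'preparedXQuery/' directory, which is my
assumption about how the module file would be found.

```python
def xquery_literal(value):
    """Render a Python value as an XQuery literal (booleans, ints, strings)."""
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return "fn:true()" if value else "fn:false()"
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # In an XQuery string literal '&' starts an entity reference and a
        # quote is escaped by doubling it.
        return "'" + value.replace("&", "&amp;").replace("'", "''") + "'"
    raise TypeError("unsupported parameter type: %r" % type(value))


def call_query(module_name, args, module_dir="preparedXQuery"):
    """Build the query that imports the prepared module and calls it."""
    arg_list = ", ".join(xquery_literal(a) for a in args)
    return (
        "import module namespace %s = 'http://monetdb.cwi.nl/XQuery'\n" % module_name
        + "  at 'file://%s/%s.xq';\n" % (module_dir, module_name)
        + "%s:preparedQuery(%s)" % (module_name, arg_list)
    )
```

Because the values are passed as function arguments, the client only has to
render literals; it never has to look inside the original query text.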

Note that in order to get the benefits of prepared queries, we do not need a
prepare-bind-execute mechanism. You just need to define your query as a
function in a module. The benefits can be very substantial: a factor of 10 is
often observed on small queries.

But as you see, we can support prepare-bind-execute with it as well. The
question is whether we should, because the scenario is still suboptimal in
case there are many sessions that use this same prepared query. Each of these
sessions will pay the prepare cost once, instead of only the first one paying
it. In that sense, the module approach is actually better.

So maybe we should just take a simplistic approach to variable substitution
by generating the stand-alone query:

"let $x := 42
 return 
   for $n in fn:doc('catalog.xml')//item
   where $n/price <= $x
   return fn:data($n/name)"

I advise against trying to substitute $x in place, because lexical analysis
of XQueries (which may contain XML and comments) is difficult, so you do not
want to replicate all that in the XQJ layer on the client side.

Thus, in this approach, all external variable declarations are transformed
into let-clauses that define and bind the variables. Between the lets
and the query, a 'return' keyword should be inserted.
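
A rough sketch of this let-based rewrite, in Python for brevity. The name
bind_with_lets is hypothetical, the bound values are assumed to arrive
already rendered as XQuery literals, and the external declarations are
assumed to be the very first thing in the query, so the rest of the query
text is never scanned.

```python
import re

# Matches one leading declaration such as:
#   declare variable $x as xs:integer external;
# The type part ("as ...") is optional here because it is simply dropped.
EXTERNAL_DECL = re.compile(
    r"declare\s+variable\s+\$([\w.:-]+)(?:\s+as\s+[^;]+?)?\s+external\s*;\s*"
)


def bind_with_lets(query, bindings):
    """Replace leading external declarations by let-clauses.

    bindings maps a variable name (without '$') to an XQuery literal string.
    """
    lets = []
    rest = query.lstrip()
    while True:
        m = EXTERNAL_DECL.match(rest)
        if m is None:
            break
        var = m.group(1)
        if var not in bindings:
            raise KeyError("no value bound for $%s" % var)
        lets.append("let $%s := %s" % (var, bindings[var]))
        rest = rest[m.end():]
    if not lets:
        return query  # nothing to bind, leave the query untouched
    # The 'return' keyword goes between the lets and the original query.
    return "\n".join(lets) + "\nreturn\n" + rest.strip()
```

For the catalog example, bind_with_lets(query, {"x": "42"}) yields a query
that starts with "let $x := 42" followed by "return" and the original
for-expression.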

A final question is what should be done in sessions with multiple queries.
Supporting this would not be difficult if each query is executed in a single
transaction (autocommit on). Autocommit off is more work, because this means
we have to preserve the "working-set" (database state) between queries, and
postpone the commit until the last one in the session.

Multi-query scenarios do raise the question of what to do when a query returns
XML nodes. If these values are fed as parameters into a next query, should
we copy by reference, or by value? Currently we serialize nodes, so you can
only get by value (i.e. nodes will take serialized form, which when passed
in comes down to sequence construction). Passing by reference adds the
implicit semantics that documents opened with fn:doc() stay "open" across
queries in a single transaction. Like autocommit off, this also requires the
working set to be preserved.

Peter

p.a.boncz
