
-----Original Message-----
From: Sjoerd Mullender [mailto:sjoerd@acm.org]
Sent: 24 May 2007 18:48
To: monetdb-users@lists.sourceforge.net
Cc: M.Clements@cwi.nl
Subject: Re: [MonetDB-users] MonetDb-XQuery problem
On 05/24/2007 06:27 PM, Stefan Manegold wrote:
On Thu, May 24, 2007 at 06:15:43PM +0200, Maarten Clements wrote:
-----Original Message-----
From: Stefan Manegold [mailto:Stefan.Manegold@cwi.nl]
Sent: 24 May 2007 18:00
To: M.Clements@cwi.nl
Cc: monetdb-users@lists.sourceforge.net
Subject: Re: [MonetDB-users] MonetDb-XQuery problem
Using the following XQuery in MonetDB (WinXP) crashes on large documents
On Thu, May 24, 2007 at 05:07:03PM +0200, Maarten Clements wrote:
the usual questions:
1) which version of MonetDB XQuery are you using?
The latest, from Sjoerd's directory
so I guess, this is MonetDB 4.17.1 / MonetDB/XQuery 0.17.1, right?
Yep, and not terribly recent (at least several weeks).
2) what exactly does "crash" mean? does the Mserver simply stop? Is there any error message? Does Mserver grow/excessively use resources before crashing? ...?
It fills up my memory and after doing that Mserver stops.
Ok.
3) what do "large" and "small" mean w.r.t. your document sizes? (serialized file size in bytes? number of /collection/doc nodes in either document? ...?)
The problem occurs with C1 = 59 MB, C2 = 7 KB (but I didn't search for the critical sizes :)
Both files contain movie data:
C1 is IMDB data with nodes: title, year, tag1, tag2 ... tagx (I'm not sure how many movies)
C2 is Netflix data with nodes: title, year (100 movies)
count(doc("C1.xml")/collection/doc) ? count(doc("C2.xml")/collection/doc) ?
count(doc("C1.xml")/collection/doc) = 245, count(doc("C2.xml")/collection/doc) = 100: Works
count(doc("C1.xml")/collection/doc) = 221097, count(doc("C2.xml")/collection/doc) = 100: Fails
for both the size that works for you and the one that does not (just one for each, no need to find the switching point)
4) how much memory does your machine have?
1 GB
(when C1 and C2 are small it works ok):
<col> {
  for $b in doc("C1.xml")/collection/doc
  for $top in doc("C2.xml")/collection/doc
  where ($b/year > 1950) and ($b/title = $top/title)
  return <movie> {$b/year} </movie>
} </col>
If I remove one of the two conditions it works perfectly, e.g.:
<col> {
  for $b in doc("C1.xml")/collection/doc
  for $top in doc("C2.xml")/collection/doc
  where ($b/title = $top/title)
  return <movie> {$b/year} </movie>
} </col>
Can anyone explain this problem?
most probably, the more complex join condition is not recognized as a join; then the intermediate result (cross product) blows up.
It is very likely that this is indeed the problem...
Why does this happen for a query this simple?
simply because join recognition in XQuery is not "simple" at all.
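To make the suspected blow-up concrete: with the sizes reported above (221097 and 100 doc nodes), an unrecognized join means 221097 x 100 = 22,109,700 candidate pairs before any filtering. The Python sketch below only illustrates that arithmetic on hypothetical toy data; it is not how MonetDB/XQuery evaluates the query, and the function names and the data are made up for the example. Whether the cross product alone accounts for the memory exhaustion is not established in this thread.

```python
# Illustrative only: NOT MonetDB/XQuery's evaluation strategy.
# Contrasts "cross product, then filter" with a join on title.

def cross_product_then_filter(c1, c2, min_year):
    """Examine every (b, top) pair and apply both conditions afterwards."""
    pairs_examined = 0
    result = []
    for b in c1:
        for top in c2:
            pairs_examined += 1
            if b["year"] > min_year and b["title"] == top["title"]:
                result.append(b["year"])
    return result, pairs_examined


def hash_join(c1, c2, min_year):
    """Index C2 by title once, then probe the index once per C1 entry."""
    matches_per_title = {}
    for top in c2:
        matches_per_title[top["title"]] = matches_per_title.get(top["title"], 0) + 1
    probes = 0
    result = []
    for b in c1:
        probes += 1
        if b["year"] > min_year:
            # One output row per C2 entry with the same title
            result.extend([b["year"]] * matches_per_title.get(b["title"], 0))
    return result, probes


# Hypothetical toy data, far smaller than the real C1.xml / C2.xml
c1 = [{"title": f"movie{i}", "year": 1940 + i % 60} for i in range(2000)]
c2 = [{"title": f"movie{i}", "year": 1940 + i % 60} for i in range(0, 2000, 20)]

slow, pairs = cross_product_then_filter(c1, c2, 1950)
fast, probes = hash_join(c1, c2, 1950)

print(slow == fast)      # both strategies return the same rows
print(pairs, probes)     # work done by each strategy
print(221097 * 100)      # candidate pairs at the sizes reported in this thread
```

Both functions return the same rows; the difference is only in how many candidates are touched along the way, which grows with the product of the two sizes in the first case and roughly with their sum in the second.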
But you can try the algebra version instead. Well, assuming that you won't/can't recompile on Windows, you need to use the pf compiler "by hand" from a shell ("command prompt"), like:
pf -A query.xq | Mserver <options>
or, with a MonetDB/XQuery server running:
pf -A query.xq | MapiClient -lmil <options>
However, I cannot tell you which <options> you need to successfully run Mserver and/or MapiClient "by hand" from a shell on Windows ...
In the top-level folder of the installation (i.e. in the CWI folder), there is a .bat script to run a client. I think there is one for MIL as well as SQL. Use that script and it will figure out the magic arguments to MapiClient.
Cheers!