[MonetDB-users] MonetDb-XQuery problem

Using the following XQuery in MonetDb (WinXP) crashes on large documents (when C1 and C2 are small it works ok):
<col> { for $b in doc("C1.xml")/collection/doc for $top in doc("C2.xml")/collection/doc where ($b/year > 1950) and ($b/title = $top/title) return <movie> {$b/year} </movie> } </col>
If I remove one of the two conditions it works perfectly, e.g.:
<col> { for $b in doc("C1.xml")/collection/doc for $top in doc("C2.xml")/collection/doc where ($b/title = $top/title) return <movie> {$b/year} </movie> } </col>
Can anyone explain this problem?
______________________________
Maarten Clements

On Thu, May 24, 2007 at 05:07:03PM +0200, Maarten Clements wrote:
Using the following XQuery in MonetDb (WinXP) crashes on large documents
the usual questions:
1) which version of MonetDB XQuery are you using?
2) what exactly does "crash" mean? does the Mserver simply stop? Is there any error message? Does Mserver grow/excessively use resources before crashing? ...?
3) what do "large" and "small" mean wrt. your document sizes? (serialized file size in bytes? number of /collection/doc nodes in either document? ...?)
4) how much memory does your machine have?
Can anyone explain this problem?
most probably, the more complex join condition is not recognized as a join, and then the intermediate result (cross product) blows up.
Stefan
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |

-----Original Message----- From: Stefan Manegold [mailto:Stefan.Manegold@cwi.nl] Sent: 24 May 2007 18:00 To: M.Clements@cwi.nl Cc: monetdb-users@lists.sourceforge.net Subject: Re: [MonetDB-users] MonetDb-XQuery problem
On Thu, May 24, 2007 at 05:07:03PM +0200, Maarten Clements wrote:
Using the following XQuery in MonetDb (WinXP) crashes on large documents
the usual questions:
1) which version of MonetDB XQuery are you using?
The latest, from sjoerd's directory.
2) what exactly does "crash" mean? does the Mserver simply stop? Is there any error message? Does Mserver grow/excessively use resources before crashing? ...?
It fills up my memory, and after doing that Mserver stops.
3) what do "large" and "small" mean wrt. your document sizes? (serialized file size in bytes? number of /collection/doc nodes in either document? ...?)
The problem occurs with C1 = 59 MB, C2 = 7 KB (but I didn't search for the critical sizes :)
Both files contain movie data:
C1 is IMDB data with nodes: title, year, tag1, tag2 ... tagx (I'm not sure how many movies)
C2 is Netflix data with nodes: title, year (100 movies)
4) how much memory does your machine have?
1 GB
most probably, the more complex join condition is not recognized as a join, and then the intermediate result (cross product) blows up.
It is very likely that this is indeed the problem... Why does this happen for a query this simple?

On Thu, May 24, 2007 at 06:15:43PM +0200, Maarten Clements wrote:
1) which version of MonetDB XQuery are you using?
The latest, from sjoerd's directory
so I guess, this is MonetDB 4.17.1 / MonetDB/XQuery 0.17.1, right?
2) what exactly does "crash" mean? does the Mserver simply stop? Is there any error message? Does Mserver grow/excessively use resources before crashing? ...?
It fills up my memory and after doing that Mserver stops.
Ok.
3) what do "large" and "small" mean wrt. your document sizes? (serialized file size in bytes? number of /collection/doc nodes in either document? ...?)
The problem occurs with C1 = 59 MB, C2 = 7 KB (but I didn't search for the critical sizes :)
Both files contain movie data:
C1 is IMDB data with nodes: title, year, tag1, tag2 ... tagx (I'm not sure how many movies)
C2 is Netflix data with nodes: title, year (100 movies)
count(doc("C1.xml")/collection/doc) ?
count(doc("C2.xml")/collection/doc) ?
for both the sizes that work for you and the ones that do not (just one of each, no need to find the switching point)
most probably, the more complex join condition is not recognized as a join, and then the intermediate result (cross product) blows up.
It is very likely that this is indeed the problem... Why does this happen for a query this simple?
simply because join recognition in XQuery is not "simple" at all.
But you can try the algebra version instead. Well, assuming that you won't/can't recompile on Windows, you need to use the pf compiler "by hand" from a shell ("command prompt"), like:
pf -A query.xq | Mserver <options>
or, with a MonetDB/XQuery server running:
pf -A query.xq | MapiClient -lmil <options>
However, I cannot tell you which <options> you need to successfully run Mserver and/or MapiClient "by hand" from a shell on Windows ...
Stefan
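A rewrite that may be worth trying if the join is not recognized: move the selection on year out of the where clause and into a path predicate, so that the where clause contains only the equality between the two titles. This is a sketch in standard XQuery. It should return the same result as the original query, but whether it actually helps MonetDB/XQuery recognize the join has not been verified in this thread and is an assumption.

```xquery
(: Sketch only: same result as the original query, with the year filter :)
(: applied as a path predicate instead of in the where clause.          :)
(: Assumption (untested): leaving a single equality in the where clause :)
(: makes it easier for the compiler to recognize the join.              :)
<col> {
  for $b in doc("C1.xml")/collection/doc[year > 1950]
  for $top in doc("C2.xml")/collection/doc
  where $b/title = $top/title
  return <movie> {$b/year} </movie>
} </col>
```

Since the variant with only the title comparison already runs fine, this form keeps that where clause unchanged and merely shrinks the set of $b bindings beforehand.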

On 05/24/2007 06:27 PM, Stefan Manegold wrote:
On Thu, May 24, 2007 at 06:15:43PM +0200, Maarten Clements wrote:
1) which version of MonetDB XQuery are you using? The latest, from sjoerd's directory
so I guess, this is MonetDB 4.17.1 / MonetDB/XQuery 0.17.1, right?
Yep, and not terribly recent (at least several weeks).
simply because join recognition in XQuery is not "simple" at all.
But you can try the algebra version instead. Well, assuming that you won't/can't recompile on Windows, you need to use the pf compiler "by hand" from a shell ("command prompt"), like:
pf -A query.xq | Mserver <options>
or, with a MonetDB/XQuery server running:
pf -A query.xq | MapiClient -lmil <options>
However, I cannot tell you which <options> you need to successfully run Mserver and/or MapiClient "by hand" from a shell on Windows ...
In the top-level folder of the installation (i.e. in the CWI folder), there is a .bat script to run a client. I think there is one for MIL as well as SQL. Use that script and it will figure out the magic arguments to MapiClient.
-- Sjoerd Mullender

-----Original Message----- From: Sjoerd Mullender [mailto:sjoerd@acm.org] Sent: 24 May 2007 18:48 To: monetdb-users@lists.sourceforge.net Cc: M.Clements@cwi.nl Subject: Re: [MonetDB-users] MonetDb-XQuery problem
On 05/24/2007 06:27 PM, Stefan Manegold wrote:
count(doc("C1.xml")/collection/doc) ? count(doc("C2.xml")/collection/doc) ?
count(doc("C1.xml")/collection/doc) = 245
count(doc("C2.xml")/collection/doc) = 100
Works

count(doc("C1.xml")/collection/doc) = 221097
count(doc("C2.xml")/collection/doc) = 100
Fails
Cheers!
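For a sense of scale, assuming the query is indeed evaluated as a cross product followed by a filter (as suggested earlier in the thread, not confirmed), the number of ($b, $top) pairs that get materialized is the product of the two counts. A minimal sketch in standard XQuery:

```xquery
(: Sketch only: number of ($b, $top) pairs a cross product would produce. :)
(: Assumes the same document names and paths as the query in this thread. :)
let $n1 := count(doc("C1.xml")/collection/doc)
let $n2 := count(doc("C2.xml")/collection/doc)
return $n1 * $n2
```

With the counts reported above, that is 245 * 100 = 24500 pairs for the case that works and 221097 * 100 = 22109700 pairs for the case that fails.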
participants (3)
- Maarten Clements
- Sjoerd Mullender
- Stefan Manegold