Re: [MonetDB-users] Performance

20 Jul 2009


On Mon, Jul 20, 2009 at 8:45 PM, Jan Rittinger wrote:
...
Hi Roy,
I'm not sure how familiar you are with XQuery...
The problem for MonetDB/XQuery (without PF/Tijah) in your query could be
that underneath the p elements (even if not used in a nested variant) there
might be a large number of nodes. Your query asks for the atomization of all
nodes p which leads to a concatenation of all descendant text nodes. (The
reason is that node p in '<p>wind <foo/>farm</p>' should be a match as
well.) Only then does the function contains do its work.
I guess a slightly modified variant, in which no concatenation takes place,
might give you better performance:
for $p in collection("papers")//p
where some $t in $p//text() satisfies contains($t, "wind farm")
return $p
It does not, however, catch text snippets that span multiple text nodes.
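To make the difference between the two variants concrete, here is a small sketch in Python (not XQuery, and not MonetDB code) that mimics both predicates on a toy document. The sample XML and the helper names are made up for illustration, and ElementTree's itertext() only approximates XQuery text nodes. It says nothing about performance; it only shows which p elements each variant selects.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: three p elements, the second one splits
# the phrase across two text nodes.
SAMPLE = (
    "<doc>"
    "<p>a wind farm here</p>"
    "<p>wind <foo/>farm</p>"
    "<p>no match</p>"
    "</doc>"
)

def match_atomized(root, needle):
    """Mimics //p[contains(., needle)]: concatenate all descendant
    text of each p first, then search the concatenated string."""
    return [p for p in root.iter("p") if needle in "".join(p.itertext())]

def match_per_text_node(root, needle):
    """Mimics 'some $t in $p//text() satisfies contains($t, needle)':
    search each descendant text chunk separately, no concatenation."""
    return [p for p in root.iter("p")
            if any(needle in t for t in p.itertext())]

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print(len(match_atomized(root, "wind farm")))
    print(len(match_per_text_node(root, "wind farm")))
```

The atomized variant selects the first two p elements, while the per-text-node variant selects only the first, because "wind " and "farm" sit in separate text nodes of the second one.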
Answering your question about the size of the documents vs. collections:
Currently all documents in a collection are stored in a single big relation.
So if you store many big documents in a collection you will get quite a
large relation. (If you split your data into multiple collections you will
get smaller relations and on a machine with small RAM perhaps less
swapping.)
As to Lefteris' comment: PF/Tijah uses its indexes only for the additional
text retrieval operations. It does not speed up standard XQuery functions
such as contains.
Jan
Yes, I was not clear on that. You have to use the PF/Tijah-specific
functions; fn:contains and the other string functions remain as
generic as before. I just assumed that your application is text
retrieval of some kind and that the functionality of PF/Tijah would be
more helpful.

Thank you, Jan, for pointing this out.
...
On Jul 20, 2009, at 18:28, Roy Walter wrote:
Running MonetDB/XQuery on a 2.6GHz 32-bit Windows XP box with 1GB of RAM.
What is the best way to organise XML in MonetDB for rapid text searching? A
run down of my recent experience might help.
I created a collection of around 450 documents (153MB approx.). I ran the
following query from the command line:
collection("papers")//p[contains(., 'wind farm')]
The query time is at best 19 seconds. That's bad. (It's worse than querying
a Postgres database with documents stored in the XML field type.)
So to get a reference point I loaded up the 114MB XMark document and ran
this query:
doc("standard")//text[contains(., "yoke")]
The query time varies from 2 to 4 seconds. Better, but still not great.
Now, adding more RAM (and moving to 64-bit) would speed things up I hope!
But hardware aside:
1. Is it better to have big documents rather than big collections?
2. Is having small collections (<10 docs) of big documents also inefficient?
Ideally I need to query collections comprising several thousand documents
using 'text search' predicates. Are there other, better ways to run this
type of query against a MonetDB XML database? Or should I really be using
some other platform for this task?
Thanks in advance for any pointers.
-- Roy
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
Jan Rittinger
Lehrstuhl Datenbanken und Informationssysteme
Wilhelm-Schickard-Institut für Informatik
Eberhard-Karls-Universität Tübingen
http://www-db.informatik.uni-tuebingen.de/team/rittinger