I'm running MonetDB/XQuery on a 2.6GHz 32-bit Windows XP box with 1GB of RAM.
What is the best way to organise XML in MonetDB for rapid text
searching? A rundown of my recent experience might help.
I created a collection of around 450 documents (approx. 153MB in total)
and ran the following query from the command line:
collection("papers")//p[contains(., 'wind farm')]
The query time is at best 19 seconds. That's bad. (It's worse than
querying a Postgres database with documents stored in the XML field
type.)
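One variation I have been meaning to try is pushing the contains() down
onto the descendant text nodes rather than the string value of each p.
(Note it is not strictly equivalent: a phrase split across child
elements of a p would no longer match.)

(: match on individual text nodes instead of the string value of p :)
collection("papers")//p[.//text()[contains(., 'wind farm')]]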
So to get a reference point I loaded up the 114MB XMark document and
ran this query:
doc("standard")//text[contains(., "yoke")]
The query time varies from 2 to 4 seconds. Better, but still not great.
Now, adding more RAM (and moving to 64-bit) would speed things up, I
hope! But hardware aside:
1. Is it better to have big documents rather than big collections?
(The sketch after this list shows the two layouts I mean.)
2. Is having small collections (<10 docs) of big documents also
inefficient?
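To make question 1 concrete, these are the two layouts I am weighing up
("papers-all" is just a placeholder name for a single merged document):

(: one big document :)
doc("papers-all")//p[contains(., 'wind farm')]

(: a collection of many smaller documents :)
collection("papers")//p[contains(., 'wind farm')]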
Ideally I need to query collections comprising several thousand
documents using 'text search' predicates. Are there other, better ways
to run this type of query against a MonetDB XML database? Or should I
really be using some other platform for this task?
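For context, the shape of query I ultimately need is roughly the
following (the <hit> wrapper is made up; the real collections would
hold several thousand documents):

(: return each matching paragraph together with the document it came from :)
for $d in collection("papers")
for $p in $d//p[contains(., 'wind farm')]
return <hit doc="{document-uri($d)}">{ $p }</hit>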
Thanks in advance for any pointers.
-- Roy