I'm running MonetDB/XQuery on a 2.6 GHz, 32-bit Windows XP box with 1 GB of RAM.

What is the best way to organise XML in MonetDB for rapid text searching? A rundown of my recent experience might help.

I created a collection of around 450 documents (approximately 153 MB in total) and ran the following query from the command line:

collection("papers")//p[contains(., 'wind farm')]

The query time is at best 19 seconds. That's bad. (It's worse than querying a Postgres database with the documents stored in an xml column.)
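
For what it's worth, this is roughly how I loaded the documents (a sketch from memory; I'm assuming the pf:add-doc() document-management function here, and the file path and names below are made up, so the exact call may differ):

(: assumed signature: pf:add-doc(url, document-name, collection-name) :)
(: repeated once per file for each of the ~450 papers :)
pf:add-doc("file:///C:/papers/paper001.xml", "paper001.xml", "papers")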

So, to get a reference point, I loaded the 114 MB XMark document and ran this query:

doc("standard")//text[contains(., "yoke")]

The query time varies from 2 to 4 seconds. Better, but still not great.

Now, adding more RAM (and moving to 64-bit) would, I hope, speed things up. But hardware aside:

1. Is it better to have big documents rather than big collections? (See the sketch below for what I mean.)

2. Is having small collections (<10 docs) of big documents also inefficient?
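
To illustrate the choice in question 1, the two layouts would be queried like this (the "all-papers" document name is made up; it stands for a single big document wrapping every paper):

(: option A: one big document that wraps all the papers :)
doc("all-papers")//p[contains(., 'wind farm')]

(: option B: what I have now, a collection of ~450 separate documents :)
collection("papers")//p[contains(., 'wind farm')]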

Ideally I need to query collections comprising several thousand documents using 'text search' predicates. Are there other, better ways to run this type of query against a MonetDB XML database? Or should I really be using some other platform for this task?
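
To make that concrete, the kind of query I expect to run at that scale looks something like this (just a sketch; the <hit> wrapper is only for illustration, and I'm assuming fn:base-uri() is available to report which document each match came from):

for $p in collection("papers")//p[contains(., 'wind farm')]
return <hit doc="{base-uri($p)}">{$p}</hit>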

Thanks in advance for any pointers.

-- Roy