[MonetDB-users] Performance

20 Jul 2009

I am running MonetDB/XQuery on a 2.6GHz 32-bit Windows XP box with 1GB of RAM.

What is the best way to organise XML in MonetDB for rapid text 
searching? A rundown of my recent experience might help.

I created a collection of around 450 documents (153MB approx.). I ran 
the following query from the command line:

collection("papers")//p[contains(., 'wind farm')]

The query time is at best 19 seconds. That's bad. (It's worse than 
querying a Postgres database with documents stored in the XML field type.)

So to get a reference point I loaded up the 114MB XMark document and ran 
this query:

doc("standard")//text[contains(., "yoke")]

The query time varies from 2 to 4 seconds. Better, but still not great.
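
To separate evaluation time from the cost of serializing and printing the 
matching nodes, the same predicate can be wrapped in count(). This is only 
a diagnostic sketch in standard XQuery 1.0. It assumes the same "papers" 
collection as above, and it has not been checked how MonetDB/XQuery 
evaluates either form:

```xquery
(: Count the matching paragraphs instead of returning them, so that
   the measured time excludes serialization of the result nodes. :)
count(collection("papers")//p[contains(., 'wind farm')])
```

The XMark query can be wrapped the same way. If the count takes roughly as 
long as the original query, the time is going into the scan itself rather 
than into output.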

Adding more RAM (and moving to 64-bit) would speed things up, I 
hope! But hardware aside:

1. Is it better to have big documents rather than big collections?

2. Is having small collections (<10 docs) of big documents also inefficient?

Ideally I need to query collections comprising several thousand 
documents using 'text search' predicates. Are there other, better ways 
to run this type of query against a MonetDB XML database? Or should I 
really be using some other platform for this task?

Thanks in advance for any pointers.

-- Roy

Roy Walter
