[MonetDB-users] Large Database / Is there hope?
Hello,

I am still having problems compiling from sources; however, I would like to determine whether there is hope in continuing.

We have a corpus of about 18,000 documents averaging about 5,000 words each. We additionally have stand-off annotations that we would like to add and query against. This totals about 300 million XML elements. We are running 64-bit Linux with 6 GiB RAM.

Database A: Loading the documents (in batches of 100) as separate collections takes about 30 min and consumes 30 GiB of disk space in var/MonetDB4/dbfarm. Simple queries against Database A take a long time, consume only 5% CPU, and heavily work kswapd:

    count( for $d in pf:documents-unsafe() return doc($d)/tCorpus )   # killed after 1 hour

Database B: Loading the documents (in batches of 100) into a single huge collection takes about 6 hours and consumes 120 GiB of disk space in var/MonetDB4/dbfarm. Simple queries against Database B take a long time, consume only 5% CPU, and heavily work pdflush:

    count( pf:collection("tcorpus")//tCorpus )   # killed after 3 hours

My questions:

1. Is there any hope of successfully performing non-trivial queries on either of these databases using MonetDB?

2. If so, is loading into a single collection or separate collections likely to be preferable?

3. The above work was done using the Jun2010-SP2 SuperBall, which is the only version I have been able to compile. Has any relevant work been done since then on MonetDB4 or any of the XQuery code that might improve performance?

Thank you,
Dean
participants (1)
Dean Serenevy