Peter,

On Wed, Aug 17, 2005 at 04:22:21PM +0200, Peter van der Kamp wrote:
> My first experiments were done on an old Compaq Proliant 3000 with a 350 MHz Pentium II processor and 128 MB of memory, running Fedora Core 3. I loaded 10 dictionary documents with a total size of 146 MB. Retrieving the headwords from that data took about 2 minutes. Full-text searches were slower, and searches on attributes did not finish at all. As MonetDB was designed for high performance, I became suspicious, not only of the machine but also of my queries. For example, I had to loop over all the documents, and I wonder whether this could be a drawback. So I 'glued' the documents together and loaded that single file, but it made little difference to performance.
> I have now moved my experiments to a 2.8 GHz Pentium machine with 1 GB of memory, also running Fedora Core 3, and performance is much better. The complete WNT dictionary is now loaded: 40 files with a total size of about 450 MB.
Indeed, your first machine *seems* to be somewhat small for the given document size, especially as (some) XQuery queries might require large intermediate results. To find out whether this is the case with your queries, and whether we might improve or extend the MonetDB XQuery compiler to avoid these large intermediate results (provided they are not inherent to the query), I/we would need to know your queries. Please feel free to send them to me/us via this list or by personal email.
> To give some more background: we are currently selecting an XML database system for our dictionary data: Woordenboek der Nederlandsche Taal (WNT, Dictionary of the Dutch Language), Dictionary of Early Middle Dutch and General Dutch Dictionary (ANW). Requirements include (amongst others) good performance, especially for full-text searches (e.g. searching for one or more words in a sense or citation), and the ability to cooperate with XML editors like XMLSpy.
We have not yet spent any time or effort on including dedicated high-performance full-text search support in MonetDB/XQuery. However, within the CIRQUID project (http://wwwhome.cs.utwente.nl/~cirquid/), Arjen de Vries (http://www.cwi.nl/~arjen/, arjen@acm.org) and his colleagues are investigating how to "design and build a DBMS that seamlessly integrates relevance-oriented querying of semi-structured data (XML) with traditional querying of this data". This work also includes full-text search of XML documents. Please feel free to contact Arjen (directly or via me) for more details.

Could you please specify what kind of "cooperation" with XML editors you require?

Regards,
Stefan

-- 
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |