Re: [MonetDB-users] Performance

22 Jul 2009

      Another problem:

Because of the difficulty I was having with collections and indexes I 
decided to combine my XML documents to create several large documents of 
approx. 100MB. Adding and querying one document worked fine. Then I 
tried to add a second document which produced the following error:

    xquery>pf:add-doc("http://dev.govmonitor.com/export/debates2006.xml", "debates2006.xml", "debates")
    more>^Z
    xquery>pf:add-doc("http://dev.govmonitor.com/export/debates2007.xml", "debates2007.xml", "debates")
    more>^Z
    MAPI  = monetdb@localhost:50000
    QUERY = pf:add-doc("http://dev.govmonitor.com/export/debates2007.xml", "debates2007.xml", "debates")
    ERROR = !ERROR: GDKremovedir: rmdir(bat\DELETE_ME) failed.
            !OS: The directory is not empty.
            !OS: XQDY0062: checkpoint failed (in pf_checkpoint), query aborted.
    xquery>

Anyone know what this means?

-- Roy

Roy Walter wrote:
...
OK a few more problems and a success.
I deleted my data and started again to get around the index corruption.
I reloaded the 110MB xmark document and ran a tijah:query(). The query 
was completed in < 0.5 second. Good!
I then reloaded a collection of 360 of my documents. The load was fine 
so I ran a basic tijah:query() that took a long time. Something must 
be wrong so I thought I would delete and re-create the index.
Deleting the index appeared to work without error. On recreating the 
index I got an error. The error refers to the XML document that was 
used to create my collection, i.e., the offending document is not in 
the database. Here's the console output:
QUERY = tijah:create-ft-index()
ERROR = !ERROR: [shred_url]: 1 times inserted nil due to errors at tuples 0@0.
        !ERROR: [shred_url]: first error was:
        !ERROR: shred: cannot stat `::4417490901795001::\\nas\public\2006docs.xml': No such file or directory
        !ERROR: CMDshred_url: operation failed.
        !ERROR: interpret_params: leftfetchjoin(param 2): evaluation error.
Timer    1439.966 msec
-- Roy
Sjoerd Mullender wrote:
...
On Windows the .bat scripts that you use to start the server (possibly
via the Start menu) specify the dbfarm directory as
%APPDATA%\MonetDB4\dbfarm.  APPDATA is your Application Data folder,
C:\Documents and Settings\<username>\Application Data (on XP).
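For anyone trying to locate that directory, here is a minimal Python sketch of how the path expands. It assumes only the %APPDATA%\MonetDB4\dbfarm layout described above; the helper name dbfarm_path and the sample user name are hypothetical, and nothing here queries MonetDB itself.

```python
import ntpath
import os


def dbfarm_path(appdata=None):
    """Return the dbfarm directory the Windows .bat scripts point at.

    Hypothetical helper: it only mirrors the %APPDATA%\\MonetDB4\\dbfarm
    layout described in the thread.
    """
    if appdata is None:
        # APPDATA is set by Windows; it is normally absent on other systems.
        appdata = os.environ.get("APPDATA")
    if not appdata:
        raise RuntimeError("APPDATA is not set; pass the folder explicitly")
    # ntpath joins with backslashes regardless of the host OS.
    return ntpath.join(appdata, "MonetDB4", "dbfarm")


if __name__ == "__main__":
    # Example with a made-up XP user name.
    print(dbfarm_path(r"C:\Documents and Settings\roy\Application Data"))
```

If the directory this prints does not exist, the server was probably started some other way (for example from the command line with a different configuration), so the data may live elsewhere.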
Roy Walter wrote:
...
OK so this is interesting. I don't have a dbfarm directory.
I looked in monetdb.conf and noticed a couple of things.
The MonetDB installation path value appears as:
prefix=c:\documents and settings\sjoerd\my documents\src\stable\icc32\nt32
This means that neither the datadir nor the gdk_dbfarm entries can be
properly processed. Might this explain why I [and others] have to start
the XQuery Server from the command line and load the modules manually?
So right now I can't find my data :) What file extensions does MonetDB use?
-- Roy
Lefteris wrote:
...
Hi,
It looks like something got corrupted in the process. This is becoming
very common with Windows installations. Anyway, you should delete
the contents of the dbfarm directory in the MonetDB installation
directory. This will *delete* all data in your database (I guess you
don't have any important data in it yet, because you are still testing).
After you have emptied dbfarm, start mserver, which will populate dbfarm
again with the default files. Then add your data again with the usual
pf:add-doc, then create the tijah indices, and then run your queries :)
If this does not work, we will have to try to "wake up" the pf/tijah
people to help us.
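The first step of that reset (emptying dbfarm while keeping the directory itself) could be sketched in Python as below. This is only an illustration under stated assumptions: the helper name clear_dbfarm is hypothetical, the dbfarm location must be supplied by you, and the server should be stopped first. It defaults to a dry run so nothing is deleted by accident.

```python
import shutil
from pathlib import Path


def clear_dbfarm(dbfarm, dry_run=True):
    """Delete everything inside dbfarm but keep the directory itself.

    Hypothetical helper. With dry_run=True it only reports the entries
    that would be removed; pass dry_run=False to actually delete them.
    """
    root = Path(dbfarm)
    removed = []
    for entry in sorted(root.iterdir()):
        removed.append(entry.name)
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            # Remove a whole subdirectory tree.
            shutil.rmtree(entry)
        else:
            # Remove a plain file or a symlink.
            entry.unlink()
    return removed
```

Run it once with the default dry_run=True and check the returned names before deleting anything for real.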
Hope this will help,
lefteris
On Tue, Jul 21, 2009 at 10:35 AM, Roy Walter wrote:
...
OK, I restarted and got an error message that pointed to a problem document
in a collection. I deleted the offending document and then tried to generate
the default index with tijah:create-ft-index(). This failed because,
apparently, the DFLT_FT_INDEX already exists.
So I thought that even though the index compilation appeared to have failed
at the earlier error an index must have been created.
I tried a tijah:query(). That failed because the DFLT_FT_INDEX does not
exist.
Hmm, so I tried tijah:delete-ft-index() and it too told me that
DFLT_FT_INDEX does not exist.
tijah:create-ft-index() still fails with: !ERROR tj_init_collection, pftijah
collection already exists: DFLT_FT_INDEX
How do I reset?
-- Roy
Lefteris wrote:
This is not expected.
Did you try to restart the server and retry?
You might also have a corrupted dbfarm, or the documents didn't shred
correctly to begin with. Which version of MonetDB are you using? How did
you install it?
lefteris
On Mon, Jul 20, 2009 at 8:52 PM, Roy Walter wrote:
Hi lefteris
Well that seems to tick all the boxes.
I tried the global index creation:
tijah:create-ft-index()
and it crashed the server with:
!WARNING: readClient: unexpected end of file; discarding partial input
Hmm...
R.
Lefteris wrote:
Hi Roy,
I suggest that you try the pf/tijah module for MonetDB/XQuery.
http://dbappl.cs.utwente.nl/pftijah/
This will create specific indices for your queries to facilitate text
search.
Hope this helps for now. We will also investigate where the time is
spent in your case (without pf/tijah) and come back to you. How many p
elements do your documents have? The problem might be that, because MonetDB
does not build inverted indices on text by itself, it has to visit
each p element and search it with the help of the PCRE library. Pf/tijah
was built for that purpose and should help a lot.
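To illustrate the difference between visiting every element and consulting an inverted index, here is a small Python sketch. It is not how MonetDB or pf/tijah is implemented; the tokenizer, the function names, and the sample paragraphs are assumptions made up for the example. Note also that the indexed lookup matches whole terms only, so a substring hit inside a longer word (which contains() would find) is not returned.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan(paragraphs, phrase):
    """Linear scan: inspect every paragraph, like contains() over all p elements."""
    return [i for i, p in enumerate(paragraphs) if phrase in p]


def build_index(paragraphs):
    """Map each term to the set of paragraph ids that contain it."""
    index = defaultdict(set)
    for i, p in enumerate(paragraphs):
        for term in tokenize(p):
            index[term].add(i)
    return index


def lookup(index, paragraphs, phrase):
    """Use the index to narrow the candidates, then verify the exact phrase."""
    terms = tokenize(phrase)
    if not terms:
        return []
    # Only paragraphs containing every term can contain the phrase.
    candidates = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(i for i in candidates if phrase in paragraphs[i])


if __name__ == "__main__":
    docs = [
        "A new wind farm was approved.",
        "The farm is exposed to wind.",
        "No relevant text here.",
    ]
    idx = build_index(docs)
    print(scan(docs, "wind farm"), lookup(idx, docs, "wind farm"))
```

The scan touches every paragraph on every query, whereas the lookup pays the cost once when the index is built and afterwards only examines the paragraphs that share all the query terms.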
Please feel free to contact us for further clarification and new
findings from your tests:)
cheers,
lefteris
On Mon, Jul 20, 2009 at 6:28 PM, Roy Walter wrote:
Running MonetDB/XQuery on a 2.6GHz 32-bit Windows XP box with 1GB of RAM.
What is the best way to organise XML in MonetDB for rapid text searching?
A rundown of my recent experience might help.
I created a collection of around 450 documents (153MB approx.). I ran the
following query from the command line:
collection("papers")//p[contains(., 'wind farm')]
The query time is at best 19 seconds. That's bad. (It's worse than querying
a Postgres database with documents stored in the XML field type.)
So to get a reference point I loaded up the 114MB XMark document and ran
this query:
doc("standard")//text[contains(., "yoke")]
The query time varies from 2 to 4 seconds. Better, but still not great.
Now, adding more RAM (and moving to 64-bit) would speed things up I hope!
But hardware aside:
1. Is it better to have big documents rather than big collections?
2. Is having small collections (<10 docs) of big documents also inefficient?
Ideally I need to query collections comprising several thousand documents
using 'text search' predicates. Are there other, better ways to run this
type of query against a MonetDB XML database? Or should I really be using
some other platform for this task?
Thanks in advance for any pointers.
-- Roy
------------------------------------------------------------------------------
Enter the BlackBerry Developer Challenge
This is your chance to win up to $100,000 in prizes! For a limited time,
vendors submitting new applications to BlackBerry App World(TM) will have
the opportunity to enter the BlackBerry Developer Challenge. See full
prize
details at: http://p.sf.net/sfu/Challenge
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users