Re: [MonetDB-users] Performance
OK so this is interesting. I don't have a dbfarm directory.
I looked in monetdb.conf and noticed a couple of things.
The MonetDB installation path value appears as:
prefix=c:\documents and settings\sjoerd\my documents\src\stable\icc32\nt32
This means that neither the datadir nor the gdk_dbfarm entries can be properly processed. Might this explain why I [and others] have to start the XQuery Server from the command line and load the modules manually?
So right now I can't find my data :) What file extensions does MonetDB use?
-- Roy
Lefteris wrote:
Hi,
It looks like something got corrupted in the process. This is becoming quite common with Windows installations. Anyway, you should delete the contents of the dbfarm directory in the MonetDB installation directory. This will *delete* all data in your database (I guess you don't have any important data in it yet, since you are still testing). After you delete dbfarm, start mserver, which will populate dbfarm again with the default files. Then add your data again with the usual pf:add-doc, create the tijah indices, and run your queries :)
If this does not work, we will have to try to "wake up" the pf/tijah people to help us.
Hope this will help,
lefteris
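The first step of the reset described above (emptying dbfarm while the server is stopped) can be sketched as a small script. This is only an illustration under stated assumptions: the function name clear_dbfarm, the demo directory and the file name 01.tail are hypothetical, and the demonstration runs against a throwaway temporary directory rather than a real dbfarm. The later steps (starting mserver, pf:add-doc, tijah:create-ft-index()) are still done from the server and the XQuery client.

```python
import shutil
import tempfile
from pathlib import Path


def clear_dbfarm(dbfarm):
    """Delete everything inside dbfarm but keep the directory itself.

    Destructive: stop the server first and only run this on test data.
    Returns the names of the top-level entries that were removed.
    """
    dbfarm = Path(dbfarm)
    removed = []
    for entry in sorted(dbfarm.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)   # remove a whole database directory
        else:
            entry.unlink()         # remove a plain file or a symlink
        removed.append(entry.name)
    return removed


# Demonstration on a throwaway directory, not a real dbfarm.
with tempfile.TemporaryDirectory() as tmp:
    farm = Path(tmp) / "dbfarm"
    (farm / "demo" / "bat").mkdir(parents=True)
    (farm / "demo" / "bat" / "01.tail").write_text("x")
    print(clear_dbfarm(farm))
    print(list(farm.iterdir()))
```

The directory itself is kept so that the server can repopulate it with its default files on the next start.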
On Tue, Jul 21, 2009 at 10:35 AM, Roy Walter wrote:
OK, I restarted and got an error message that pointed to a problem document in a collection. I deleted the offending document and then tried to generate the default index with tijah:create-ft-index(). This failed because, apparently, the DFLT_FT_INDEX already exists.
So I thought that even though the index compilation appeared to have failed at the earlier error an index must have been created.
I tried a tijah:query(). That failed because the DFLT_FT_INDEX does not exist.
Hmm, so I tried tijah:delete-ft-index() and it too told me that DFLT_FT_INDEX does not exist.
tijah:create-ft-index() still fails with: !ERROR tj_init_collection, pftijah collection already exists: DFLT_FT_INDEX
How do I reset?
-- Roy
Lefteris wrote:
This is not expected.
Did you try to restart the server and retry?
You might also have a corrupted dbfarm, or the documents didn't shred correctly to begin with. Which version of MonetDB are you using? How did you install it?
lefteris
On Mon, Jul 20, 2009 at 8:52 PM, Roy Walter wrote:
Hi lefteris
Well that seems to tick all the boxes.
I tried the global index creation:
tijah:create-ft-index()
and it crashed the server with:
!WARNING: readClient: unexpected end of file; discarding partial input
Hmm...
R.
Lefteris wrote:
Hi Roy,
I suggest that you try the pf/tijah module for MonetDB/XQuery.
http://dbappl.cs.utwente.nl/pftijah/
This will create specific indices for your queries to facilitate text search.
Hope this helps for now. We will also investigate where the time is spent in your case (without pf/tijah) and come back to you. How many p elements do your documents have? The problem might be that, because MonetDB does not build inverted indices on text by itself, it has to visit each p element and search with the help of the pcre library. Pf/tijah was built for that purpose and should help a lot.
Please feel free to contact us for further clarification and new findings from your tests:)
cheers,
lefteris
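To illustrate the difference between visiting every p element and consulting an inverted index, here is a minimal sketch. It is not how MonetDB or pf/tijah is implemented; the function names and the three sample paragraphs are made up for the example, and matching is done on whole lowercase terms.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def scan(paragraphs, phrase):
    """Linear scan: test every paragraph, as contains() must without an index."""
    needle = phrase.lower()
    return [i for i, p in enumerate(paragraphs) if needle in p.lower()]


def build_index(paragraphs):
    """Inverted index: term -> set of ids of the paragraphs containing it."""
    index = defaultdict(set)
    for i, p in enumerate(paragraphs):
        for term in tokenize(p):
            index[term].add(i)
    return index


def search(index, paragraphs, phrase):
    """Intersect the postings of all terms, then verify the phrase on candidates only."""
    terms = tokenize(phrase)
    if not terms:
        return []
    candidates = set.intersection(*(index.get(t, set()) for t in terms))
    needle = phrase.lower()
    return sorted(i for i in candidates if needle in paragraphs[i].lower())


paragraphs = [
    "The wind farm was approved last year.",
    "A farm in the valley suffered wind damage.",
    "Nothing relevant here.",
]
index = build_index(paragraphs)
print(scan(paragraphs, "wind farm"))
print(search(index, paragraphs, "wind farm"))
```

The scan touches every paragraph on every query, while the indexed search only inspects paragraphs that contain all of the query terms; the index is built once, which is the cost paid up front when a full-text index is created.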
On Mon, Jul 20, 2009 at 6:28 PM, Roy Walter wrote:
Running MonetDB/XQuery on a 2.6GHz 32-bit Windows XP box with 1GB of RAM.
What is the best way to organise XML in MonetDB for rapid text searching? A rundown of my recent experience might help.
I created a collection of around 450 documents (153MB approx.). I ran the following query from the command line:
collection("papers")//p[contains(., 'wind farm')]
The query time is at best 19 seconds. That's bad. (It's worse than querying a Postgres database with documents stored in the XML field type.)
So to get a reference point I loaded up the 114MB XMark document and ran this query:
doc("standard")//text[contains(., "yoke")]
The query time varies from 2 to 4 seconds. Better, but still not great.
Now, adding more RAM (and moving to 64-bit) would speed things up I hope! But hardware aside:
1. Is it better to have big documents rather than big collections?
2. Is having small collections (<10 docs) of big documents also inefficient?
Ideally I need to query collections comprising several thousand documents using 'text search' predicates. Are there other, better ways to run this type of query against a MonetDB XML database? Or should I really be using some other platform for this task?
Thanks in advance for any pointers.
-- Roy
------------------------------------------------------------------------------ Enter the BlackBerry Developer Challenge This is your chance to win up to $100,000 in prizes! For a limited time, vendors submitting new applications to BlackBerry App World(TM) will have the opportunity to enter the BlackBerry Developer Challenge. See full prize details at: http://p.sf.net/sfu/Challenge _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
On Windows, the .bat scripts that you use to start the server (possibly via the Start menu) specify the dbfarm directory as %APPDATA%\MonetDB4\dbfarm. APPDATA is your Application Data folder, C:\Documents and Settings\<username>\Application Data (on XP).
-- Sjoerd Mullender
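As a small illustration of the location described above, the following sketch builds the dbfarm path from the APPDATA environment variable. The function name dbfarm_path and the user name "roy" in the example environment are assumptions made for the example; only the %APPDATA%\MonetDB4\dbfarm layout comes from the message itself.

```python
import ntpath
import os


def dbfarm_path(env=os.environ):
    """Return %APPDATA%\\MonetDB4\\dbfarm for the given environment mapping."""
    appdata = env.get("APPDATA")
    if appdata is None:
        raise KeyError("APPDATA is not set; this layout is Windows-specific")
    # ntpath joins with backslashes even when run on a non-Windows host.
    return ntpath.join(appdata, "MonetDB4", "dbfarm")


# Hypothetical XP-style environment, used only for illustration.
example_env = {"APPDATA": r"C:\Documents and Settings\roy\Application Data"}
print(dbfarm_path(example_env))
```

On an actual Windows machine, calling dbfarm_path() without arguments reads the real APPDATA value of the current user.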
Thanks, I just figured that out from looking at the DB environment in the admin GUI, and I have now found my dbfarm folder!
-- Roy
OK a few more problems and a success.
I deleted my data and started again to get around the index corruption.
I reloaded the 110MB xmark document and ran a tijah:query(). The query was completed in < 0.5 second. Good!
I then reloaded a collection of 360 of my documents. The load was fine so I ran a basic tijah:query() that took a long time. Something must be wrong so I thought I would delete and re-create the index.
Deleting the index appeared to work without error. On recreating the index I got an error. The error refers to the XML document that was used to create my collection, i.e., the offending document is not in the database. Here's the console output:
QUERY = tijah:create-ft-index()
ERROR = !ERROR: [shred_url]: 1 times inserted nil due to errors at tuples 0@0.
!ERROR: [shred_url]: first error was:
!ERROR: shred: cannot stat `::4417490901795001::\\nas\public\2006docs.xml': No such file or directory
!ERROR: CMDshred_url: operation failed.
!ERROR: interpret_params: leftfetchjoin(param 2): evaluation error.
Timer 1439.966 msec
-- Roy
Another problem:
Because of the difficulty I was having with collections and indexes I decided to combine my XML documents to create several large documents of approx. 100MB. Adding and querying one document worked fine. Then I tried to add a second document which produced the following error:
xquery>pf:add-doc("http://dev.govmonitor.com/export/debates2006.xml", "debates2006.xml", "debates")
more>^Z
xquery>pf:add-doc("http://dev.govmonitor.com/export/debates2007.xml", "debates2007.xml", "debates")
more>^Z
MAPI = monetdb@localhost:50000
QUERY = pf:add-doc("http://dev.govmonitor.com/export/debates2007.xml", "debates2007.xml", "debates")
ERROR = !ERROR: GDKremovedir: rmdir(bat\DELETE_ME) failed.
!OS: The directory is not empty.
!OS: XQDY0062: checkpoint failed (in pf_checkpoint), query aborted.
xquery>
Anyone know what this means?
-- Roy
This is an internal error. Just yesterday I fixed a problem which resulted in this very error. I hope that my fix also fixes this instance of the error.
I'm currently working on a new bug fix release, to be released next week (if things work out with the build) which contains my fix.
Roy Walter wrote:
Another problem:
Because of the difficulty I was having with collections and indexes I decided to combine my XML documents to create several large documents of approx. 100MB. Adding and querying one document worked fine. Then I tried to add a second document which produced the following error:
xquery>pf:add-doc("http://dev.govmonitor.com/export/debates2006.xml", "debates2006.xml", "debates")
more>^Z
xquery>pf:add-doc("http://dev.govmonitor.com/export/debates2007.xml", "debates2007.xml", "debates")
more>^Z
MAPI = monetdb@localhost:50000
QUERY = pf:add-doc("http://dev.govmonitor.com/export/debates2007.xml", "debates2007.xml", "debates")
ERROR = !ERROR: GDKremovedir: rmdir(bat\DELETE_ME) failed.
!OS: The directory is not empty.
!OS: XQDY0062: checkpoint failed (in pf_checkpoint), query aborted.
xquery>
Anyone know what this means?
-- Roy
-- Sjoerd Mullender
OK thanks.
Trying to create an index on a single 300MB document I get this error:
xquery>tijah:create-ft-index()
more>^Z
#VirtualAlloc(00000000,161714176,MEM_COMMIT,PAGE_READWRITE): failed
#GDKvmalloc(161710088) fails, try to free up space [memory in use=269395036,virtual memory in use=1611661312]
#GDKvmalloc(161710088) result [mem=220991540,vm=1554972672]
#VirtualAlloc(00000000,161714176,MEM_COMMIT,PAGE_READWRITE): failed
#GDKmmap(162004992) fails, try to free up space [memory in use=220991564,virtual memory in use=1554972672]
#GDKmmap(162004992) result [mem=218123036,vm=1553268736]
MAPI = monetdb@localhost:50000
QUERY = tijah:create-ft-index()
ERROR = !ERROR: HEAPalloc: Insufficient space for HEAP of 162004992 bytes.
!ERROR: CMDkdiff: operation failed.
Is there a workaround?
Incidentally, I tried to install the Ubuntu version from http://monetdb.cwi.nl/downloads/Ubuntu/. After apt-get install monetdb\* I get an error: Couldn't find package monetdb*.
-- Roy
Sjoerd Mullender wrote:
This is an internal error. Just yesterday I fixed a problem which resulted in this very error. I hope that my fix also fixes this instance of the error.
I'm currently working on a new bug fix release, to be released next week (if things work out with the build) which contains my fix.
Hi,
Incidentally, I tried to install the Ubuntu version from http://monetdb.cwi.nl/downloads/Ubuntu/. After apt-get install monetdb \* I get an error: Couldn't find package monetdb*.
Works here with:
deb http://monetdb.cwi.nl/downloads/Ubuntu/ jaunty monetdb
Franck
On 2009-07-22 16:18, Roy Walter wrote:
OK thanks.
Trying to create an index on a single 300MB document I get this error:
xquery>tijah:create-ft-index()
more>^Z
#VirtualAlloc(00000000,161714176,MEM_COMMIT,PAGE_READWRITE): failed
#GDKvmalloc(161710088) fails, try to free up space [memory in use=269395036,virtual memory in use=1611661312]
#GDKvmalloc(161710088) result [mem=220991540,vm=1554972672]
#VirtualAlloc(00000000,161714176,MEM_COMMIT,PAGE_READWRITE): failed
#GDKmmap(162004992) fails, try to free up space [memory in use=220991564,virtual memory in use=1554972672]
#GDKmmap(162004992) result [mem=218123036,vm=1553268736]
MAPI = monetdb@localhost:50000
QUERY = tijah:create-ft-index()
ERROR = !ERROR: HEAPalloc: Insufficient space for HEAP of 162004992 bytes.
!ERROR: CMDkdiff: operation failed.
Is there a workaround?
Assuming this is the same system as the one you started this thread with, the "workaround" may have to be to get a bigger machine. In particular, you may have to upgrade to a 64 bit architecture. Since MonetDB needs to address all tables it is working on simultaneously, it needs a large enough address space. With a 32 bit architecture you reach the limit (2 GB address space) pretty quickly if you're using large documents.
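As a rough check of the numbers in the log quoted above, here is a minimal Python sketch. It assumes the 2 GB address space limit mentioned in this reply and takes the byte counts from the first VirtualAlloc and #GDKvmalloc lines; the variable names are illustrative only and are not MonetDB settings.

```python
# Byte counts copied from the first "VirtualAlloc ... failed" and
# "#GDKvmalloc ... fails" lines of the quoted log.
ADDRESS_SPACE = 2 * 1024**3   # assumed 2 GB address space of a 32-bit process
VM_IN_USE = 1_611_661_312     # "virtual memory in use"
REQUESTED = 161_714_176       # size passed to VirtualAlloc

MIB = 1024**2
remaining = ADDRESS_SPACE - VM_IN_USE

print(f"in use:    {VM_IN_USE / MIB:.0f} MiB")
print(f"remaining: {remaining / MIB:.0f} MiB")
print(f"requested: {REQUESTED / MIB:.0f} MiB")
print("request smaller than total remaining space:", REQUESTED <= remaining)
```

On these figures the total space still free is larger than the request, so the failure is presumably due to there being no single contiguous free region of that size left. That is an inference from the numbers, not something the log states.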
Incidentally, I tried to install the Ubuntu version from http://monetdb.cwi.nl/downloads/Ubuntu/. After apt-get install monetdb\* I get an error: Couldn't find package monetdb*.
Last time I tried it it worked, but it's been a while since I tried it. I'll try it again, as soon as I'm near a Ubuntu virtual machine. By the way, you *are* trying it for Jaunty?
-- Roy
-- Sjoerd Mullender
Yes, a hardware upgrade is on the cards.

I was able to get around the error as follows:

1. Load the first document using the mapi client.
2. Restart the server.
3. Load the second document into the same collection using the mapi client. (I'm guessing that restarting the server removes the temporary storage created during the first document load, which gets around the error I encountered when trying to load two documents one after the other.)
4. Run tijah:create-ft-index() and it works.

For me, running tijah:create-ft-index() on a collection of two documents (2 x 100MB) worked, whereas running it on a single document (1 x 200MB) caused the error below. So for now I have enough data to work with :)

[Do we top or bottom post here?]

-- Roy

Sjoerd Mullender wrote:
On 2009-07-22 16:18, Roy Walter wrote:
OK thanks.
Trying to create an index on a single 300MB document I get this error:
xquery>tijah:create-ft-index()
more>^Z
#VirtualAlloc(00000000,161714176,MEM_COMMIT,PAGE_READWRITE): failed
#GDKvmalloc(161710088) fails, try to free up space [memory in use=269395036,virtual memory in use=1611661312]
#GDKvmalloc(161710088) result [mem=220991540,vm=1554972672]
#VirtualAlloc(00000000,161714176,MEM_COMMIT,PAGE_READWRITE): failed
#GDKmmap(162004992) fails, try to free up space [memory in use=220991564,virtual memory in use=1554972672]
#GDKmmap(162004992) result [mem=218123036,vm=1553268736]
MAPI = monetdb@localhost:50000
QUERY = tijah:create-ft-index()
ERROR = !ERROR: HEAPalloc: Insufficient space for HEAP of 162004992 bytes.
!ERROR: CMDkdiff: operation failed.
Is there a workaround?
Assuming this is the same system as the one you started this thread with, the "workaround" may have to be to get a bigger machine. In particular, you may have to upgrade to a 64 bit architecture.
Since MonetDB needs to address all tables it is working on simultaneously, it needs a large enough address space. With a 32 bit architecture you reach the limit (2 GB address space) pretty quickly if you're using large documents.
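To put the figures from the failed tijah:create-ft-index() run into perspective, here is a minimal Python sketch of the arithmetic. The numbers are copied from the log in this thread and the 2 GB limit is the one quoted above; the helper name headroom is made up for illustration. Reading the failure as a lack of contiguous free address space is an assumption on my part, not something the log states.

```python
# Figures copied from the failed tijah:create-ft-index() log in this thread.
ADDRESS_SPACE_LIMIT = 2 * 1024**3  # 2 GB address space on a 32-bit system, as quoted above
VM_IN_USE = 1611661312             # "virtual memory in use" at the first failure
REQUESTED = 161714176              # size passed to VirtualAlloc

MB = 1024**2


def headroom(limit: int, in_use: int) -> int:
    """Bytes of address space still unreserved (hypothetical helper)."""
    return limit - in_use


free_bytes = headroom(ADDRESS_SPACE_LIMIT, VM_IN_USE)

print(f"in use:    {VM_IN_USE / MB:.1f} MB")
print(f"free:      {free_bytes / MB:.1f} MB")
print(f"requested: {REQUESTED / MB:.1f} MB")
# Only a handful of blocks of this size fit in what is left, and only if
# the free space happens to be contiguous (assumption, see the note above).
print(f"blocks that would fit: {free_bytes // REQUESTED}")
```

With roughly three quarters of the address space already reserved, a single request of about 154 MB has very little room to land, which is in line with the advice to move to a 64-bit architecture.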
Incidentally, I tried to install the Ubuntu version from http://monetdb.cwi.nl/downloads/Ubuntu/. After apt-get install monetdb\* I get an error: Couldn't find package monetdb*.
Last time I tried it it worked, but it's been a while since I tried it. I'll try it again, as soon as I'm near a Ubuntu virtual machine.
By the way, you *are* trying it for Jaunty?
-- Roy
Sjoerd Mullender wrote:
This is an internal error. Just yesterday I fixed a problem which resulted in this very error. I hope that my fix also fixes this instance of the error.
I'm currently working on a new bug fix release, to be released next week (if things work out with the build) which contains my fix.
Roy Walter wrote:
Another problem:
Because of the difficulty I was having with collections and indexes I decided to combine my XML documents to create several large documents of approx. 100MB. Adding and querying one document worked fine. Then I tried to add a second document which produced the following error:
xquery>pf:add-doc("http://dev.govmonitor.com/export/debates2006.xml", "debates2006.xml", "debates")
more>^Z

xquery>pf:add-doc("http://dev.govmonitor.com/export/debates2007.xml", "debates2007.xml", "debates")
more>^Z
MAPI = monetdb@localhost:50000
QUERY = pf:add-doc("http://dev.govmonitor.com/export/debates2007.xml", "debates2007.xml", "debates")
ERROR = !ERROR: GDKremovedir: rmdir(bat\DELETE_ME) failed.
!OS: The directory is not empty.
!OS: XQDY0062: checkpoint failed (in pf_checkpoint), query aborted.
xquery>
Anyone know what this means?
-- Roy
Roy Walter wrote:
OK a few more problems and a success.
I deleted my data and started again to get around the index corruption.
I reloaded the 110MB xmark document and ran a tijah:query(). The query was completed in < 0.5 second. Good!
I then reloaded a collection of 360 of my documents. The load was fine so I ran a basic tijah:query() that took a long time. Something must be wrong so I thought I would delete and re-create the index.
Deleting the index appeared to work without error. On recreating the index I got an error. The error refers to the XML document that was used to create my collection, i.e., the offending document is not in the database. Here's the console output:
QUERY = tijah:create-ft-index()
ERROR = !ERROR: [shred_url]: 1 times inserted nil due to errors at tuples 0@0.
!ERROR: [shred_url]: first error was:
!ERROR: shred: cannot stat `::4417490901795001::\\nas\public\2006docs.xml': No such file or directory
!ERROR: CMDshred_url: operation failed.
!ERROR: interpret_params: leftfetchjoin(param 2): evaluation error.
Timer 1439.966 msec
-- Roy
Sjoerd Mullender wrote:
On Windows the .bat scripts that you use to start the server (possibly via the Start menu) specify the dbfarm directory as %APPDATA%\MonetDB4\dbfarm. APPDATA is your Application Data folder, C:\Documents and Settings\<username>\Application Data (on XP).
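If it helps to locate the directory, the path described above can be worked out with a short Python sketch. It only joins the APPDATA value with MonetDB4\dbfarm as stated in this message; the function name dbfarm_path and the fallback value are made up for illustration, and the script does not check that the directory actually exists.

```python
import ntpath  # Windows path rules, regardless of where the script runs
import os


def dbfarm_path(appdata: str) -> str:
    """Join an Application Data folder with MonetDB4\\dbfarm (hypothetical helper)."""
    return ntpath.join(appdata, "MonetDB4", "dbfarm")


# On Windows, APPDATA is set by the system; the fallback is only a placeholder.
appdata = os.environ.get(
    "APPDATA", r"C:\Documents and Settings\<username>\Application Data"
)
print(dbfarm_path(appdata))
```

Run on the XP machine itself, this prints the folder that the .bat scripts pass to the server, which is where the data files should be.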
Roy Walter wrote:
> OK so this is interesting. I don't have a dbfarm directory.
>
> I looked in monetdb.conf and noticed a couple of things.
>
> The MonetDB installation path value appears as:
>
> prefix=c:\documents and settings\sjoerd\my documents\src\stable\icc32\nt32
>
> This means that neither the datadir nor the gdk_dbfarm entries can be
> properly processed. Might this explain why I [and others] have to start
> the XQuery Server from the command line and load the modules manually?
>
> So right now I can't find my data :) What file extensions does
> MonetDB use?
>
> -- Roy
participants (3)
- Franck Routier
- Roy Walter
- Sjoerd Mullender