As a further experiment I shredded the individual documents used to make the large composite into the db on the Linux box. The shredding completed without error, adding 532 documents to the database. After creating the tijah index I ran the following query:
tijah:queryall("//p[about(., 'drug misuse')]")
and it returned a number of correct results. (The same sequence, i.e., shredding -> indexing -> querying, on a Windows box produced no results.)
Not all results were printed to the console, however, as the query produced the following error:
!ERROR: XML Generation: tmpr_1231 BAT does not have a 120 head.
...
ERROR = !ERROR: !ERROR: xquery_print_result_main: operation failed.
It's possible that the error is memory related, as my Jaunty installation is running under VirtualBox.
-- Roy
Roy Walter wrote:
It's a strange one. I have been experimenting a little.
I was working with a single large [composite] document (176MB) that showed the problem. So I took a subset of files containing the search terms and created a smaller composite (8MB).
The small composite works correctly, as far as I can tell. So instead of a large composite I shredded all the individual files used to create the large composite to see if it made any difference. It doesn't. The problem persists.
When working with the small composite I also noticed that the query produces more [correct] results than it does against the large composite, and that the large composite returns incorrect results.
For example, when searching for the phrase 'drug treatment', querying the large composite document returned hits containing 'drug' AND 'treatment' and only one hit containing the sought phrase. Searching the small composite returned 20 correct results for the sought phrase. (To clarify: the large and small composites contain the same documents.)
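One way to put numbers on the comparison above is to post-check the returned paragraph text outside the database, separating hits that contain the exact phrase from hits that merely contain both words. The following is a minimal Python sketch; the function names and the sample strings are made up for illustration, and it says nothing about how about() ranks or matches internally.

```python
import re

def tokens(text):
    """Split text into lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_all_terms(text, query):
    """True if every query word occurs somewhere in the text (AND semantics)."""
    words = set(tokens(text))
    return all(term in words for term in tokens(query))

def contains_phrase(text, query):
    """True if the query words occur adjacently and in order (phrase semantics)."""
    words = tokens(text)
    wanted = tokens(query)
    n = len(wanted)
    return any(words[i:i + n] == wanted for i in range(len(words) - n + 1))

def classify_hits(hits, query):
    """Split hits into (phrase matches, matches on all terms but not the phrase)."""
    phrase = [h for h in hits if contains_phrase(h, query)]
    terms_only = [h for h in hits
                  if contains_all_terms(h, query) and not contains_phrase(h, query)]
    return phrase, terms_only

# Hypothetical paragraph texts standing in for the <p> elements a query returns.
sample_hits = [
    "Drug treatment services were expanded.",
    "The drug was withdrawn after treatment ended.",
]
phrase_hits, terms_only_hits = classify_hits(sample_hits, "drug treatment")
```

Running the same check on the results from the large and the small composite would show directly how many phrase hits each one returns.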
I don't know if it's related, but I installed MonetDB4/XQuery on an Ubuntu box and I cannot shred the large composite into the database owing to a parsing error. This error, clearly, is not occurring under Windows. I will try shredding the small composite on the Linux box.
(Incidentally, if files are added to the database in bulk using UNC pathnames, e.g., <doc path="\\nas\public\export\" name="myfile.xml"/> the files are shredded into the db but tijah:create-ft-index() thereafter fails with a shred error because of the pathname. I guess it's expecting Unix-style pathnames, although I question why the tijah indexer cares about the pathname at all, since the documents are already in the db. I will add this to the bug list.)
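If the guess about Unix-style pathnames is right, one thing to try would be rewriting the backslashes before the bulk load. This is only a sketch of that idea in Python: the helper name is made up, and whether the indexer then accepts the forward-slash form is an untested assumption.

```python
def unc_to_forward_slashes(path):
    """Replace every backslash with a forward slash.

    Assumption: the indexer accepts the forward-slash form of the same
    location. This has not been verified against tijah:create-ft-index().
    """
    return path.replace("\\", "/")

# The UNC path from the example above, written as a Python string literal.
unix_style = unc_to_forward_slashes("\\\\nas\\public\\export\\")
```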
-- Roy
Henning Rode wrote:
sounds clearly like a bug. could you send me a short example document that i can index and experiment with to find the bug?
-henning
Roy Walter wrote:
The following query:
tijah:queryall("//p[about(., 'drug treatment')]")
returns a number of results from my sample document. Some of these results contain the phrase "drug misuse". The following query:
tijah:queryall("//p[about(., 'drug misuse')]")
returns zero results from the sample document, which is clearly incorrect since some results returned by the first query should be returned by the second.
I have deleted and reloaded the sample document and I have recreated the tijah index and the result is consistently incorrect. Is this a bug?
-- Roy
------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day trial. Simplify your report design, integration and deployment - and focus on what you do best, core application coding. Discover what's new with Crystal Reports now. http://p.sf.net/sfu/bobj-july _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users