It's a strange one. I have been experimenting a little.

I was working with a single large [composite] document (176MB) that showed the problem. So I took a subset of files containing the search terms and created a smaller composite (8MB). The small composite works correctly, as far as I can tell.

Then, instead of the large composite, I shredded all the individual files used to create it to see if that makes any difference. It doesn't. The problem persists.

I also noticed that the query produces more [correct] results against the small composite than against the large one, and that the large composite returns incorrect results. For example, when searching for the phrase 'drug treatment', querying the large composite document returned hits containing 'drug' AND 'treatment' and only one hit containing the sought phrase. Searching the small composite returned 20 correct results for the sought phrase. (To clarify: the large and small composites contain the same documents.)

I don't know if it's related, but I installed MonetDB4/XQuery on an Ubuntu box and I cannot shred the large composite into the database owing to a parsing error. This error, clearly, is not occurring under Windows. I will try shredding the small composite on the Linux box.

(Incidentally, if files are added to the database in bulk using UNC pathnames, e.g., <doc path="\\nas\public\export\" name="myfile.xml"/>, the files are shredded into the db but tijah:create-ft-index() thereafter fails with a shred error because of the pathname. I guess it's expecting Unix-style pathnames, although I question why the tijah indexer cares about the pathname since the documents are already in the db. I will add this to the bug list.)

-- Roy

Henning Rode wrote:
sounds clearly like a bug. could you send me a short example document that i can index and experiment with to find the bug?
-henning
Roy Walter wrote:
The following query:
tijah:queryall("//p[about(., 'drug treatment')]")
returns a number of results from my sample document. Some of these results contain the phrase "drug misuse". The following query:
tijah:queryall("//p[about(., 'drug misuse')]")
returns zero results from the sample document, which is clearly incorrect since some results returned by the first query should be returned by the second.
I have deleted and reloaded the sample document and I have recreated the tijah index and the result is consistently incorrect. Is this a bug?
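One way to cross-check the result independently of the tijah index might be a plain substring test in XQuery. This is only a sketch: "sample.xml" is a hypothetical name standing in for whatever name the sample document was shredded under, and a literal match like this will miss the phrase wherever the two words are separated by a line break or differ in case.

```xquery
(: Cross-check that bypasses the full-text index: return every <p>
   whose string value literally contains the phrase.
   "sample.xml" is a hypothetical document name, not taken from the
   original report; substitute the name used when shredding. :)
for $p in doc("sample.xml")//p
where contains(string($p), "drug misuse")
return $p
```

If this returns paragraphs while the about() query returns none, that would point at the index or the query evaluation rather than at the document content.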
-- Roy