It's a strange one. I have been experimenting a little.
I was working with a single large [composite] document (176MB) that
showed the problem. So I took a subset of files containing the search
terms and created a smaller composite (8MB).
The small composite works correctly, as far as I can tell. So instead of
a large composite I shredded all the individual files used to create
the large composite, to see whether it makes any difference. It doesn't.
The problem persists.
When working with the small composite I also noticed that the query
produces more [correct] results than it does against the large
composite, and that the large composite returns incorrect results.
For example, when searching for the phrase 'drug treatment', querying
the large composite document returned hits containing 'drug' AND
'treatment' and only one hit containing the sought phrase. Searching the
small composite returned 20 correct results for the sought phrase. (To
clarify: the large and small composites contain the same documents.)
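
For anyone checking hits by hand, here is a minimal sketch of the
phrase versus AND distinction described above. It is plain Python and
independent of PF/Tijah. The names normalize and classify_hit and the
sample strings are hypothetical, and it assumes the text of each hit
has already been exported as a plain string.

```python
import re
from collections import Counter


def normalize(text: str) -> str:
    """Lower-case and collapse whitespace so a line break inside a phrase does not hide it."""
    return re.sub(r"\s+", " ", text).strip().lower()


def classify_hit(text: str, phrase: str) -> str:
    """Classify one hit as 'phrase', 'all-terms' (AND match only) or 'partial'."""
    norm = normalize(text)
    terms = normalize(phrase).split()
    # Whole-phrase match: the terms adjacent, in order, on word boundaries.
    pattern = r"\b" + " ".join(re.escape(t) for t in terms) + r"\b"
    if re.search(pattern, norm):
        return "phrase"
    # AND match: every term occurs somewhere in the text, in any order.
    words = set(re.findall(r"\w+", norm))
    if all(t in words for t in terms):
        return "all-terms"
    return "partial"


# Made-up hit texts, for illustration only.
hits = [
    "Access to drug treatment has improved.",
    "The drug was withdrawn before treatment began.",
    "Drug\ntreatment services were reviewed.",
    "Treatment options were limited.",
]

summary = Counter(classify_hit(h, "drug treatment") for h in hits)
print(dict(summary))
```

Counting the 'phrase' and 'all-terms' buckets for the same search term
against both composites would show whether the large composite is
falling back to an AND match.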
I don't know if it's related, but I installed MonetDB4/XQuery on an
Ubuntu box and I cannot shred the large composite into the database
owing to a parsing error. This error, clearly, is not occurring under
Windows. I will try shredding the small composite on the Linux box.
(Incidentally, if files are added to the database in bulk using UNC
pathnames, e.g., <doc path="\\nas\public\export\" name="myfile.xml"/>,
the files are shredded into the db but tijah:create-ft-index()
thereafter fails with a shred error because of the pathname. I guess
it's expecting Unix-style pathnames, although I question why the tijah
indexer cares about the pathname since the documents are in the db. I
will add this to the bug list.)
-- Roy
Henning Rode wrote:
sounds clearly like a bug. could you send me a short example document
that i can index and experiment with to find the bug?
-henning
Roy Walter wrote:
The following query:
tijah:queryall("//p[about(., 'drug treatment')]")
returns a number of results from my sample document. Some of these
results contain the phrase "drug misuse". The following query:
tijah:queryall("//p[about(., 'drug misuse')]")
returns zero results from the sample document, which is clearly
incorrect since some results returned by the first query should be
returned by the second.
I have deleted and reloaded the sample document and I have recreated
the tijah index and the result is consistently incorrect. Is this a bug?
-- Roy
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users