[MonetDB-users] PF/Tijah Question: BM25F?

I was wondering whether it would be possible to use BM25F (http://trec.nist.gov/pubs/trec13/papers/microsoft-cambridge.web.hard.pdf)? Has anyone else tried this before with an XQuery?
junte

Hi Junte,
Do you mean that you would like to use or test the BM25F retrieval model? If you set the option <TijahOptions ir-model="OKAPI"/>, PF/Tijah uses the BM25 retrieval model. I cannot say at the moment whether it is an implementation of BM25F. You would need to ask Djoerd about that, since he implemented the OKAPI retrieval model.
Best,
Henning
------------------------------------------------------------------------------
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

Hi Henning,
Yes, my reviewers want me to use BM25F, since I cannot beat
BM25 with the other models that PF/Tijah offers.
I have an XQuery that does some relevance boosting after the results
are returned, so I guess BM25F can be applied by post-processing the
runs.
I was also wondering whether anyone has tried pseudo-relevance feedback
with PF/Tijah?
junte
(In reply to Henning's message of Wed, Jun 2, 2010 at 12:26 PM.)

Dear Junte and Henning,
The model "OKAPI" implements traditional BM25 weighting, not BM25F, but it will behave a bit like BM25F if you do something like:
    tijah:queryall("//document[about(., boo) or about(.//title, boo)]")
i.e., this ranks documents with 'boo' in the title (field) higher. PF/Tijah being an XML search system, we do not like to talk about fields, but about elements ;-)
But seriously, fielded search as done in BM25F and XML element search as done by PF/Tijah do not match very well. The scores in PF/Tijah need to be combined in an algebraic way. You might be able to do something like this (including weighting of fields) directly in our score region algebra in MIL, but because the BM25 function is non-linear in element term frequency, this will not be trivial, and may even be impossible in our framework...
ir-model="PRF" implements pseudo-relevance feedback following the relevance model paper by Lavrenko & Croft at SIGIR 2001. This option is not documented, which usually means it is not heavily tested either. I checked it in about two years ago without test cases (whoops, sorry) and haven't used it since.
I hope this helps. Please let me know if you need help implementing new features in PF/Tijah.
Best,
Djoerd
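A note on the non-linearity mentioned above: the following minimal Python sketch (not PF/Tijah or MIL code) contrasts two ways of combining a title and a body element. One scores each element separately and sums the weighted scores; the other, in the spirit of BM25F, sums the weighted term frequencies first and applies the BM25 saturation once. The weighted sum, the field weights, k1 = 1.2, the omission of idf and length normalisation, and the two toy documents are all assumptions made for illustration.

```python
K1 = 1.2  # BM25 saturation parameter (assumed value, commonly used as a default)

def saturate(tf, k1=K1):
    """BM25 term-frequency saturation without length normalisation (b = 0)."""
    return tf / (k1 + tf)

def combine_scores(field_tf, weights, k1=K1):
    """Score every element on its own, then take a weighted sum of the scores."""
    return sum(weights[f] * saturate(tf, k1) for f, tf in field_tf.items())

def combine_frequencies(field_tf, weights, k1=K1):
    """BM25F-style: sum the weighted term frequencies, then saturate once."""
    pseudo_tf = sum(weights[f] * tf for f, tf in field_tf.items())
    return saturate(pseudo_tf, k1)

# Hypothetical field weights and term counts for a single query term.
weights = {"title": 3.0, "body": 1.0}
doc_a = {"title": 1, "body": 0}   # term occurs once, in the title only
doc_b = {"title": 0, "body": 10}  # term occurs ten times, in the body only

for name, doc in (("A", doc_a), ("B", doc_b)):
    print(name,
          round(combine_scores(doc, weights), 3),
          round(combine_frequencies(doc, weights), 3))
```

With these made-up numbers the two combinations order the documents differently: summing scores puts A first, while summing frequencies puts B first. Because the saturation is applied at a different point, a combination of per-element scores cannot in general be turned into BM25F simply by choosing other weights.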

Hi Djoerd,
Thank you very much for your explanation!
I think I'll leave BM25F aside for now; perhaps I'll dust off Lemur for
my retrieval experiments. ;)
I will test "PRF"! :)
Cheers,
junte
(In reply to Djoerd Hiemstra's message of Thu, Jun 3, 2010 at 10:28 AM.)

Question for the PF/Tijah developers:
I tried the ir-model "PRF", but it did not seem to work; it
reverts to the standard model.
I discovered that it is not listed in the /pftijah/tjc/tj_main.c file,
so I added it there and recompiled.
When I run it, I get this error:
ERROR = !ERROR: interpret: no matching MIL operator to
'tj_containing_query_unnest_nid_term_PRF(BAT[oid,oid], BAT[oid,BAT])'.
!ERROR: CMDtijah_query: execute MIL failed.
!ERROR: CMDtijah_query: operation failed.
So I am stuck at this point.
I'd like to use it. I know this feature is not officially supported, but
it would be really cool to try it out.
Could someone help me out?
junte
P.S. About http://dbappl.cs.utwente.nl/pftijah/Documentation/ForDevelopers:
modifying the pftijah.mx file and then running 'make install' does not
change the pftijah.mil file.
(In reply to jz@uva's message of Thu, Jun 3, 2010 at 11:32 AM.)
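For context on what ir-model="PRF" is meant to compute: earlier in the thread it is described as following the relevance model of Lavrenko & Croft. The sketch below is a small Python illustration of that idea (often called RM1), where P(w|R) is proportional to the sum over feedback documents D of P(w|D) * P(Q|D). It is not the PF/Tijah implementation. The function name, the Jelinek-Mercer smoothing with lam = 0.5, the uniform document prior, and the toy documents are assumptions chosen for illustration.

```python
from collections import Counter

def relevance_model(query, feedback_docs, collection, lam=0.5):
    """Estimate P(w|R) from the top-ranked (feedback) documents.

    query, and every document, is a list of tokens. Documents are assumed
    to be non-empty. A uniform prior over feedback documents is assumed.
    """
    coll_tf = Counter(t for doc in collection for t in doc)
    coll_len = sum(coll_tf.values())

    def p_smoothed(term, tf, doc_len):
        # Jelinek-Mercer smoothing: mix document and collection estimates.
        return lam * tf[term] / doc_len + (1 - lam) * coll_tf[term] / coll_len

    scores = Counter()
    for doc in feedback_docs:
        tf = Counter(doc)
        doc_len = len(doc)
        # Query likelihood P(Q|D) under the smoothed unigram model.
        q_lik = 1.0
        for q in query:
            q_lik *= p_smoothed(q, tf, doc_len)
        # Each term in the document contributes P(w|D) * P(Q|D).
        for term in tf:
            scores[term] += p_smoothed(term, tf, doc_len) * q_lik

    total = sum(scores.values())
    if total == 0:
        return {}
    return {t: s / total for t, s in scores.items()}

# Toy data (hypothetical): the first two documents play the role of the
# top-ranked results of an initial retrieval run for the query.
collection = [
    ["xml", "retrieval", "model"],
    ["xml", "database", "query"],
    ["cooking", "recipe"],
]
query = ["xml"]
rm = relevance_model(query, collection[:2], collection)
expansion_terms = sorted(rm, key=rm.get, reverse=True)[:3]
print(expansion_terms)
```

The highest-weighted terms can then be added to the original query for a second retrieval run, which is the usual way such a model is used for pseudo-relevance feedback.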

On Sat, Jun 05, 2010 at 12:53:52PM +0200, jz@uva wrote:
> P.S. About http://dbappl.cs.utwente.nl/pftijah/Documentation/ForDevelopers:
> modifying the pftijah.mx file and then running 'make install' does not
> change the pftijah.mil file.
Did you modify a @mil section in pftijah.mx?
Do you build from a hg clone (or CVS checkout) or from a source tarball?
Stefan
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4199       |

On Sun, Jun 6, 2010 at 4:31 PM, Stefan Manegold wrote:
> Did you modify a @mil section in pftijah.mx?
> Do you build from a hg clone (or CVS checkout) or from a source tarball?
It is line 4994, but I am not sure whether it is a @mil section. I did a CVS checkout. I am still using the August 2009 version.
junte
participants (4)
- Djoerd Hiemstra
- Henning.Rode@cwi.nl
- jz@uva
- Stefan Manegold