[MonetDB-users] Xquery Standoff Annotation Success Stories? GATE too?
Hello,
I am looking for information and experiences with MonetDB's standoff annotation extension for XQuery.
In particular, I'm looking for a way to more flexibly and conveniently search through documents I've manually annotated for semantic content using GATE (General Architecture for Text Engineering), which has its own standoff format. Has anyone had success in querying GATE documents using MonetDB/XQuery with the standoff extension?
If not GATE, what are people using to produce the standoff notation?
thank you, -david
David Epstein PhD Candidate Urban & Regional Planning University of Michigan USA
Hello David,
Thanks for your interest in MonetDB/XQuery.
First of all, I am not aware of success stories about querying GATE
annotations, but then again, I am also not aware of any attempts to
query GATE annotations, and I think I probably would have known if
there were any.
The stand-off axes supported in MonetDB/XQuery are fairly generic in
the way that they allow for querying over a separate linear dimension
(whether they represent timestamps, positions in text, or
byte-offsets). The stand-off extensions that MonetDB/XQuery provides
merely provide convenient access to query for overlap and inclusion
for ranges in this dimension (/select-wide::* and /select-narrow::*).
Both operators can be easily written down in plain XQuery. The
inclusion case would look something like:
  for $other in $candidates
  for $item in $source
  where $item/@start > $other/@start and $item/@end < $other/@end
  return $item
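The two range tests described above (inclusion and overlap) can be sketched outside the database as well. The following Python is only an illustration of the comparisons, not MonetDB's implementation: the names `Region`, `select_narrow` and `select_wide` are hypothetical, the inclusion test copies the strict inequalities from the query above, and the overlap test assumes the end position is exclusive.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A stand-off range on a linear dimension (hypothetical helper type)."""
    name: str
    start: int
    end: int


def select_narrow(context, candidates):
    """Inclusion: candidates lying strictly inside the context range."""
    return [c for c in candidates
            if c.start > context.start and c.end < context.end]


def select_wide(context, candidates):
    """Overlap: candidates that intersect the context range (end exclusive)."""
    return [c for c in candidates
            if c.start < context.end and c.end > context.start]


sentence = Region("sentence", 10, 50)
annotations = [
    Region("inside", 15, 20),
    Region("straddles-start", 5, 12),
    Region("outside", 60, 70),
]

narrow_names = [r.name for r in select_narrow(sentence, annotations)]
wide_names = [r.name for r in select_wide(sentence, annotations)]
print(narrow_names)
print(wide_names)
```

Whether the boundaries should be treated as inclusive or exclusive depends on how the annotations were produced, so the comparison operators may need adjusting.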
The benefit of the stand-off extensions in MonetDB/XQuery is two-fold:
convenient notation, and optimized processing (special indices are
created such that querying gigabytes of stand-off XML can be done in
interactive time). The current implementation only recognizes the
notation in which a stand-off range of an XML-element is expressed
with two XML-attributes with numeric values indicating start and end
of the range. So in your case you probably would need to convert the
GATE annotations to this format. By the way: the names of the
XML-attributes can be configured in MonetDB.conf.
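As a rough idea of what such a conversion might involve, here is a minimal Python sketch. It assumes a simplified, GATE-like input in which each `Annotation` element carries `Type`, `StartNode` and `EndNode` attributes that are already numeric offsets; real GATE exports may differ and should be checked against the GATE documentation. The function name `to_standoff` and the output element names are hypothetical, and the attribute names `start` and `end` follow the query shown earlier.

```python
import xml.etree.ElementTree as ET

# Simplified, GATE-like input (an assumption, not a complete GATE document).
GATE_LIKE = """<GateDocument>
  <AnnotationSet>
    <Annotation Id="1" Type="Person" StartNode="0" EndNode="5"/>
    <Annotation Id="2" Type="Location" StartNode="15" EndNode="24"/>
  </AnnotationSet>
</GateDocument>"""


def to_standoff(gate_xml, start_attr="start", end_attr="end"):
    """Turn each annotation into an element with numeric start/end attributes."""
    root = ET.fromstring(gate_xml)
    out = ET.Element("annotations")
    for ann in root.iter("Annotation"):
        # Use the annotation type as the element name.
        el = ET.SubElement(out, ann.get("Type"))
        el.set("id", ann.get("Id"))
        el.set(start_attr, ann.get("StartNode"))
        el.set(end_attr, ann.get("EndNode"))
    return out


standoff = to_standoff(GATE_LIKE)
print(ET.tostring(standoff, encoding="unicode"))
```

The `start_attr` and `end_attr` parameters are there because the attribute names are configurable on the MonetDB side; the converter only needs to emit whatever names the server is configured to recognize.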
I hope this answers your question, and I hope you can determine
whether MonetDB/XQuery is suited for your needs. Please let me know if
you have more questions. I would also be keen to hear about and/or
help out with querying GATE using the stand-off extensions.
Greetings,
Wouter
2010/2/23 David Epstein
------------------------------------------------------------------------------ Download Intel® Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Hi Wouter,
Thank you for your detailed reply. I'm afraid I have a few more questions for you:
1. If users are not relying on GATE to produce the standoff notation, what tools are they using?
2. If I send you a simple GATE format XML, could you give me an estimate of how difficult it would be to get working with MonetDB's standoff extension?
3. Given the implementation of standoff that you describe, can a user access the span of text demarcated by the start and end character positions? For example, could I check if a certain regular expression pattern appeared within the text span? Much of my analysis will involve comparing both the annotations and the text itself.
4. Finally, are there any front-ends available for MonetDB that would enable me to edit/update XML content in a way that hides levels of detail that I don't need to see at times? Or, that would allow me to save, reload, and execute queries quickly?
thank you again, -david
1. If users are not relying on GATE to produce the standoff notation, what tools are they using?
This completely depends on the application area. As stand-off annotation is mostly used in research environments, in-house developed systems are quite common.
2. If I send you a simple GATE format XML, could you give me an estimate of how difficult it would be to get working with MonetDB's standoff extension?
Sure. But note that MonetDB/XQuery is a database system (it can basically store and query any XML document).
3. Given the implementation of standoff that you describe, can a user access the span of text demarcated by the start and end character positions? For example, could I check if a certain regular expression pattern appeared within the text span? Much of my analysis will involve comparing both the annotations and the text itself.
There is currently no user-visible function for this in MonetDB/XQuery. The codebase does contain an unused function 'so-blob()', which fetches specific regions from the data document. Note that you should be careful here, as it uses byte-position offsets, which are not necessarily the same as character-position offsets.
4. Finally, are there any front-ends available for MonetDB that would enable me to edit/update XML content in a way that hides levels of detail that I don't need to see at times? Or, that would allow me to save, reload, and execute queries quickly?
MonetDB/XQuery does not contain front-ends other than developer/system-administrator front-ends. MonetDB/XQuery can be easily queried in XQuery using JDBC, XRPC, and via the command-line (mclient). I am not aware of any third-party solutions that have a specialized integration with MonetDB/XQuery.
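On the byte-versus-character caveat in the answer to question 3: the two kinds of offset diverge as soon as the text contains characters that take more than one byte in its encoding. The Python sketch below illustrates this outside the database, assuming UTF-8 text and character-based annotation offsets; the helper `char_to_byte_offset` and the sample text are made up for the illustration.

```python
import re


def char_to_byte_offset(text, char_offset, encoding="utf-8"):
    """Byte position that corresponds to a character position in text."""
    return len(text[:char_offset].encode(encoding))


# "\u00fc" is the single character u-umlaut, which takes two bytes in UTF-8.
text = "Z\u00fcrich is in Switzerland"

# A character-based annotation covering the first word.
start, end = 0, 6
span = text[start:end]

byte_start = char_to_byte_offset(text, start)
byte_end = char_to_byte_offset(text, end)

# Regular-expression check restricted to the annotated span.
has_capitalised_word = re.search(r"[A-Z]\w+", span) is not None

print(span, byte_start, byte_end, has_capitalised_word)
```

Here the annotated span is six characters long but occupies seven bytes, so a byte-oriented fetch would need the converted offsets.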
Hi Wouter,
I'm a beginner in XQuery, so forgive me if this question seems naive. Using fn:substring(string,start,len), it is possible to write, in pure XQuery, a function that could isolate a substring of a node based on numerical start and end points in another node, right? And from that same information it would be possible to define XQuery functions to identify a span as being before, within, or after another, right?
If this is more-or-less correct, is the advantage of MonetDB over another XML database primarily the speed boost and not having to write these functions? Or are there other advantages as well?
-david
If the 'data'-document is also an XML document, it is perfectly
possible to write a stand-off solution in pure XQuery. You are
correct, as I stated earlier, in that the main advantage of
MonetDB/XQuery (besides the ease of notation) is query performance,
and specifically for large documents (or large collections). For small
documents (say: up to a few megabytes of XML) most other XQuery
processors will be able to handle the XQueries that are typically
needed for stand-off querying.
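The logic David describes (isolating a span from start and end points, and classifying one span as before, within, or after another) is small enough to sketch. The Python below is an illustration only and assumes 0-based offsets with an exclusive end; the function names are hypothetical. When porting to XQuery, keep in mind that fn:substring counts positions from 1 and takes a length rather than an end position.

```python
def span_text(text, start, end):
    """Return the text covered by a span (0-based start, exclusive end)."""
    return text[start:end]


def relation(a, b):
    """Classify span a relative to span b; both are (start, end) pairs."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end <= b_start:
        return "before"
    if a_start >= b_end:
        return "after"
    if b_start <= a_start and a_end <= b_end:
        return "within"
    # Anything else intersects b without being contained in it.
    return "overlaps"


document = "stand-off annotation keeps markup apart from the text"
print(span_text(document, 10, 20))
print(relation((12, 15), (10, 20)))
```

The fourth outcome, "overlaps", is an addition to the three cases in the question; it covers spans that cross a boundary of the other span.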
2010/2/24 David Epstein
participants (2)
- David Epstein
- Wouter Alink