Re: [MonetDB-users] PF: "Group By" work-around no longer working?

On Jan 14, 2009, at 11:53, jz@uva wrote:

> Hi Jan,
> I was also using the GROUP BY construct. This is my query:
> ...
> let $qid := tijah:queryall-id($query_nexi, $options)
> let $nodes := tijah:nodes($qid)
> let $result :=
>   for $node in $nodes
>   return typeswitch ($node)
>     case $elem-node as element(*) return
>       <result>
>         <num> { (count($node/$elem-node/preceding::*) + 1) +

This snippet of XQuery is too complex: 'count($node/$elem-node/...'. Both $node and $elem-node contain the tijah result nodes (with type node() and element(*), respectively). What you want to write is 'count($elem-node/...'.

>           string-length(tijah:getINEXPath($node)) } </num>
>         <file> { data($node/$elem-node/ancestor-or-self::EAD/@FILE) } </file>
>         <title> { data($node/$elem-node/ancestor-or-self::EAD/@TITLE) } </title>
>         ...
>         <score> { tijah:score($qid, $node/$elem-node) } </score>
>       </result>
>     default return error ("unexpected node")
> let $total := count($result)
> return <results total="{$total}"> {
>   for $res at $rankGlobal in distinct-values($result/file)
>   let $cs-group := $result[file = $res]
>   for $cs-group2 at $rank in $cs-group
>   where exists($cs-group2)

This line is unnecessary, as there is always exactly one item bound to $cs-group2.

>     and $rank >= 1

This line is unnecessary, as the positional variable always starts from 1 (and can never be less).

>     and $rank <= 8 and $rankGlobal >= $start and $rankGlobal <= $end
>   order by $cs-group2 cast as xs:integer

What do you want here? I very much doubt that you can atomize an XML snippet, cast its string value to a number, and still get something meaningful out.

>   return <out id="{$res}">{ $cs-group2 }</out>
> } </results>
>
> With this query, I get different results in MPS and Alg.

I guess MPS treats your order by differently (read: incorrectly) by ignoring all character data after the first number, which accidentally is the content of element num. Perhaps your query already works the way you expect if you replace the ORDER BY with the following snippet: 'order by number($cs-group2/num)'.

Jan

--
Jan Rittinger
Lehrstuhl Datenbanken und Informationssysteme
Wilhelm-Schickard-Institut für Informatik
Eberhard-Karls-Universität Tübingen
http://www-db.informatik.uni-tuebingen.de/team/rittinger
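As a small illustration of the point about atomization, the sketch below uses a hypothetical <result> element (its values are made up, not taken from the thread). Atomizing the whole element yields the concatenated text of all its children, which is why casting it to xs:integer is not meaningful, whereas number() applied to the num child gives the intended sort key.

```xquery
(: Hypothetical result element; the values are illustrative only. :)
let $r := <result><num>12</num><file>a.xml</file><score>0.5</score></result>
return (
  string($r),                 (: "12a.xml0.5": text of all children, concatenated :)
  $r castable as xs:integer,  (: false: that string is not an integer :)
  number($r/num)              (: 12: only the num child is converted :)
)
```

A conforming processor would be expected to raise a cast error for '$r cast as xs:integer' on such an element rather than silently use the leading digits.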

Thanks Jan for the clarification.
When I adopt the changes that you recommend, the actual behavior (run)
does not change.
Basically, there are 2 things: grouping results and presenting results
in their original hierarchical organization. Both behave differently in
MPS and Algebra.
- The grouping of results with distinct-values is different in MPS and
Alg. When I leave out the ORDER BY clause, the results that are
returned have a different ranking. The MPS backend is working as I
would expect, because the results are returned by relevance and
grouped. The Algebra backend discards the original ranking and
presents the results by file name.
- When I use the new ORDER BY clause ( ORDER BY number($cs-group2/num)
), the MPS backend preserves the grouping and orders each group by
num, whereas Algebra gets rid of the grouping and orders all results
by num.
I was quite happy with MPS, because the stuff that I wanted worked with that.
I need to get this working, because we are (supposed to be) doing user
studies with operational systems.
Do you have more tips? Is this a 'bug' with Pftijah?
junte
On Wed, Jan 14, 2009 at 12:35 PM, Jan Rittinger wrote:
> [...]
------------------------------------------------------------------------------ This SF.net email is sponsored by: SourcForge Community SourceForge wants to tell your story. http://p.sf.net/sfu/sf-spreadtheword _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users

hej junte,

i was also using the "for ... at" syntax incorrectly before, simply because we were used to the "incorrect" MPS behavior, which often comes in quite handy.

if you want to keep the ranked order of the pftijah output but limit the results to the k best-ranked items, you need to use the subsequence() function. it requires a few more lines of code than the "incorrect" MPS version, but it is still possible.

i am not sure i completely understand the intention of your query, but you should probably first group and aggregate the scores per file, then use a subsequence() expression to select the best ones. then, in a second loop, you select the results per file, order them by score, and again output the best ones.

does this help?

best
-henning

On 14.01.2009, at 20:35, jz@uva wrote:
> [...]
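The two-step approach henning describes could be sketched roughly as follows. This is only an illustration under stated assumptions: the literal $result sequence is a hypothetical stand-in for the elements built from tijah:nodes(), the scores are invented, taking the maximum score as the per-file aggregate is one possible choice, and the cut-offs 2 and 8 are arbitrary.

```xquery
(: Hypothetical stand-in for the $result sequence of the original query. :)
let $result := (
  <result><file>b.xml</file><score>0.9</score></result>,
  <result><file>a.xml</file><score>0.7</score></result>,
  <result><file>b.xml</file><score>0.4</score></result>,
  <result><file>c.xml</file><score>0.1</score></result>
)
(: Step 1: one entry per file, explicitly ordered by the best score in that file. :)
let $files :=
  for $f in distinct-values($result/file)
  let $best := max($result[file = $f]/score)
  order by $best descending
  return <group file="{$f}" best="{$best}"/>
(: Step 2: keep the 2 best files and, inside each, the 8 best hits by score. :)
return
  for $g in subsequence($files, 1, 2)
  let $hits :=
    for $r in $result[file = $g/@file]
    order by number($r/score) descending
    return $r
  return <out id="{$g/@file}">{ subsequence($hits, 1, 8) }</out>
```

With this sample data the output should contain a group for b.xml (scores 0.9, then 0.4) followed by a group for a.xml; c.xml is cut off by the first subsequence(). Both orderings are stated explicitly, so the result does not rely on the order in which distinct-values returns its items.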

On Jan 14, 2009, at 20:35, jz@uva wrote:
> Thanks Jan for the clarification. When I adopt the changes that you recommend, the actual behavior (run) does not change.
> Basically, there are 2 things: grouping results and presenting results in its original hierarchical organization, and both differ in MPS and Algebra.
> - The grouping of results with distinct-values is different in MPS and Alg. When I leave out the ORDER BY clause, the results that are returned have a different ranking. The MPS backend is working as I would expect, because the results are returned by relevance and grouped. The Algebra backend discards the original ranking and presents the results by file name.

From what I see here, only the order differs, which is unrelated to the 'grouping' issue.

> - When I use the new ORDER BY clause ( ORDER BY number($cs-group2/num) ), the MPS backend preserves the grouping and orders each group by num, whereas Algebra gets rid of the grouping and orders all results by num.

Until now you were lucky with the buggy ordering in MPS, which kept your relevance order by chance! To solve your problem there is of course a very simple solution: make the relevance explicit. In your example that is:

    let $result :=
      for $node at $relevance in $nodes
      return typeswitch ($node)
        case $elem-node as element(*) return
          <result relevance="{$relevance}"> ... </result>
        default return error ("unexpected node")
    let $total := count($result)
    return <results total="{$total}"> {
      for $res at $rankGlobal in distinct-values($result/file)
      let $cs-group := $result[file = $res]
      for $cs-group2 at $rank in $cs-group
      where ...
      order by number($cs-group2/@relevance), number ($cs-group2/num)
      return <out id="{$res}">{ $cs-group2 }</out>
    } </results>

> I was quite happy with MPS, because the stuff that I wanted worked with that. I need to get this working, because we are (supposed to be) doing user studies with operational systems.

As I said before, MPS is not correct with respect to order. distinct-values is defined in the specification as unordered (the order of its result is implementation-dependent). Thus the conclusion is that you need to specify what you want :-)

> Do you have more tips? Is this a 'bug' with Pftijah?
> junte
-- Jan Rittinger Lehrstuhl Datenbanken und Informationssysteme Wilhelm-Schickard-Institut für Informatik Eberhard-Karls-Universität Tübingen http://www-db.informatik.uni-tuebingen.de/team/rittinger
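If the goal is to keep the hits of one file together while still ranking files and hits by relevance, the explicit relevance from the positional variable can serve for both sort keys. The following is a minimal sketch under assumptions, not the query from the thread: the <hit> elements are a hypothetical stand-in for tijah:nodes($qid) (taken to be in relevance order), and the group-rank key (relevance of the best hit in each file) is an addition for illustration.

```xquery
(: Hypothetical stand-in for tijah:nodes($qid), assumed to be in relevance order. :)
let $nodes := (<hit file="b.xml"/>, <hit file="a.xml"/>, <hit file="b.xml"/>)
let $result :=
  for $node at $relevance in $nodes
  return
    <result relevance="{$relevance}">
      <file>{ string($node/@file) }</file>
    </result>
return
  for $res in distinct-values($result/file)
  let $group := $result[file = $res]
  (: Rank of the best hit in this file; lower means more relevant. :)
  let $group-rank := min(for $x in $group/@relevance return number($x))
  for $r in $group
  order by $group-rank, number($r/@relevance)
  return <out id="{$res}" relevance="{$r/@relevance}"/>
```

For the sample input this should yield b.xml (relevance 1), b.xml (relevance 3), then a.xml (relevance 2): files are ranked by their best hit and hits stay grouped per file, whatever order distinct-values happens to produce.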
participants (3)
- Henning Rode
- Jan Rittinger
- jz@uva