Re: [Monetdb-developers] Question about milprint_summer
Thanks for your info! Much clearer now. I added the comments to the mentioned Wiki page for future reference.
Maurice.

Peter Boncz wrote:
Hi Maurice,
For my research on probabilistic XML, I need some additional XQuery functions implemented in MIL. By copying code from existing functions in milprint_summer, I've been able to get some things working, but I still do not feel that I understand the meaning and purpose of all the bats. For example, what's this ipik bat? There seem to be two interfaces (NORMAL and VALUES) that pass data from one subexpression to the next. My question is (I guess): which bats are involved in each of the interfaces, and which conditions hold for these bats that I need to adhere to? Is there some documentation around about this? I checked the pathfinder Wiki, but it gives, for example, no result for a search for "ipik".
Jan wrote the following indications for writing functions in mps:
http://www.pathfinder-xquery.org/wiki/index.php/Function_implementation
The 'ipik' thing is not described there, because it is newer. It stands for iter-pos-item-kind, the four bats crucial to the relational sequence representation, each of type BAT[void,T].
At some point, Stefan and I introduced support for constants. Since then, each variable (iter-pos-item-kind) can possibly be a MIL constant. However, one of them *must* still be a BAT. But which? The purpose of 'ipik' is to identify that BAT variable. That is, it is assigned to either iter, pos, item or kind.
Then, the hairy part is whether your MIL code is constant-resistant. Obviously, if you can operate on the constants directly, the query tends to be faster. We provide support for constants in many pf_support.mx MIL procs (e.g. get_container()), as well as in MIL maps (e.g. [+](1,2) just works, i.e. multiplex allows constant-only parameters). A part of the MIL bat algebra (join, min, max, texist, etc.) is supported on constants as well, see MonetDB4/src/modules/plain/constant.mx
If you do not want to bother optimizing your code for constants, you can always use the 'materialize' command to inflate any constant back into a bat:
# enforce all variables to be bats (no effort if already so)
iter := iter.materialize(ipik);
pos := pos.materialize(ipik);
item := item.materialize(ipik);
kind := kind.materialize(ipik);
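To make the constant-versus-BAT idea concrete, here is a small Python model (not MIL, and not the actual MonetDB implementation). It assumes that a BAT[void,T] can be pictured as a plain list, that a MIL constant can be pictured as a scalar, and that materialize uses ipik only to learn how long the inflated column must be. The helper name multiplex_add is hypothetical; it stands in for a constant-resistant map such as [+].

```python
def materialize(col, ipik):
    """Inflate a constant into a list aligned with ipik; lists pass through."""
    if isinstance(col, list):
        return col                      # already a "BAT": no effort
    return [col] * len(ipik)            # constant: repeat once per row of ipik


def multiplex_add(a, b):
    """Hypothetical constant-resistant element-wise addition."""
    if not isinstance(a, list) and not isinstance(b, list):
        return a + b                    # constants only: the result stays a constant
    shape = a if isinstance(a, list) else b
    left = materialize(a, shape)
    right = materialize(b, shape)
    return [x + y for x, y in zip(left, right)]


# A three-item sequence whose 'kind' is a constant; 'iter' is the BAT, so ipik is iter.
iter_ = [1, 1, 2]
pos = [1, 2, 1]
item = [10, 11, 12]
kind = "INT"
ipik = iter_

kind = materialize(kind, ipik)          # now a list with one entry per row
```

Code that always materializes first is simpler to write, at the cost of giving up the speed advantage of working on the constants directly.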
As for the two "rc" modes in mps (VALUES, NORMAL), the following:
- NORMAL should always be used for heterogeneously typed sequences. The 'item' is a BAT[void,oid], where the tail OID refers to a value container (e.g. str_values). Booleans and nodes do not have a container, BTW. Thus, we have str_values, int_values and dbl_values (= dec_values).
- The VALUES representation uses the 'item_str', 'item_int' or 'item_dbl' variable instead. It can be used for homogeneous sequences (typically, 'kind' is then a constant), and these variables directly contain values of the desired type, e.g. BAT[void,lng] for integers (yes: XQ integers are longs!).
- translate2MIL() can be called with the request VALUES. It then *may* produce a result in VALUES representation, returning the type in that case. E.g. for integer-typed results in VALUES representation, it would return INT. But even if you request VALUES, translate2MIL may still return NORMAL. If translate2MIL is called with rc==NORMAL, it must produce NORMAL.
- If you have the VALUES representation and want to go to NORMAL, use the MIL proc addValues(), generated in mps by the convenience function addValues.
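The difference between the two representations, and the fact that a caller who requested VALUES must still cope with NORMAL, can be pictured with another Python model. This is only a sketch under assumptions: lists stand in for BATs, list positions stand in for OIDs, and add_values and ensure_normal are hypothetical names that merely mimic the role of addValues(). Whether the real proc reuses existing container entries is not stated in this thread; the sketch reuses them purely for illustration.

```python
NORMAL = "NORMAL"


def add_values(container, values):
    """VALUES -> NORMAL: store each value in the container, return its positions."""
    item = []
    for v in values:
        if v not in container:
            container.append(v)         # new value: extend the container
        item.append(container.index(v)) # position plays the role of the OID
    return item


def ensure_normal(returned, item, containers):
    """Handle both possible outcomes of a translation that requested VALUES.

    'returned' is NORMAL, or a type tag such as "INT" when the result
    came back in VALUES representation.
    """
    if returned == NORMAL:
        return item                     # item already holds container positions
    return add_values(containers[returned], item)


int_values = [7]                        # container with one existing entry
item_int = [42, 7, 42]                  # homogeneous sequence in VALUES form
item = ensure_normal("INT", item_int, {"INT": int_values})
```

After the call, item holds positions into int_values, so looking each position up in the container gives back the original values.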
Good luck,
Peter
--
----------------------------------------------------------------------
Dr.Ir. M. van Keulen - Assistant Professor, Data Management Technology
Univ. of Twente, Dept of EEMCS, POBox 217, 7500 AE Enschede, Netherlands
Email: m.vankeulen@utwente.nl, Phone: +31 534893688, Fax: +31 534892927
Room: ZI 3039, WWW: http://www.cs.utwente.nl/~keulen
participants (2)
- Keulen, M. van (Maurice)
- Peter Boncz