If this wsid is a concept that is strictly necessary for the update support (and nobody except me thinks that this bit shifting might not be the nicest solution), we could also think about getting rid of the ws and replacing it with the wsid. I just do not like the fact that there are two variables around that have side effects on each other. So a fourth proposal (d) would be to replace the occurrences of ws by wsid. This would mean that *all* document accesses would have to include another indirection (even all path steps).

On 10/05/2006 11:03 AM, Stefan Manegold wrote with possible deletions:
Dear Peter, dear fellow PF & MXQ developers,
I'm not sure whether I understand the new "ws-API" correctly, and would be grateful if one of you could enlighten me.
First, here is how far I have got so far (please correct me if/where I'm wrong!), followed by concrete open questions.
For transaction support, Peter introduced a new wsid that uniquely identifies (a reference to?) a ws, which is required for conflict detection (as far as I understand).
A wsid is to be generated from/for a ws by calling wsid := ws_id(ws); however, this generates a new unique id each time it is called (two calls on the same ws yield two different ids, right?). Hence, wsid := ws_id(ws) should immediately follow ws := ws_create().
For the inverse, there is ws := ws_bat(wsid), which returns the ws identified by wsid; obviously, this function yields the same result each time it is called with the same wsid.
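The reading of ws_id() and ws_bat() given above can be made concrete with a toy Python model (not MIL, and not the actual runtime code). The registry dict, the counter, and the representation of a ws as a plain dict are assumptions for illustration only; just the names ws_create, ws_id and ws_bat come from the API discussed here.

```python
import itertools

# Toy model only: the real ws is a BAT and the real wsid is a lng.
_next_id = itertools.count(1)   # hypothetical source of unique ids
_registry = {}                  # hypothetical wsid -> ws mapping


def ws_create():
    """Create a fresh, empty working set (modelled as a dict)."""
    return {"docs": {}}


def ws_id(ws):
    """Register ws under a NEW unique id on every call."""
    wsid = next(_next_id)
    _registry[wsid] = ws
    return wsid


def ws_bat(wsid):
    """Return the ws registered under wsid (same result on every call)."""
    return _registry[wsid]


ws = ws_create()
first = ws_id(ws)
second = ws_id(ws)   # second call on the same ws: a different id
```

If this model matches the real behaviour, calling ws_id() anywhere other than right after ws_create() silently creates a second identity for the same ws.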
Since some functionality requires the wsid instead of "only" the ws, Peter has changed the signatures/API of some runtime PROCs. As far as I can see right now, these are mainly (only?):
  ws_doc(ws, ...        ->  ws_opendoc(wsid, ...
  ws_addcoll(ws, ...    ->  ws_opencoll(wsid, ...
  ws_destroy(int(ws))   ->  ws_destroy(wsid)
I completed these changes by simply running the following one-liners:
  find * -type f | xargs grep -l 'var ws := ws_create();' | xargs perl -i -p -e 's|(var ws := ws_create\(\);)|$1 var wsid := ws_id(ws);|'
  find * -type f | xargs grep -l 'ws_doc(ws,' | xargs perl -i -p -e 's|ws_doc\(ws,|ws_opendoc(wsid,|'
  find * -type f | xargs grep -l 'ws_destroy(int(ws))' | xargs perl -i -p -e 's|ws_destroy\(int\(ws\)\)|ws_destroy(wsid)|'
  find * -type f | xargs grep -l 'ws_addcoll(ws,' | xargs perl -i -p -e 's|ws_addcoll\(ws,|ws_opencoll(wsid,|'
  find * -type f | xargs grep -l 'ws_addcoll' | xargs perl -i -p -e 's|ws_addcoll|ws_opencoll|'
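For anyone who wants to check what these substitutions do to a single line before running them over the tree, here is a small Python sketch of the same rewrite rules. It is an illustration, not a replacement for the one-liners: the function name migrate_line and the RULES table are made up for this example, and it works on strings rather than files.

```python
import re

# (pattern, replacement) pairs mirroring the perl one-liners, in the same order.
RULES = [
    (r"(var ws := ws_create\(\);)", r"\1 var wsid := ws_id(ws);"),
    (r"ws_doc\(ws,", "ws_opendoc(wsid,"),
    (r"ws_destroy\(int\(ws\)\)", "ws_destroy(wsid)"),
    (r"ws_addcoll\(ws,", "ws_opencoll(wsid,"),
    (r"ws_addcoll", "ws_opencoll"),
]


def migrate_line(line):
    """Apply each rule at most once per line, like perl's s||| without /g."""
    for pattern, replacement in RULES:
        line = re.sub(pattern, replacement, line, count=1)
    return line
```

Note that the last rule renames every remaining ws_addcoll call without touching its first argument, so such call sites still pass whatever they passed before.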
While this seems to have been enough to fix all .milS tests and (most of) the milprint_summer functionality, there are three open issues left: the Algebra version, XRpc & PFtijah.
To be honest, I haven't looked at PFtijah yet; some help from the respective experts would be appreciated.
In the cases of XRpc & Algebra, there are still places where wsid := ws_id(ws) is IMHO called in the wrong location:
XRpc: In PROC rpc_client() (runtime/xrpc.mx), I changed
  ws_addcoll(ws, ...
to
  ws_opencoll(ws_id(ws), ...
While this seems to work for now with read-only documents (i.e., in the absence of updates and concurrency), it is IMHO incorrect in general. Rather, rpc_client should receive wsid as argument instead of ws; if necessary, ws can then be derived via ws := ws_bat(wsid).
Algebra: In PROC doc_tbl() (runtime/pf_support.mx), I changed
  var r := ws_doc(ws, item);
into
  var wsid := ws_id(ws); var r := ws_opendoc(wsid, item);
Same story as above. However, doc_tbl also returns ws in its result BAT-of-BATs, as far as I understand mainly for "canonical API" reasons. Obviously, it is not possible to replace BAT ws by lng wsid here...
Leaving the latter problem aside for the moment, the following could (have) work(ed) as a general rule, and we should check and change the codebase accordingly:
1) wsid := ws_id(ws) must only be called immediately after ws := ws_create()
2) all functions/PROCs that recursively (i.e., including all transitively called functions/PROCs) require wsid instead of or in addition to ws should be modified so that they receive wsid instead of ws as an argument; ws can then be derived locally via ws := ws_bat(wsid) if/where necessary.
3) all functions/PROCs that recursively (i.e., including all transitively called functions/PROCs) "never" (at least not yet) require wsid, but are fine with ws, can stay unchanged, i.e., getting (only) ws as argument. Obviously, wherever these functions need to maintain a signature/API aligned with those that fall under point 2) above, we should also change these functions as described under 2).
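The three rules above can be sketched as follows, again as a toy Python model rather than MIL. The helpers load_doc and count_docs, the registry dict, and the literal id 42 are hypothetical and exist only to show which kind of function would take which argument.

```python
_registry = {}   # hypothetical wsid -> ws mapping (stand-in for the runtime)


def ws_bat(wsid):
    """Look up the ws that belongs to wsid."""
    return _registry[wsid]


def ws_opendoc(wsid, name):
    """Needs wsid itself (rule 2); derives ws locally via ws_bat."""
    ws = ws_bat(wsid)
    return ws["docs"].setdefault(name, f"<doc {name} in ws {wsid}>")


def count_docs(ws):
    """Never needs wsid (rule 3), so it keeps taking ws."""
    return len(ws["docs"])


def load_doc(wsid, name):
    """Calls ws_opendoc transitively, so it must take wsid too (rule 2)."""
    doc = ws_opendoc(wsid, name)
    return doc, count_docs(ws_bat(wsid))


# Rule 1: the id is obtained exactly once, right after creation.
ws = {"docs": {}}
wsid = 42                 # stand-in for wsid := ws_id(ws)
_registry[wsid] = ws
result = load_doc(wsid, "a.xml")
```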
As indicated above, this does not work for doc_tbl and its companion PROCs in runtime/pf_support.mx that make up the interface between the Algebra version of the compiler and the runtime. I assume the respective canonical API could be changed so that the respective PROCs get wsid instead of (or in addition to?) ws as arguments. However, we cannot easily change the API to return wsid (lng) instead of ws (BAT) in the result BAT-of-BATs. Here, I see three solutions (the last one actually comes from JanR):
a) Is ws/wsid indeed required in the result? If not, we could simply discard it.
b) Wrap wsid temporarily into a ("fake") [void,lng] BAT containing only one (nil,wsid) BUN. (Ugly!)
c) (JanR) Put the wsid inside ws, basically as a ("fake") BAT [void,lng]. Kind of the "encapsulated" solution, and not only for the algebra-runtime interface! Consequently, we should have ws_create() call ws_id(ws) internally and store the result in ws (this could save the seemingly "redundant" ws := ws_create(); wsid := ws_id(ws); sequence), and all (most, except at least ws_destroy()) signatures/API could be kept / re-unified to pass (only) ws (now including wsid) instead of wsid as argument.
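A minimal sketch of what proposal (c) would look like, in toy Python rather than MIL. Storing the id under a "wsid" key stands in for the [void,lng] BAT inside ws; the counter and the dict layout are assumptions, not the actual runtime structures.

```python
import itertools

_next_id = itertools.count(1)   # hypothetical unique-id source


def ws_create():
    """Create a ws that already carries its own id (proposal c)."""
    ws = {"docs": {}}
    ws["wsid"] = next(_next_id)   # models the [void,lng] BAT stored inside ws
    return ws


def ws_opendoc(ws, name):
    """Takes only ws; the id is read from inside it when needed."""
    wsid = ws["wsid"]
    ws["docs"][name] = f"doc {name} opened under wsid {wsid}"
    return ws["docs"][name]


ws = ws_create()
msg = ws_opendoc(ws, "a.xml")
```

With this shape there is only one value to pass around during the query, so the two variables cannot drift apart; the open point is what happens to the id once ws itself is gone.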
There seems to be one problem, though, if I understand Peter's comment correctly:
  " - ws-IDs are now lng-s (combination of *unique* ID and bat-ID)
      such IDs are meaningful also after the query is done (and ws-bat deallocated).
                              ^1^1^1^1^1^1^1^1^1^1^1^1^1^1  ^2^2^2^2^2^2^2^2^2^2^2
      this is needed for trans mgmt.
      All meta-bats with a ws-id field now hold lng instead of int. "
In other words, (^2) wsid needs to be available also if ws is gone. I'm not sure, though, whether it has to be available in the global wsid variable (i.e., inside/during the query), or (only) in some meta-bats outside (and hence after) the query (-context) (as suggested by ^1).
Peter,
could you please enlighten me/us, here?
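On the "combination of *unique* ID and bat-ID" in Peter's comment: one way such a lng could be built with bit shifting is sketched below in Python. The 32/32 bit split and the choice of putting the unique ID in the high half are pure assumptions for illustration; the actual layout used by the runtime is not stated in this thread.

```python
BAT_BITS = 32                      # assumed width of the bat-ID part
BAT_MASK = (1 << BAT_BITS) - 1     # mask selecting the low BAT_BITS bits


def pack_wsid(unique_id, bat_id):
    """Combine a unique id (high bits) and a bat-ID (low bits) into one value."""
    return (unique_id << BAT_BITS) | (bat_id & BAT_MASK)


def unpack_wsid(wsid):
    """Split a packed value back into (unique_id, bat_id)."""
    return wsid >> BAT_BITS, wsid & BAT_MASK
```

Under this layout the packed value stays a valid, unique number after the query is done, while its bat-ID half would only be usable as long as the ws BAT still exists.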
In case "only" (^1) is required, the "encapsulated" solution seems indeed to be suitable for all code. Otherwise, it would at least be a local solution for the algebra-runtime interface (in case (a) & (b) are no option), but bears the burden of maintaining consistency between the global wsid variable and the wsid stored inside the ws.
I hope this email and any reactions to it help to clarify the current situation, rather than blur it even more ...
Stefan
--
Jan Rittinger
Database Systems
Technische Universität München (Germany)
http://www-db.in.tum.de/~rittinge/