Re: [Monetdb-developers] Monetdb-developers Digest, Vol 5, Issue 6
Hi all,
In PROC rpc_client() (runtime/xrpc.mx), I had changed ws_addcoll(ws, ... to ws_opencoll(ws_id(ws), ...
it should be ws_opencoll(wsid, ...)
You have that available, because in the query context we always have both
variables 'ws' and 'wsid'.
ws_id(ws) should be called once, it asks for a unique oid. The combination
of (id,int(ws)) is squeezed into a lng, and that is wsid.
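A minimal sketch of how two values can be squeezed into one 64-bit number, written in Python rather than MIL. The bit layout (unique id in the upper 32 bits, BAT id in the lower 32 bits) and the helper names pack_wsid/unpack_wsid are assumptions for illustration only; the thread states just that the pair (id, int(ws)) ends up in a single lng.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF  # assumed width of each half; not confirmed by the thread


def pack_wsid(unique_id: int, bat_id: int) -> int:
    """Combine a unique id and a BAT id into one 64-bit value (hypothetical layout)."""
    if not (0 <= unique_id <= MASK32 and 0 <= bat_id <= MASK32):
        raise ValueError("both parts must fit in 32 bits")
    return (unique_id << 32) | bat_id


def unpack_wsid(wsid: int) -> Tuple[int, int]:
    """Split a packed value back into (unique_id, bat_id)."""
    return wsid >> 32, wsid & MASK32
```

With such a layout the BAT id can be recovered from the wsid alone, while the unique half keeps two working sets apart even if they happen to reuse the same BAT id.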
As for doc_tbl(), what would work is to change its parameter to wsid; one
can always get the ws with ws := ws_bat(wsid), this is done now in numerous
places in pf_support.mx.
Your conclusions (1), (2) and (3) are all correct.
But, Jan's suggestion to store the wsid inside the ws can also work.
However, I deem it very awkward to introduce a new BAT in the ws just to
retain a number. Other suggestions are to rename the ws-bat to some unique
key, and then we can use that name as transaction key (=wsid). We would
change the wsid type then from lng to str. Another idea is to set the
head-seqbase of the ws-bat to an oid. I think the ws is accessed with
fetch() only, which disregards any seqbase.
Note that internally in pathfinder.mx there are a number of functions that
just pass around an artificial wsid, and in fact a ws does not exist at all
(this is the MIL document management interface e.g. shred_doc), so a number
of (internal) functions in pathfinder.mx will have to stay using a wsid in
their signature. They do not need a ws-bat.
But the exported functions ws_opencoll and ws_opendoc could indeed just use
a ws as they did before.
I will not have time for this until next week, so if anyone feels
adventurous, they may try.
Peter
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4312 |
------------------------------
Message: 2
Date: Fri, 6 Oct 2006 12:43:21 +0200
From: Stefan Manegold
Dear Peter, dear fellow PF & MXQ developers, [...] In cases of XRpc & Algebra, there are still places where wsid := ws_id(ws) is IMHO called in the wrong location:
XRpc: In PROC rpc_client() (runtime/xrpc.mx), I had changed ws_addcoll(ws, ... to ws_opencoll(ws_id(ws), ... While this seems to work for now with read-only documents (i.e., in the absence of updates and concurrency), it is IMHO incorrect in general. Rather, rpc_client should receive wsid as argument instead of ws; if necessary, ws can then be derived via ws := ws_bat(wsid).
I just changed the signatures of doLoopLiftedRPC() & rpc_client()
to receive lng wsid instead of BAT ws,
BAT ws is then locally derived via ws := ws_bat(wsid);
All XRpc tests seem to work fine.
Stefan
[...]
------------------------------
Message: 3
Date: Fri, 06 Oct 2006 14:39:19 +0200
From: Jan Rittinger
Dear Peter, dear fellow PF & MXQ developers,
I'm not sure whether I understand the new "ws-API" correctly, and would be grateful if one of you could enlighten me.
First, here's how far I have got (please correct me if/where I'm wrong!), followed by concrete open questions.
For transaction support, Peter introduced a new wsid that uniquely identifies (a reference to?) a ws, which is required for conflict detection (as far as I understand).
A wsid is to be generated from/for a ws by calling wsid := ws_id(ws); however, this generates a new unique id each time it is called (two calls on the same ws yield two different ids, right?), hence, wsid := ws_id(ws) should immediately follow ws := ws_create() .
For the inverse, there is ws := ws_bat(wsid) that returns the ws identified by wsid; obviously, this function yields the same result each time it's called with the same wsid.
Since some functionality requires the wsid instead of "only" the ws, Peter has changed the signatures/API of some runtime PROCs. As far as I see right now, these are mainly (only?)
ws_doc(ws, ... -> ws_opendoc(wsid, ...
ws_addcoll(ws, ... -> ws_opencoll(wsid, ...
ws_destroy(int(ws)) -> ws_destroy(wsid)
I finished these changes by simply running the following one-liners:
find * -type f | xargs grep -l 'var ws := ws_create();' | xargs perl -i -p -e 's|(var ws := ws_create\(\);)|$1 var wsid := ws_id(ws);|'
find * -type f | xargs grep -l 'ws_doc(ws,' | xargs perl -i -p -e 's|ws_doc\(ws,|ws_opendoc(wsid,|'
find * -type f | xargs grep -l 'ws_destroy(int(ws))' | xargs perl -i -p -e 's|ws_destroy\(int\(ws\)\)|ws_destroy(wsid)|'
find * -type f | xargs grep -l 'ws_addcoll(ws,' | xargs perl -i -p -e 's|ws_addcoll\(ws,|ws_opencoll(wsid,|'
find * -type f | xargs grep -l 'ws_addcoll' | xargs perl -i -p -e 's|ws_addcoll|ws_opencoll|'
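To see what these substitutions do to a single line without touching any files, the same five patterns can be replayed in Python. This is only a dry-run sketch: the function name migrate_line and the sample MIL lines in the usage note are made up for illustration, and count=1 mirrors the fact that the Perl s||| commands carry no /g flag.

```python
import re

# The five substitutions from the one-liners, in the same order.
RULES = [
    (r"(var ws := ws_create\(\);)", r"\1 var wsid := ws_id(ws);"),
    (r"ws_doc\(ws,", "ws_opendoc(wsid,"),
    (r"ws_destroy\(int\(ws\)\)", "ws_destroy(wsid)"),
    (r"ws_addcoll\(ws,", "ws_opencoll(wsid,"),
    (r"ws_addcoll", "ws_opencoll"),
]


def migrate_line(line: str) -> str:
    """Apply each rule once to the line, like perl -p -e 's|...|...|' would."""
    for pattern, replacement in RULES:
        line = re.sub(pattern, replacement, line, count=1)
    return line
```

For example, migrate_line("var r := ws_doc(ws, item);") gives "var r := ws_opendoc(wsid, item);", and a line that only mentions ws_addcoll without the "(ws," argument is still renamed by the last rule.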
While this seemed to have been enough to fix all .milS tests and (most of) the milprint_summer functionality, there are two open issues left: the Algebra version, XRpc & PFtijah.
To be honest, I haven't looked at PFtijah, yet; some help by the respective experts would be appreciated.
In cases of XRpc & Algebra, there are still places where wsid := ws_id(ws) is IMHO called in the wrong location:
XRpc: In PROC rpc_client() (runtime/xrpc.mx), I had changed ws_addcoll(ws, ... to ws_opencoll(ws_id(ws), ... While this seems to work for now with read-only documents (i.e., in the absence of updates and concurrency), it is IMHO incorrect in general. Rather, rpc_client should receive wsid as argument instead of ws; if necessary, ws can then be derived via ws := ws_bat(wsid).
Algebra: in PROC doc_tbl() (runtime/pf_support.mx), I had changed var r := ws_doc(ws, item); into var wsid := ws_id(ws); var r := ws_opendoc(wsid, item); Same story as above. However, doc_tbl also returns ws in its result BAT-of-BATs, as far as I understand mainly for "canonical API" reasons. Obviously, it is not possible to replace BAT ws by lng wsid, here...
Leaving the latter problem aside for the moment, the following could (have) work(ed) as a general rule, and we should check and change the codebase accordingly:
1) wsid := ws_id(ws) must only be called immediately after ws := ws_create()
2) all functions/PROCs that recursively (i.e., including all transitively called functions/PROCs) require wsid instead of or in addition to ws should be modified in that they receive wsid instead of ws as an argument; ws can then locally be derived via ws := ws_bat(wsid) if/where necessary.
3) all functions/PROCs that recursively (i.e., including all transitively called functions/PROCs) "never" (at least not yet) require wsid, but are fine with ws can stay unchanged, i.e., getting (only) ws as argument. Obviously, wherever these functions need to maintain a signature/API aligned with those that fall under point 2) above, we should also change these functions as described above with 2).
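Rules 1) and 2) can be pictured with a small Python model. It is a sketch under stated assumptions, not the MIL runtime: the dictionary _live stands in for whatever lookup ws_bat() really performs, a plain counter stands in for the unique id, and run_query is a hypothetical caller. Only the names ws_create, ws_id, ws_bat and ws_opendoc come from the thread.

```python
import itertools
from typing import Dict

_next_unique = itertools.count(1)  # stand-in for the unique id source
_live: Dict[int, dict] = {}        # wsid -> ws; models the ws_bat() lookup


def ws_create() -> dict:
    """Create an empty working set (modelled as a dict)."""
    return {"docs": []}


def ws_id(ws: dict) -> int:
    """Hand out a fresh id on every call, as described in the thread."""
    wsid = next(_next_unique)
    _live[wsid] = ws
    return wsid


def ws_bat(wsid: int) -> dict:
    """Inverse direction: the same wsid always yields the same ws."""
    return _live[wsid]


def ws_opendoc(wsid: int, name: str) -> str:
    """Rule 2: receive wsid and derive ws locally where it is needed."""
    ws = ws_bat(wsid)
    ws["docs"].append(name)
    return name


def run_query(doc: str) -> int:
    """Hypothetical caller following rule 1: ws_id() right after ws_create()."""
    ws = ws_create()
    wsid = ws_id(ws)
    ws_opendoc(wsid, doc)
    return wsid
```

Calling ws_id() twice on the same ws in this model yields two different ids, which is exactly why rule 1) restricts it to a single call directly after ws_create().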
As indicated above, this does not work for doc_tbl and its companion PROCs in runtime/pf_support.mx that make up the interface between the Algebra version of the compiler and the runtime. I assume the respective canonical API could be changed in that the respective PROCs get wsid instead of (or in addition to?) ws as arguments. However, we cannot easily change the API to return wsid (lng) instead of ws (BAT) in the result BAT-of-BATs. Here, I see three solutions --- the last one actually comes from JanR:
a) Is ws/wsid indeed required in the result? If not, we could simply discard it.
b) Wrap wsid temporarily into a ("fake") [void,lng] BAT containing only one (nil,wsid) BUN. (Ugly!)
c) (JanR) Put the wsid inside ws --- basically as a ("fake") BAT [void,lng]. Kind of the "encapsulated" solution --- not only for the algebra-runtime interface! Consequently, we should have ws_create() call ws_id(ws) internally and store the result in ws (this could save the seemingly "redundant" ws := ws_create(); wsid := ws_id(ws); sequence), and all (most, except at least ws_destroy()) signatures/API could be kept / re-unified to pass (only) ws (now including wsid) instead of wsid as argument.
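Option c) can be sketched in Python as a working set that carries its own id. The Ws dataclass, the counter, and the META list are illustrative assumptions; in MIL the id would sit in a [void,lng] BAT inside ws, and the real meta-bats look different. ws_destroy() keeps taking a wsid here, following the remark that it is an exception.

```python
import itertools
from dataclasses import dataclass, field
from typing import List, Tuple

_unique = itertools.count(1)        # stand-in for the unique id source
META: List[Tuple[int, str]] = []    # hypothetical meta-bat: (wsid, document)


@dataclass
class Ws:
    wsid: int                       # the id travels inside the working set
    docs: List[str] = field(default_factory=list)


def ws_create() -> Ws:
    """Assign the id at creation time, so no separate ws_id() call is needed."""
    return Ws(wsid=next(_unique))


def ws_opendoc(ws: Ws, name: str) -> None:
    """Signature takes only ws again; the wsid is read from it when required."""
    ws.docs.append(name)
    META.append((ws.wsid, name))


def ws_destroy(wsid: int) -> List[str]:
    """Return the documents recorded for wsid; needs no live ws object."""
    return [doc for (wid, doc) in META if wid == wsid]
```

The META list also hints at the open question raised next: entries keyed by wsid remain readable after the ws object itself is gone.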
There seems to be one problem, though, if I understand Peter's comment correctly:
" - ws-IDs are now lng-s (combination of *unique* ID and bat-ID)
    such IDs are meaningful also after the query is done (and ws-bat deallocated).
    this is needed for trans mgmt.
    All meta-bats with a ws-id field now hold lng instead of int. "
(^1 marks "also after the query is done"; ^2 marks "and ws-bat deallocated".)
In other words, (^2) wsid needs to be available also if ws is gone. I'm not sure, though, whether it has to be available in the global wsid variable (i.e., inside/during the query), or (only) in some meta-bats outside (and hence after) the query (-context) (as suggested by ^1).
Peter,
could you please enlighten me/us, here?
In case "only" (^1) is required, the "encapsulated" solution seems indeed to be suitable for all code. Otherwise, it would at least be a local solution for the algebra-runtime interface (in case (a) & (b) are not an option), but bears the burden of maintaining consistency between the global wsid variable and the wsid stored inside the ws.
I hope this email and any reactions to it help to clarify the current situation, rather than to blur it even more ...
Stefan
--
Jan Rittinger
Database Systems
Technische Universität München (Germany)
http://www-db.in.tum.de/~rittinge/
------------------------------
Message: 4
Date: Fri, 6 Oct 2006 15:08:00 +0200
From: Stefan Manegold
If this wsid is a concept that is strictly necessary for the update support (and nobody except me thinks that this bit shifting might be not the nicest solution) we could also think about getting rid of the ws and replacing it with the wsid. I just do not like the fact that there are two variables underway that have some side effects on each other.
To be honest, I'm still in the process of analyzing and trying to understand what all the flavors, pros, cons, and ideas behind having/using wsid are; hence, I cannot tell whether it is "strictly necessary" or not.
As far as I understand so far, the basic idea behind the "bit shifting" is merely to store both the id of (and hence reference to) BAT ws as well as a unique identifier --- that is still valid after a query/transaction (and hence cannot simply be the ws BAT id, which might be re-used for a subsequent query/transaction) --- in one single atomic value.
Further, I agree that having two related variables that need to be kept in sync and treated accordingly is not very handy for maintenance --- my guess is that the current situation resulted from lack of time and fear to implement (something like) your (d) proposal; which I agree is "nicer" & "cleaner", but requires quite a lot of possibly error-prone code changes --- nevertheless, I might give it a try during the weekend ...
Stefan
So a fourth proposal (d) would be to replace the occurrences of ws by wsid. This would mean that *all* document accesses would have to include another indirection (even all path steps).
------------------------------
Message: 5
Date: Fri, 6 Oct 2006 15:42:02 +0200
From: Stefan Manegold
On Fri, Oct 06, 2006 at 02:39:19PM +0200, Jan Rittinger wrote:
If this wsid is a concept that is strictly necessary for the update support (and nobody except me thinks that this bit shifting might be not
the nicest solution) we could also think about getting rid of the ws and
replacing it with the wsid. I just do not like the fact that there are two variables underway that have some side effects on each other.
To be honest, I'm still in the process of analyzing and trying to understand what all the flavors, pros, cons, and ideas behind having/using wsid are; hence, I cannot tell whether it is "strictly necessary" or not.
As far as I understand so far, the basic idea behind the "bit shifting" is merely to store both the id of (and hence reference to) BAT ws as well as a unique identifier --- that is still valid after a query/transaction (and hence cannot simply be the ws BAT id, which might be re-used for a subsequent query/transaction) --- in one single atomic value.
Further, I agree, that having two related variables that need to be kept in sync and treated accordingly is not very handy for maintenance --- my guess is, that the current situation resulted from lack of time and fear to implement (something like) your (d) proposal; which I agree is "nicer" & "cleaner", but requires quite a lot of possibly error-prone code changes
nevertheless, I might give it a try during the weekend ...
Stefan
So a fourth proposal (d) would be to replace the occurrences of ws by wsid. This would mean that *all* document accesses would have to include
another indirection (even all path steps).
Well, I just realized that this actually only helps to save us from having function interfaces that expect/use either ws or wsid or both --- it does not help to get rid of the global ws variable: to exist throughout the whole query, the ws BAT must either be a global variable or be persistent --- the latter is not possible, since ws is a BAT-of-BATs, and those cannot be persistent...
... I just recall that we have something like "session" BATs --- I'll check whether that's an option (any help/hint is of course welcome ...)
Stefan
------------------------------
On Fri, Oct 06, 2006 at 04:16:05PM +0200, p.a.boncz wrote:
Hi all,
In PROC rpc_client() (runtime/xrpc.mx), I had changed ws_addcoll(ws, ... to ws_opencoll(ws_id(ws), ...
it should be ws_opencoll(wsid, ...)
It is wsid since this morning (see my respective checkins and earlier follow-up on this thread).
You have that available, because in the query context we always have both variables 'ws' and 'wsid'.
ws_id(ws) should be called once, it asks for a unique oid. The combination of (id,int(ws)) is squeezed into a lng, and that is wsid.
As for doc_tbl(), what would work is to change its parameter to wsid; one can always get the ws with ws := ws_bat(wsid), this is done now in numerous places in pf_support.mx.
Well, as said before, that's not that easy, since doc_tbl (and its brothers/sisters) follows an API definition that also returns ws --- I have no idea how this works/is used in detail ... JanR?
Your conclusions (1), (2) and (3) are all correct.
But, Jan's suggestion to store the wsid inside the ws can also work. However, I deem it very awkward to introduce a new BAT in the ws just to retain a number. Other suggestions are to rename the ws-bat to some unique key, and then we can use that name as transaction key (=wsid). We would change the wsid type then from lng to str. Another idea is to set the head-seqbase of the ws-bat to an oid. I think the ws is accessed with fetch() only, which disregards any seqbase.
What about encoding/storing the wsid in the ws BAT's name? I'll try to do that --- seems the smallest-impact solution that could (should?) finally satisfy all sides --- I hope ... this discussion already costs far too much time and energy ... |-( Basically, all current interfaces can stay unchanged. Only ws_id() will change slightly: ws_id() checks whether the ws BAT has already been renamed, i.e., assigned an ID; if so, it uses that one; otherwise, it assigns one and renames the ws BAT accordingly. Thus, ws_id can be called at any time on a ws BAT, and yields the same result each time it's called on the same ws BAT.
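The behaviour described for the changed ws_id() can be sketched in Python as follows. This is a model under assumptions, not the MIL implementation: the Bat class, the default name "tmp_0", the "ws_" prefix and the counter are all invented for illustration; only the check-then-rename logic is taken from the paragraph above.

```python
import itertools

_unique = itertools.count(1)  # stand-in for the unique id source
PREFIX = "ws_"                # hypothetical naming scheme for renamed ws BATs


class Bat:
    """Stand-in for a ws BAT; only its name matters in this sketch."""

    def __init__(self, name: str = "tmp_0") -> None:
        self.name = name


def ws_id(ws: Bat) -> int:
    """Return the id encoded in the BAT name, assigning one on first use."""
    if not ws.name.startswith(PREFIX):
        # Not renamed yet: assign a fresh id and encode it in the name.
        ws.name = PREFIX + str(next(_unique))
    # Already renamed (or just renamed): decode the id from the name.
    return int(ws.name[len(PREFIX):])
```

Because the id is read back from the name, repeated calls on the same BAT agree with each other, so callers no longer have to remember to invoke ws_id() exactly once.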
Note that internally in pathfinder.mx there are a number of functions that just pass around an artificial wsid, and in fact a ws does not exist at all (this is the MIL document management interface e.g. shred_doc), so a number of (internal) functions in pathfinder.mx will have to stay using a wsid in their signature. They do not need a ws-bat.
But the exported functions ws_opencoll and ws_opendoc could indeed just use a ws as they did before.
I will not have time for this until next week, so if anyone feels adventurous, they may try.
(Hopefully as a final act on this saga), I'll try to implement the above mentioned "solution" ... Stefan
Peter
Stefan,
ws_id() is not necessary anymore then (nor ws_bat). The renaming can be done right when the ws is created (ws_create()).
Peter
-----Original Message-----
From: Stefan Manegold [mailto:Stefan.Manegold@cwi.nl]
Sent: Friday, October 06, 2006 4:34 PM
To: p.a.boncz
Cc: monetdb-developers@lists.sourceforge.net
Subject: Re: [Monetdb-developers] Monetdb-developers Digest, Vol 5, Issue 6
On Fri, Oct 06, 2006 at 04:16:05PM +0200, p.a.boncz wrote:
[...]
On Fri, Oct 06, 2006 at 04:52:07PM +0200, p.a.boncz wrote:
Stefan,
ws_id() is not necessary anymore then (nor ws_bat).
... provided I also go through all the work of changing all wsid interfaces back to ws interfaces, and derive wsid from the ws BAT's name (via some new ws_id PROC) locally wherever wsid is required instead of ws ...
The renaming can be done right when the ws is created (ws_create()).
Right. My first approach was/is to do only limited code changes to see whether my idea works at all --- such "clean-up" will only be done once I know it works ... Stefan
Peter
participants (2): p.a.boncz, Stefan Manegold