Hi Ed, finally, I have time to answer your questions...
Greetings!
Just doing some more digging around MonetDB, and have a couple of questions.
(1) If I am managing multiple BATs that are (supposedly) synced, how do I manage appends and the syncing?
For example, I have a_p1 = BAT[void, int] and a_p2 = BAT[void, int]
(a_p1 = <a> type things with property p1) (a_p2 = <a> type things with property p2)
I wish to insert a new <a>=(1,2) thing. Do I do the following? bat("a_p1").append(1); bat("a_p2").append(2);
Right, that's what you need to do. (The SQL front-end, for example, does exactly this.)
How do I ensure that the "voids" are always synced? Is this just "assumed"?
Two (or more) void columns are treated as synced if they have the same seqbase and the same length (i.e., number of tuples/BUNs).
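To make the rule concrete, here is a minimal Python sketch of it. This is a toy model, not MIL and not MonetDB code: the class VoidBat and the function aligned() are hypothetical names invented for illustration, and the default seqbase of 0 is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class VoidBat:
    """Toy stand-in for a BAT[void, int] (hypothetical, not a MonetDB type).

    The head is "void": its OIDs are never stored, they are implied by
    seqbase plus the position of each tail value.
    """
    seqbase: int = 0
    tail: list = field(default_factory=list)

    def append(self, value):
        # Appending only grows the tail; seqbase is left untouched.
        self.tail.append(value)

    def head_oids(self):
        # The implied head values: seqbase, seqbase + 1, ...
        return range(self.seqbase, self.seqbase + len(self.tail))


def aligned(*bats):
    """Apply the rule from the text: same seqbase and same number of tuples."""
    first = bats[0]
    return all(
        b.seqbase == first.seqbase and len(b.tail) == len(first.tail)
        for b in bats
    )


a_p1 = VoidBat()
a_p2 = VoidBat()

a_p1.append(1)
half_done = aligned(a_p1, a_p2)   # only one of the two appends has happened

a_p2.append(2)
complete = aligned(a_p1, a_p2)    # both appends done, the tuple (1,2) is whole
```

In this model the columns are out of step only between the two appends, which is the window the next question asks about.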
What if I fail in the middle of operations (such that a_p1 has an extra row) and stuck with dirty tables? Do I do a global "abort();" or a "clean();" to get back to a safe state? Will this reset the seqbase?
If you fail half-way, the way to get back to a consistent state is indeed to do a global "abort();". This sets you back to the state right after the last "commit();". Hence, if you need to be on the safe side, you could call commit() after each set of appends that form one relational tuple. Note, however, that commit() is expensive because it needs to flush the respective changes to disk; hence, you might want to call it less often. The "optimal" frequency depends on how much performance your application needs versus how much data loss it can tolerate in case of a failure followed by an abort(). Neither the appends nor a commit or abort changes the seqbase of a void column!
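The trade-off can be sketched with a small Python model. Again, this is only an illustration under stated assumptions: ToyStore, load() and batch_size are hypothetical names, and commit() here just copies in-memory lists, whereas the real commit() flushes changes to disk.

```python
import copy


class ToyStore:
    """Toy model of a set of aligned void-headed BATs (not MonetDB code)."""

    def __init__(self, names, seqbase=0):
        self.seqbase = seqbase                       # never changed below
        self.bats = {name: [] for name in names}
        self._committed = copy.deepcopy(self.bats)   # state at last commit
        self.commits = 0

    def append(self, name, value):
        self.bats[name].append(value)

    def commit(self):
        # Stand-in for the expensive flush to disk.
        self._committed = copy.deepcopy(self.bats)
        self.commits += 1

    def abort(self):
        # Return to the state right after the last commit.
        self.bats = copy.deepcopy(self._committed)


def load(store, rows, batch_size):
    """Append whole tuples and commit once per batch_size tuples."""
    pending = 0
    for p1, p2 in rows:
        store.append("a_p1", p1)
        store.append("a_p2", p2)
        pending += 1
        if pending == batch_size:
            store.commit()
            pending = 0


store = ToyStore(["a_p1", "a_p2"])
load(store, [(1, 2), (3, 4), (5, 6)], batch_size=2)

store.append("a_p1", 7)   # simulated failure: the matching a_p2 append never runs
store.abort()             # back to the last commit
```

With batch_size=2 only one commit is paid for three tuples, but the abort also discards the complete, uncommitted tuple (5,6) along with the half-written one. With batch_size=1 nothing but the half-written tuple would be lost, at the price of one commit per tuple.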
(Is this documented someplace? I am trying to look at the architecture docs but cannot find anything that concretely answers similar questions.)
I'm afraid there is not much documentation about these issues apart from the code itself. I'll try to have a look, but given the limited manpower we have, I cannot promise anything for now. Improving the documentation is on our to-do list, and questions like these help us find out which documentation is needed most.
(2) How does sync() work?
"sync()" is related to transaction management (it "saves all persistent BATs"); however, it is not supposed to be used under normal circumstances. You should use "commit()" instead!
Do I do a: sync(bat("a_p1"), bat("a_p2"));
once and only once? After every update/append to the tables?
Hm, I'm not aware of any "sync(BAT,BAT)".
There is the phrase "When two BATs effectively contain the same sequence of head elements, we call them 'synced'. This is implemented by storing a very large OID for each column. An update to the column destroys this OID.". This sounds as if any update/append breaks the correspondence, and also that it will limit the number of rows in the table (as an OID is a 32-bit number on 32-bit systems).
Since you're dealing with void-headed BATs here, "syncedness" (or better, "alignedness") of BATs is trivial and given by default: two (or more) void-headed BATs are treated as synced/aligned if their heads have the same seqbase and the BATs have the same length (i.e., number of tuples/BUNs). Hence, "syncedness"/"alignedness" is completely handled by the kernel, and you don't need to do anything. Independent of "syncedness"/"alignedness", on a 32-bit system the size of a BAT is limited by either 2^32 bytes or 2^32 rows, whichever is less.
If I have 10 properties I am tracking, would I pick one as the "primary" BAT, and sync() all others to it?
As said before, there is no need for explicit syncing (IMHO it does not even exist in MIL). All you need to do is create all 10 BATs with the same seqbase and ensure that all your "append()"s are synchronized as described above.
I hope this answers your questions. Please don't hesitate to contact us again if things are still unclear, or once new questions arise!
Regards, Stefan
Regards! Ed
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ |
| 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 |
| The Netherlands | Fax : +31 (20) 592-4312 |