Hi Ed, (and another delayed reply ... my apologies ...)
I wish to insert a new <a>=(1,2) thing. Do I do? bat("a_p1").append(1); bat("a_p2").append(2);
right, that's what you need to do. (The SQL front-end, e.g., does excatly this.)
I assume that if 2 or more processes want to insert into that particular group of BATs, I will need to use some sort of transaction to ensure ordering of the inserts (ie, if 2 processes do the above, the timing could be P1, then P2, then P2, then P1, causing an inconsistent ordering). How does the SQL side handle this for a table "insert"?
Yes. By default, MonetDB provieds only very limited transaction support on individual BATs. If necessary, front-ends are supposed to implement more transaction mechanism themselves using the tools that are provided byt modules "trans" and "lock". The reason is, that we what to keep the kernel as "overhead-free" as possible to acheive max. performance with "mainly-read" workloads (OLAP) that do not require much less transactions support than OLTP workloads. The SQL front-end uses a timestamp-based optimistic transaction protocol: (If I recall correctly --- Niels, please correct me if I'm wrong!) All updates are done locally in extra insert/update/delete BATs for each transactions. At commit time, the front-end checks whether the updates of the committing transaction would/did conflict with some other transaction. If so, the committing trans action is aborted. If there were no conflicts, the updates are applied to the original persistent BATs. In your case, I see two options: 1) for each insert (append) of one row to the set of BATs that hold the individual attributes, each client would first need to aquire a lock (see module lock) before access to the respecitve BATs (not, one "global" lock for all BATs the make up one row!). only once the lock is granted, the client can insert. obviously, there are two problems: a) this kind of synchronisation mechanism will not scale with many concurrent clients b) it does not allow single-client commits and/or aborts; only global commits/aborts are possible. 2) in case your updates is append-only, and the order between clients does not matter, and you can live with the fact that the "main"/"global" BATs are not completely up-to-date at every moment in time: - keep (small) local BATs for each client for the (then lock-free) appends; - have each client append to its own BATs - every N insert (per client; e.g., N=1000) have each client commit, append its local BATs "in bulk" to the respective global BATs (guarded by a lock as described above), empty its local BATs, and proceed.
Two (or more) void columns are treated as synced, if they have the same seqbase and the same lenght (i.e., number of tuples/BUNs).
So I should set the seqbase to something unique for the group of BATs.
Unless the abolute value of the void head have some meaning to you, I'd propose the set all seqbases to 0@0; in any case, the seqbases of all BATs that make up one row must be identical (per partition). Note: The "syncedness" of BATs plays only a role with internal optimizations of the kernel when choosing the best strategies/algorithms for read-only operations such as select or join. MonetDB does/can not use the syncedness to keep BATs synched in case of updates.
What if I fail in the middle of operations (such that a_p1 has an extra row) and stuck with dirty tables? Do I do a global "abort();" or a "clean();" to get back to a safe state? Will this reset the seqbase?
If you fail half-way, the way to get back a consistend state is indeed to do a global "abort();". This sets you back to the state right after the last "commit();". Hence, if you need to be "on the save side, you could call commit() after each set of appends that form one relational tuple. Note however, that commit() is expensive because it need to flush the respective changes to disk; hence, you might want to call it less often. The "optimal" frequency depends on how much performance you/your application needs respectively how much data "loss" you/your application can bear with in case of a failure + abort()...
Neither the appends nor a commit or abort do change the seqbase of a void column!
I am assuming that one issue is that the database is dirty, and can be seen from another client (ACID properties on a database such as Oracle ensure that two clients cannot see changes the other has made that is uncommitted). In this case, I cannot undo _just_ the changes one client has made, but must undo the whole set. If I use the transaction support to "label" a transaction, can I do a proper cleanup that way? Is there a way to isolate other clients from dirty states/changes made by one client?
In MIL, changes to globally available BATs are immediately visible to all clients. (Persistent BATs are globally visible.) It's the reposibility of the front-end (like SQL) to implement user/client isolation. I sketched solution for your case under option 2) above.
(As an aside, is there a way to provide client isolation? The "trans" module does not seem to support specifying individual transactions or locking? The only other module I seem to find is "lock", but I am unsure how to use this between clients. Place the lock in a BAT with a label? Then how would atomic access to that BAT be achieved? Even if I use the locks for access control, there doesn't seem to be anything that provides the appropriate isolation from changes Ie, if I start a transaction, and then the client dies, do other clients see partial operations?)
The idea is to create a lock (or more for several purposes) (cf. "lock_create()") that is know to all client; either by keeping it in a variable (say "VAR MyLock := lock_create();") that was created before the client listener was started or by keeping it in a persistent BAT. Then, before accessing the BATs, a client need to aquire that lock: lock_set(MyLock); will block the client until the lock is available and granted to the calling client. lock_try(MyLock); will return immediately with ==0 if the lock was availble, in that case it is granted to the calling client !=0 if the lock was not available Once the client has been granted the lock, it can access the respective BATs. Once the clinet has finished it action on the respective BATs, it need to release the lock: "lock_unset(MyLock);" As said above, we kept these features outside the client to avoid the related overhead for the worksloads that MonetDB was primarily desined for and that do not require them.
(2) How does sync() work?
"sync()" is related to transaction management (it "save all persistent BATs"), however, it is not supposed to be used "under normal circumstances. You should use "commit()" instead!
Do I do a: sync(bat("a_p1"), bat("a_p2"));
once and only once? After every update/append to the tables?
I guess I got confused. I typed <sync> when I meant <synced>.
There is a
for testing syncedness.
[...] I'll have to stop for now. Sorry. I'll come back to your syncedness questions asap. Regards, Stefan -- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |