[MonetDB-users] VOIDs and multiple BATs
Greetings! Just doing some more digging around MonetDB, and have a couple of questions. (1) If I am managing multiple BATs that are (supposedly) synced, how do I manage appends and the syncing? For example, I have a_p1 = BAT[void, int] and a_p2 = BAT[void, int] (a_p1 = <a> type things with property p1) (a_p2 = <a> type things with property p2) I wish to insert a new <a>=(1,2) thing. Do I do? bat("a_p1").append(1); bat("a_p2").append(2); How do I ensure that the "voids" are always synced? Is this just "assumed"? What if I fail in the middle of operations (such that a_p1 has an extra row) and stuck with dirty tables? Do I do a global "abort();" or a "clean();" to get back to a safe state? Will this reset the seqbase? (Is this documented someplace? I am trying to look at the architecture docs but cannot find anything that concretely answers similar questions.) (2) How does sync() work? Do I do a: sync(bat("a_p1"), bat("a_p2")); once and only once? After every update/append to the tables? There is the phrase "When two BATs effectively contain the same sequence of head elements, we call them 'synced'. This is implemented by storing a very large OID for each column. An update to the column destroys this OID.". This sounds as if any update/append breaks the correspondence, and also that it will limit the number of rows in the table (as an OID is a 32-bit number of 32-bit systems). If I have 10 properties I am tracking, would I pick one as the "primary" BAT, and sync() all others to it? Regards! Ed
I am not sure on this, but maybe it helps you a bit further for now. Edmund Dengler wrote:
Greetings!
I wish to insert a new <a>=(1,2) thing. Do I do? bat("a_p1").append(1); bat("a_p2").append(2);
How do I ensure that the "voids" are always synced? Is this just "assumed"?
I think you are responsible yourself to keep the bats "synched". That means that you have to ensure they are the same length. As far as I know, a void is just a virtual oid, which is just a number going up each record. I could be wrong.
There is the phrase "When two BATs effectively contain the same sequence of head elements, we call them 'synced'. This is implemented by storing a very large OID for each column. An update to the column destroys this OID.". This sounds as if any update/append breaks the correspondence, and also that it will limit the number of rows in the table (as an OID is a 32-bit number of 32-bit systems).
Correct. The number of rows in a BAT are limited by the number of bits your system has.
If I have 10 properties I am tracking, would I pick one as the "primary" BAT, and sync() all others to it?
I think SQL keeps internally the bats equal in length and does the administration itself also. I hope this made some sense. A more precise answer follows hopefully soon.
Hi Ed, finally, I have time to answer your questions...
Greetings!
Just doing some more digging around MonetDB, and have a couple of questions.
(1) If I am managing multiple BATs that are (supposedly) synced, how do I manage appends and the syncing?
For example, I have a_p1 = BAT[void, int] and a_p2 = BAT[void, int]
(a_p1 = <a> type things with property p1) (a_p2 = <a> type things with property p2)
I wish to insert a new <a>=(1,2) thing. Do I do? bat("a_p1").append(1); bat("a_p2").append(2);
right, that's what you need to do. (The SQL front-end, e.g., does excatly this.)
How do I ensure that the "voids" are always synced? Is this just "assumed"?
Two (or more) void columns are treated as synced, if they have the same seqbase and the same lenght (i.e., number of tuples/BUNs).
What if I fail in the middle of operations (such that a_p1 has an extra row) and stuck with dirty tables? Do I do a global "abort();" or a "clean();" to get back to a safe state? Will this reset the seqbase?
If you fail half-way, the way to get back a consistend state is indeed to do a global "abort();". This sets you back to the state right after the last "commit();". Hence, if you need to be "on the save side, you could call commit() after each set of appends that form one relational tuple. Note however, that commit() is expensive because it need to flush the respective changes to disk; hence, you might want to call it less often. The "optimal" frequency depends on how much performance you/your application needs respectively how much data "loss" you/your application can bear with in case of a failure + abort()... Neither the appends nor a commit or abort do change the seqbase of a void column!
(Is this documented someplace? I am trying to look at the architecture docs but cannot find anything that concretely answers similar questions.)
I'm afraid, there is not too much documentation about these issues except from the code itself. I'll try to have a look, but given the limited man power we have to cope with, I cannot promise anything for now. Improving the documentation is on our todo list, and questions like these will help us to find out which documentation is needed most.
(2) How does sync() work?
"sync()" is related to transaction management (it "save all persistent BATs"), however, it is not supposed to be used "under normal circumstances. You should use "commit()" instead!
Do I do a: sync(bat("a_p1"), bat("a_p2"));
once and only once? After every update/append to the tables?
Hm, I'm not aware of any "sync(BAT,BAT)".
There is the phrase "When two BATs effectively contain the same sequence of head elements, we call them 'synced'. This is implemented by storing a very large OID for each column. An update to the column destroys this OID.". This sounds as if any update/append breaks the correspondence, and also that it will limit the number of rows in the table (as an OID is a 32-bit number of 32-bit systems).
Since your dealing with void-headed BATs, here, "syncedness" (or better "alignedness") of BATs it trivial and given by default: Two (or more) void-headed BATs are treated as synced/aligned, if their heads have the same seqbase and the BATs have the same lenght (i.e., number of tuples/BUNs). Hence, "syncedness"/"alignedness" is completely handled by the kernel, and you don't need to do anything. Independent of "syncedness"/"alignedness", on a 32-bit system the size of a BAT is limited by either 2^32 bytes or 2^32 rows, whatever is less.
If I have 10 properties I am tracking, would I pick one as the "primary" BAT, and sync() all others to it?
As said before, no need for explicte syncing (it IMHO does not even exist in MIL). Al you need to do create all 10 BAT with the same seqbase, and ensure that all your "append()"'s are synchronized as described above. I hope, this answers your questions. Please don't hesitate to contact us again, if things are still unclear, or once new questions arise! Regards, Stefan
Regards! Ed
-- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
Thanks Stefan! I do appreciate the time you take to answer questions, and appreciate your being busy! On Thu, 21 Apr 2005, Stefan Manegold wrote:
I wish to insert a new <a>=(1,2) thing. Do I do? bat("a_p1").append(1); bat("a_p2").append(2);
right, that's what you need to do. (The SQL front-end, e.g., does excatly this.)
I assume that if 2 or more processes want to insert into that particular group of BATs, I will need to use some sort of transaction to ensure ordering of the inserts (ie, if 2 processes do the above, the timing could be P1, then P2, then P2, then P1, causing an inconsistent ordering). How does the SQL side handle this for a table "insert"?
Two (or more) void columns are treated as synced, if they have the same seqbase and the same lenght (i.e., number of tuples/BUNs).
So I should set the seqbase to something unique for the group of BATs.
What if I fail in the middle of operations (such that a_p1 has an extra row) and stuck with dirty tables? Do I do a global "abort();" or a "clean();" to get back to a safe state? Will this reset the seqbase?
If you fail half-way, the way to get back a consistend state is indeed to do a global "abort();". This sets you back to the state right after the last "commit();". Hence, if you need to be "on the save side, you could call commit() after each set of appends that form one relational tuple. Note however, that commit() is expensive because it need to flush the respective changes to disk; hence, you might want to call it less often. The "optimal" frequency depends on how much performance you/your application needs respectively how much data "loss" you/your application can bear with in case of a failure + abort()...
Neither the appends nor a commit or abort do change the seqbase of a void column!
I am assuming that one issue is that the database is dirty, and can be seen from another client (ACID properties on a database such as Oracle ensure that two clients cannot see changes the other has made that is uncommitted). In this case, I cannot undo _just_ the changes one client has made, but must undo the whole set. If I use the transaction support to "label" a transaction, can I do a proper cleanup that way? Is there a way to isolate other clients from dirty states/changes made by one client? (As an aside, is there a way to provide client isolation? The "trans" module does not seem to support specifying individual transactions or locking? The only other module I seem to find is "lock", but I am unsure how to use this between clients. Place the lock in a BAT with a label? Then how would atomic access to that BAT be achieved? Even if I use the locks for access control, there doesn't seem to be anything that provides the appropriate isolation from changes Ie, if I start a transaction, and then the client dies, do other clients see partial operations?)
(2) How does sync() work?
"sync()" is related to transaction management (it "save all persistent BATs"), however, it is not supposed to be used "under normal circumstances. You should use "commit()" instead!
Do I do a: sync(bat("a_p1"), bat("a_p2"));
once and only once? After every update/append to the tables?
I guess I got confused. I typed <sync> when I meant <synced>.
There is a
Hi Ed, (and another delayed reply ... my apologies ...)
I wish to insert a new <a>=(1,2) thing. Do I do? bat("a_p1").append(1); bat("a_p2").append(2);
right, that's what you need to do. (The SQL front-end, e.g., does excatly this.)
I assume that if 2 or more processes want to insert into that particular group of BATs, I will need to use some sort of transaction to ensure ordering of the inserts (ie, if 2 processes do the above, the timing could be P1, then P2, then P2, then P1, causing an inconsistent ordering). How does the SQL side handle this for a table "insert"?
Yes. By default, MonetDB provieds only very limited transaction support on individual BATs. If necessary, front-ends are supposed to implement more transaction mechanism themselves using the tools that are provided byt modules "trans" and "lock". The reason is, that we what to keep the kernel as "overhead-free" as possible to acheive max. performance with "mainly-read" workloads (OLAP) that do not require much less transactions support than OLTP workloads. The SQL front-end uses a timestamp-based optimistic transaction protocol: (If I recall correctly --- Niels, please correct me if I'm wrong!) All updates are done locally in extra insert/update/delete BATs for each transactions. At commit time, the front-end checks whether the updates of the committing transaction would/did conflict with some other transaction. If so, the committing trans action is aborted. If there were no conflicts, the updates are applied to the original persistent BATs. In your case, I see two options: 1) for each insert (append) of one row to the set of BATs that hold the individual attributes, each client would first need to aquire a lock (see module lock) before access to the respecitve BATs (not, one "global" lock for all BATs the make up one row!). only once the lock is granted, the client can insert. obviously, there are two problems: a) this kind of synchronisation mechanism will not scale with many concurrent clients b) it does not allow single-client commits and/or aborts; only global commits/aborts are possible. 2) in case your updates is append-only, and the order between clients does not matter, and you can live with the fact that the "main"/"global" BATs are not completely up-to-date at every moment in time: - keep (small) local BATs for each client for the (then lock-free) appends; - have each client append to its own BATs - every N insert (per client; e.g., N=1000) have each client commit, append its local BATs "in bulk" to the respective global BATs (guarded by a lock as described above), empty its local BATs, and proceed.
Two (or more) void columns are treated as synced, if they have the same seqbase and the same lenght (i.e., number of tuples/BUNs).
So I should set the seqbase to something unique for the group of BATs.
Unless the abolute value of the void head have some meaning to you, I'd propose the set all seqbases to 0@0; in any case, the seqbases of all BATs that make up one row must be identical (per partition). Note: The "syncedness" of BATs plays only a role with internal optimizations of the kernel when choosing the best strategies/algorithms for read-only operations such as select or join. MonetDB does/can not use the syncedness to keep BATs synched in case of updates.
What if I fail in the middle of operations (such that a_p1 has an extra row) and stuck with dirty tables? Do I do a global "abort();" or a "clean();" to get back to a safe state? Will this reset the seqbase?
If you fail half-way, the way to get back a consistend state is indeed to do a global "abort();". This sets you back to the state right after the last "commit();". Hence, if you need to be "on the save side, you could call commit() after each set of appends that form one relational tuple. Note however, that commit() is expensive because it need to flush the respective changes to disk; hence, you might want to call it less often. The "optimal" frequency depends on how much performance you/your application needs respectively how much data "loss" you/your application can bear with in case of a failure + abort()...
Neither the appends nor a commit or abort do change the seqbase of a void column!
I am assuming that one issue is that the database is dirty, and can be seen from another client (ACID properties on a database such as Oracle ensure that two clients cannot see changes the other has made that is uncommitted). In this case, I cannot undo _just_ the changes one client has made, but must undo the whole set. If I use the transaction support to "label" a transaction, can I do a proper cleanup that way? Is there a way to isolate other clients from dirty states/changes made by one client?
In MIL, changes to globally available BATs are immediately visible to all clients. (Persistent BATs are globally visible.) It's the reposibility of the front-end (like SQL) to implement user/client isolation. I sketched solution for your case under option 2) above.
(As an aside, is there a way to provide client isolation? The "trans" module does not seem to support specifying individual transactions or locking? The only other module I seem to find is "lock", but I am unsure how to use this between clients. Place the lock in a BAT with a label? Then how would atomic access to that BAT be achieved? Even if I use the locks for access control, there doesn't seem to be anything that provides the appropriate isolation from changes Ie, if I start a transaction, and then the client dies, do other clients see partial operations?)
The idea is to create a lock (or more for several purposes) (cf. "lock_create()") that is know to all client; either by keeping it in a variable (say "VAR MyLock := lock_create();") that was created before the client listener was started or by keeping it in a persistent BAT. Then, before accessing the BATs, a client need to aquire that lock: lock_set(MyLock); will block the client until the lock is available and granted to the calling client. lock_try(MyLock); will return immediately with ==0 if the lock was availble, in that case it is granted to the calling client !=0 if the lock was not available Once the client has been granted the lock, it can access the respective BATs. Once the clinet has finished it action on the respective BATs, it need to release the lock: "lock_unset(MyLock);" As said above, we kept these features outside the client to avoid the related overhead for the worksloads that MonetDB was primarily desined for and that do not require them.
(2) How does sync() work?
"sync()" is related to transaction management (it "save all persistent BATs"), however, it is not supposed to be used "under normal circumstances. You should use "commit()" instead!
Do I do a: sync(bat("a_p1"), bat("a_p2"));
once and only once? After every update/append to the tables?
I guess I got confused. I typed <sync> when I meant <synced>.
There is a
for testing syncedness.
[...] I'll have to stop for now. Sorry. I'll come back to your syncedness questions asap. Regards, Stefan -- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
Hi Ed,
... finally, I'm back ...
In case you store a relational tuple (say
From the bat documentation:
"For BAT-sets that contain exactly the same sets of OIDs, those OID-columns can be marked synced(). This information is then used to use positional lookup (the i-th element in BAT1 corresponds with the i-th in BAT2) and to turn semijoin- into very cheap copy-operations, etc."
almost makes it sound as if you need synched() to actual test and set this bit. Is this true? How is the "syncedness" ensured when I have a BAT[OID, any], or determined?
As mentioned earlier, the syncedness is only used internally. There is no way to set it explicitely from MIL. For VOID columns and dense, ascending OID columns, syncedness is detected only from their seqbase and #BUNs (see above and below).
From the bat documentation:
"This is implemented by storing a very large OID for each column. An update to the column destroys this OID."
From your explanation, VOIDs are handled by seqbase and # of buns. For OID columns in a BAT (ie, BAT[OID, any]), is this just the first BUN?
Note, the "very large OID" that is used to identify synced BATs is NOT the same as their seqbase(s). In case of VOID or dense, ascending OID columns, syncedness is given, if two columsn have the same seqbase & same length (#BUNs); in this case, the "very large OID" is not needed/used at all. In other cases (see e.g., the "[+]" example above), MonetDB assignes both BATs the same "very large OID" to record there syncedness. Thus with later operations that use two BATs that have the same "very large OID", MonetDB knows (assumes) that these BATs are synced. [For dense, ascending OID columns, the seqbase is in deed the value of the first BUN, i.e., the smallest value in that column (see above). For all other OID columns, there obviously is no seqbase.]
For update operations, does this mean that in the middle of inserting into a set of BATs, that as the BUN count is different, a concurrent query may not realize that the two BATs are synced? (this gets back to multiple client isolation for changes)
Since there is no way to detect "syncedness maintaining updates" in MonetDB, explicitely marked syncedness has to be "un-marked" (by "destroying" the "very large OID"). In general, in cases like your's, i.e., updates of a set of BATs that represent the attributes of a wide (relational) table, synced can only be maintained (and hence exploited) by using head columsn that are either VOID or at least dense, ascending OIDs. (If your scenario is append-only, VOID columns are "perfect"!) Syncedness as described in the documentation with the "very large OID" is only applicable for derived BATs (as described above) with the [V]oid columns no being modified. As a summary, in your case, you don't have to worry abotu syncedness! If you can/do use VOID-headed BATs (or at least OID-headed BATs with the OIDs forming dense, ascending sequences), you get syncedness "for free". Otherwise, there is no way to maintain/use syncedness at all. I hope, this helps you. Please let us know, if you have more questions, problems, comments, etc. concerning MonetDB! Regards, Stefan -- | Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | | CWI, P.O.Box 94079 | http://www.cwi.nl/~manegold/ | | 1090 GB Amsterdam | Tel.: +31 (20) 592-4212 | | The Netherlands | Fax : +31 (20) 592-4312 |
participants (3)
-
Edmund Dengler
-
Fabian
-
Stefan.Manegold@cwi.nl