changeset 14:5a5167adae4a
Updates and corrections for current MonetDB.
author | Sjoerd Mullender <sjoerd@acm.org> |
---|---|
date | Thu, 07 Dec 2017 17:44:37 +0100 (2017-12-07) |
parents | a3465119dc5b |
children | 59bbfa0096b3 |
files | reverse/README.rst |
diffstat | 1 files changed, 49 insertions(+), 16 deletions(-) |
--- a/reverse/README.rst Thu Dec 07 16:46:57 2017 +0100
+++ b/reverse/README.rst Thu Dec 07 17:44:37 2017 +0100
@@ -35,6 +35,31 @@
 where ``table`` is an SQL table with a column called ``strcol`` which
 is of type ``VARCHAR`` (or any other string type).
 
+A Note About Names
+------------------
+
+The name *BAT* originally stood for *Binary Association Table*. This
+suggests, correctly, that there were actually two values per row in a
+BAT that were associated. The two values where the *head* and *tail*
+values of, what was called, a *Binary UNit* or *BUN*. The head and
+tail columns of a BAT had independent types, but it turned out that
+most of the time we used the type *oid* (*Object IDentifier*) for the
+head column, and the values were, most of the time, consecutive.
+Since then we have made sure that the head values were always of type
+oid and consecutive, which meant that we could do away with the
+complete head column. The only thing that we still needed (and still
+need) is the first value of the old head column. This value is called
+*hseqbase* (head sequence base). There are still many vestiges of the
+old head and tail columns, especially when accessing values in what
+used to be the tail column and now is the only column. So we have
+functions such as ``BUNtail`` that have nothing to do anymore with a
+tail. Also, the term BUN was repurposed to being the name we have
+given to the index of the C array that holds the values of what used
+to be the tail column. For more information see the blog post
+`MonetDB goes headless`__.
+
+__ https://www.monetdb.org/blog/monetdb-goes-headless
+
 Implementation
 --------------
 
@@ -241,8 +266,8 @@
 
   bn = COLnew(b->hseqbase, TYPE_str, BATcount(b), TRANSIENT);
 
-The arguments of ``COLnew`` are the seqbase for the *head* and the
-type of the *tail* columns, the initial size of the to-be-allocated
+The arguments of ``COLnew`` are the head seqbase, the type of the
+column, the initial size (in number of entries) of the to-be-allocated
 BAT, and ``TRANSIENT`` to indicate that this BAT is temporary.
 ``COLnew`` guarantees that there is space for at least the specified
 number of elements, or it returns ``NULL``. Since we call
@@ -250,8 +275,15 @@
 about the size of the new BAT (``BUNappend`` takes care of resizing
 if necessary), but from an efficiency point of view, it's better to
 create the BAT with the required size (growing a BAT can be
-expensive). We set the sequence base for the head column of the new
-BAT to be the same as that of the input BAT.
+expensive). We must set the head sequence base of the new BAT to be
+the same as that of the input BAT.
+
+Note that for variable-sized types (such as the strings we are dealing
+with here), ``BUNappend`` can still fail due to not enough memory,
+even though we supposedly allocated enough. The strings have to be
+stored somewhere, and ``COLnew`` has no way of knowing how large the
+total are for the strings must be, so ``BUNappend`` may have to grow
+the memory area for the strings, and that can fail.
 
 Iterating through the source BAT is done using a standard mechanism::
 
@@ -268,7 +300,7 @@
 argument can be used inside the body as an argument to
 e.g. ``BUNtail``.
 
-The body of the loop first retrieves the current value from the tail
+The body of the loop first retrieves the current value from the
 column::
 
   src = (const char *) BUNtail(bi, p);
@@ -306,11 +338,11 @@
 --------------
 
 MonetDB makes extensive use of a number of property flags that can be
-set on the columns of BATs. It is crucial that these property flags
-don't lie. When the server is started with the ``-d10`` option, the
-server checks these property flags and exits with a failed assertion
-when a flag is set incorrectly (or the server issues a warning when it
-was built with assertions disabled).
+set on BATs. It is crucial that these property flags don't lie. When
+the server is started with the ``-d10`` option, the server checks
+these property flags and exits with a failed assertion when a flag is
+set incorrectly (or the server issues a warning when it was built with
+assertions disabled).
 
 Property flags are Boolean flags, i.e. they are either true (set) or
 false (not set). When a property flag is not set, it means that
@@ -350,10 +382,11 @@
 same time. When they are, it implies that all values are equal to
 each other.
 
-The ``key`` property flag is actually two bits. The lower bit is set
-if the property holds. If, in addition, the upper bit is also set, it
-means that the property must hold, i.e. when an attempt is made to
-insert a new value that already occurs, the insert must fail.
+Next to the ``key`` property there is also a ``unique`` property.
+The ``unique`` property, when set, indicates that all values in the
+BAT *must* be distinct (as in the UNIQUE constraint in SQL). We're
+not really concerned with this, since it is not used by the SQL
+layer. When ``unique`` is set, then so must ``key``.
 
 When the ``sorted`` property is unset, the ``nosorted`` property is a
 position in the BAT where the previous value is not less than or
@@ -364,11 +397,11 @@
 locations whose values are equal.
 
 Note that most of the properties are true for an empty column, hence
-when ``BATnew`` returns, all property flags except for ``nil`` are set
+when ``COLnew`` returns, all property flags except for ``nil`` are set
 (there are no nils in an empty column). This means that as soon as
 you start adding data to a column, you must deal with the property
 flags.
 Note also that the function ``BUNappend`` maintains the property flags
-the best it can. That is why in the example we didn't need to do
+as best it can. That is why in the example we didn't need to do
 anything with the property flags.
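
The README text touched by this changeset describes two ideas that may be easier to see in code: a headless column only stores ``hseqbase`` plus one array of values, and property flags must never claim more than the data supports. The following is a minimal sketch in plain C under stated assumptions. The names ``toy_bat``, ``toy_oid`` and ``toy_set_props`` are hypothetical and are not part of the MonetDB GDK API; the real ``COLnew`` and ``BUNappend`` do considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of a headless column; NOT the MonetDB GDK API. */
typedef struct {
    size_t hseqbase;   /* first value of the implicit (virtual) head column */
    size_t count;      /* number of values in the column */
    const int *vals;   /* the single materialised column (the old "tail") */
    bool sorted;       /* set only if values are known to be non-decreasing */
    bool key;          /* set only if values are known to be distinct */
} toy_bat;

/* Head value (oid) that belongs to array index p. */
static size_t toy_oid(const toy_bat *b, size_t p)
{
    assert(p < b->count);
    return b->hseqbase + p;
}

/* Recompute both flags by inspecting the data, so that they cannot lie.
 * An empty column trivially satisfies both properties. */
static void toy_set_props(toy_bat *b)
{
    b->sorted = true;
    b->key = true;
    for (size_t i = 1; i < b->count; i++) {
        if (b->vals[i] < b->vals[i - 1])
            b->sorted = false;
        /* quadratic distinctness check, kept simple on purpose */
        for (size_t j = 0; j < i; j++)
            if (b->vals[j] == b->vals[i])
                b->key = false;
    }
}
```

In this toy model a column holding 3, 5, 5, 9 with ``hseqbase`` 100 maps index 2 to oid 102, keeps ``sorted`` set, and clears ``key`` because 5 occurs twice. A real implementation would more likely update the flags incrementally on each append rather than rescan the whole column.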