changeset 14:5a5167adae4a

Updates and corrections for current MonetDB.
author Sjoerd Mullender <sjoerd@acm.org>
date Thu, 07 Dec 2017 17:44:37 +0100 (2017-12-07)
parents a3465119dc5b
children 59bbfa0096b3
files reverse/README.rst
diffstat 1 files changed, 49 insertions(+), 16 deletions(-)
--- a/reverse/README.rst	Thu Dec 07 16:46:57 2017 +0100
+++ b/reverse/README.rst	Thu Dec 07 17:44:37 2017 +0100
@@ -35,6 +35,31 @@
 where ``table`` is an SQL table with a column called ``strcol`` which
 is of type ``VARCHAR`` (or any other string type).
 
+A Note About Names
+------------------
+
+The name *BAT* originally stood for *Binary Association Table*.  This
+suggests, correctly, that there were actually two values per row in a
+BAT that were associated.  The two values were the *head* and *tail*
+values of what was called a *Binary UNit* or *BUN*.  The head and
+tail columns of a BAT had independent types, but it turned out that
+most of the time we used the type *oid* (*Object IDentifier*) for the
+head column, and the values were, most of the time, consecutive.
+Since then we have made sure that the head values were always of type
+oid and consecutive, which meant that we could do away with the
+complete head column.  The only thing that we still needed (and still
+need) is the first value of the old head column.  This value is called
+*hseqbase* (head sequence base).  There are still many vestiges of the
+old head and tail columns, especially when accessing values in what
+used to be the tail column and now is the only column.  So we have
+functions such as ``BUNtail`` that have nothing to do anymore with a
+tail.  Also, the term BUN was repurposed as the name we have
+given to the index of the C array that holds the values of what used
+to be the tail column.  For more information see the blog post
+`MonetDB goes headless`__.
+
+__ https://www.monetdb.org/blog/monetdb-goes-headless
+
 Implementation
 --------------
 
@@ -241,8 +266,8 @@
 
  bn = COLnew(b->hseqbase, TYPE_str, BATcount(b), TRANSIENT);
 
-The arguments of ``COLnew`` are the seqbase for the *head* and the
-type of the *tail* columns, the initial size of the to-be-allocated
+The arguments of ``COLnew`` are the head seqbase, the type of the
+column, the initial size (in number of entries) of the to-be-allocated
 BAT, and ``TRANSIENT`` to indicate that this BAT is temporary.
 ``COLnew`` guarantees that there is space for at least the specified
 number of elements, or it returns ``NULL``.  Since we call
@@ -250,8 +275,15 @@
 about the size of the new BAT (``BUNappend`` takes care of resizing if
 necessary), but from an efficiency point of view, it's better to
 create the BAT with the required size (growing a BAT can be
-expensive).  We set the sequence base for the head column of the new
-BAT to be the same as that of the input BAT.
+expensive).  We must set the head sequence base of the new BAT to be
+the same as that of the input BAT.
+
+Note that for variable-sized types (such as the strings we are dealing
+with here), ``BUNappend`` can still fail for lack of memory,
+even though we supposedly allocated enough.  The strings have to be
+stored somewhere, and ``COLnew`` has no way of knowing how large the
+total area for the strings must be, so ``BUNappend`` may have to grow
+the memory area for the strings, and that can fail.
 
 Iterating through the source BAT is done using a standard mechanism::
 
@@ -268,7 +300,7 @@
 argument can be used inside the body as an argument to
 e.g. ``BUNtail``.
 
-The body of the loop first retrieves the current value from the tail
+The body of the loop first retrieves the current value from the
 column::
 
  src = (const char *) BUNtail(bi, p);
@@ -306,11 +338,11 @@
 --------------
 
 MonetDB makes extensive use of a number of property flags that can be
-set on the columns of BATs.  It is crucial that these property flags
-don't lie.  When the server is started with the ``-d10`` option, the
-server checks these property flags and exits with a failed assertion
-when a flag is set incorrectly (or the server issues a warning when it
-was built with assertions disabled).
+set on BATs.  It is crucial that these property flags don't lie.  When
+the server is started with the ``-d10`` option, the server checks
+these property flags and exits with a failed assertion when a flag is
+set incorrectly (or the server issues a warning when it was built with
+assertions disabled).
 
 Property flags are Boolean flags, i.e. they are either true (set) or
 false (not set).  When a property flag is not set, it means that
@@ -350,10 +382,11 @@
 same time.  When they are, it implies that all values are equal to
 each other.
 
-The ``key`` property flag is actually two bits.  The lower bit is set
-if the property holds.  If, in addition, the upper bit is also set, it
-means that the property must hold, i.e. when an attempt is made to
-insert a new value that already occurs, the insert must fail.
+Next to the ``key`` property there is also a ``unique`` property.
+The ``unique`` property, when set, indicates that all values in the
+BAT *must* be distinct (as in the UNIQUE constraint in SQL).  We're
+not really concerned with this, since it is not used by the SQL
+layer.  When ``unique`` is set, then so must ``key``.
 
 When the ``sorted`` property is unset, the ``nosorted`` property is
 a position in the BAT where the previous value is not less than or
@@ -364,11 +397,11 @@
 locations whose values are equal.
 
 Note that most of the properties are true for an empty column, hence
-when ``BATnew`` returns, all property flags except for ``nil`` are set
+when ``COLnew`` returns, all property flags except for ``nil`` are set
 (there are no nils in an empty column).  This means that as soon as
 you start adding data to a column, you must deal with the property
 flags.
 
 Note also that the function ``BUNappend`` maintains the property flags
-the best it can.  That is why in the example we didn't need to do
+as best it can.  That is why in the example we didn't need to do
 anything with the property flags.