Mercurial > hg > MonetDB-extend

.. This Source Code Form is subject to the terms of the Mozilla Public
.. License, v. 2.0.  If a copy of the MPL was not distributed with this
.. file, You can obtain one at http://mozilla.org/MPL/2.0/.
..
.. Copyright 2013-2021 MonetDB B.V.

.. This document is written in reStructuredText (see
   http://docutils.sourceforge.net/ for more information).
   Use ``rst2html.py`` to convert this file to HTML.

===========================================
Implementing a Simple User-Defined Function
===========================================

:Author: Sjoerd Mullender
:Organization: CWI, MonetDB B.V.
:Abstract: In this short tutorial we describe how to create a simple
   UDF for MonetDB/SQL.

Introduction
------------

In this directory we show how to make a simple user-defined function
(UDF) that can be used in MonetDB/SQL.  The function is implemented in
C.  In order to make the function available to SQL, we need to provide
the correct C interface, a MAL interface, and an SQL interface.  We
will discuss them all, starting from SQL and going down towards the
actual implementation.

We want to create a function that allows us to write something like
this:

.. code-block:: sql

 SELECT revstr(strcol) FROM table;

where ``table`` is an SQL table with a column called ``strcol`` which
is of type ``VARCHAR`` (or any other string type).

Implementation
--------------

We will first create an interface to do a simple one-value-at-a-time
(*scalar*) operation:

.. code-block:: sql

 SELECT revstr('string');

The SQL catalog will need to be extended with a definition of the
``revstr`` function as follows:

.. code-block:: sql

 CREATE FUNCTION revstr(src STRING) RETURNS STRING
        EXTERNAL NAME revstr.revstr;

The statement tells the SQL system that there is a function called
``revstr`` which takes a ``STRING`` argument and produces a
``STRING`` result.  The function is implemented using the MAL
interface ``revstr.revstr``.  Note that ``STRING`` is equivalent to
``CHARACTER LARGE OBJECT`` or ``CLOB``.

This statement will normally be executed once when the database is
created, after which it is part of the SQL catalog.  How this is
accomplished exactly we will leave until later in this tutorial.  For
now let it suffice to note that the SQL query is encoded as a C string
and stored in the variable ``reverse_sql``:

.. code-block:: c

  static char reverse_sql[] = "CREATE FUNCTION revstr(src STRING)"
          " RETURNS STRING EXTERNAL NAME revstr.revstr;";

At the SQL side we don't have to do anything more.

The SQL layer produces an intermediate code called MAL (which stands
for MonetDB Assembly Language).  The external name above refers to a
command that is defined in MAL, so now that we have the SQL interface,
we need to create the MAL interface.

The MAL interface of the function looks like this:

.. code-block::

 module revstr;

 command revstr(ra1:str):str
 address UDFreverse
 comment "Reverse a string";

The SQL engine uses the convention that a *bulk* variant of a scalar
operation (i.e., a variant that works on a complete column and
produces a column as opposed to a function that works on a single
value and produces a single value) has the same name but is located in
a module with the string ``bat`` prepended.  So, the bulk version of
the ``revstr.revstr`` function can also be created:

.. code-block::

 module batrevstr;

 command revstr(b:bat[:str]):bat[:str]
 address UDFBATreverse
 comment "Reverse a column of strings";

This MAL code also needs to be encoded in the C source.  This is done as
follows:

.. code-block:: c

  static mel_func reverse_init_funcs[] = {
	  command("revstr", "revstr", UDFreverse, false,
		  "Reverse a string",
		  args(1,2, arg("",str),arg("ra1",str))),
	  command("batrevstr", "revstr", UDFBATreverse, false,
		  "Reverse a BAT of strings",
		  args(1,2, batarg("",str),batarg("b",str))),
	  { .imp=NULL }		/* sentinel */
  };

A C array with elements of type ``mel_func`` is created, and each MAL
command is one element of this array.  The array ends with a sentinel, a
value that is "empty" and can thus be recognized as the end of the
array.

Each element in the array is an instance of the macro ``command`` which
has a bunch of arguments.  In order, they are: MAL module name (string),
MAL function name (string), C function (pointer to function), a Boolean
flag to indicate whether this is an "unsafe" operation (one with side
effects), a comment (string--not currently used but must be present), a
description of the function arguments.  The function arguments are
encoded using the ``args`` macro with the following arguments.  The
number of return values (MAL functions can return 1 or more values), the
total number of arguments (i.e. the sum of return values and input
arguments), and then for each return argument and each input argument a
description of the argument itself.  Each argument is an instance of a
macro.  There are various forms, but here we use two.  ``arg`` describes
a scalar argument and has two parameters, the MAL name of the argument
and the MAL type.  ``batarg`` describes a BAT argument and also has two
parameters, the MAL name of the argument and the MAL type of the
elements of the BAT.

Note that implementing a bulk version is optional.  If it does not
exist, the scalar version will be called multiple times.  However,
calling the scalar version multiple (sometimes very many) times incurs
significant overhead, hence it is usually a good idea to implement the
bulk version.

Now we come to the actual implementation of the feature.

The MAL interfaces of the scalar and bulk versions of the ``revstr``
function translates to the following C interfaces:

.. code-block:: c

 static char *UDFreverse(char **retval, const char **arg);
 static char *UDFBATreverse(bat *retval, const bat *arg);

The return value of the C functions is normally ``MAL_SUCCEED`` which
translates to ``NULL``.  If an error occurs, the return should be a
freshly allocated string that contains the error message.  The string
must be allocated with the function ``GDKmalloc`` or one of its
variants (``GDKzalloc``, ``GDKstrdup``) since it will be freed with
``GDKfree`` when the interpreter is done with the message.  Most of
the time this can be done by calling the macro ``throw`` which is
defined as ``return createException``, so calling ``throw`` with
arguments will actually call ``createException`` with those same
arguments and return the result.  The function ``createException`` has
three or more arguments, the first is ``MAL``, the second is the name
of the MAL function, the third is a ``printf`` format string, and
remaining arguments are values that are used by the format string.  A
minimal example is:

.. code-block:: c

  static char *UDFreverse(char **retval, const char **arg)
  {
      (void) retval; (void) arg; /* we're not using these */
      throw(MAL, "revstr.revstr", "Not yet implemented");
  }

MAL commands can return any number of values.  These values are
returned by the C function in the first arguments of the C function,
hence the two ``retval`` arguments.  The arguments of the MAL
interface follow the return values in the argument list of the C
function (the ``arg`` arguments).  We have added the ``const`` keyword
to indicate that the arguments will not be altered.

The MonetDB code usually uses the C type ``str`` which is defined to
be ``char *``, so you could define the functions also as:

.. code-block:: c

 static str UDFreverse(str *retval, const str *arg);
 static str UDFBATreverse(bat *retval, const bat *arg);

Note that the definitions are not entirely equivalent.  The target of
the ``const`` keyword is different for the first function.

The type ``bat`` is defined as ``int``, so you may see ``int`` in some
places in the MonetDB source code in equivalent positions.

These functions must be located in a dynamically loadable module
(``.so`` file on Linux, ``.dll`` on Windows), and this module must
have the name ``lib_reverse.so`` (or ``lib_reverse.dll``).  The
functions are only directly referenced from the ``reverse_init_funcs``
array that we have defined above, so the functions are declared as
``static`` functions.

Scalar Version
~~~~~~~~~~~~~~

We will now first focus on the implementation of the scalar function
and return to the bulk version later.

The input of the function ``UDFreverse`` is a (NULL-terminated)
string.  The function is called with a pointer to the string pointer,
so ``*arg`` is the actual string and ``**arg`` the first byte of the
string.

The result of the operation is also a (NULL-terminated) string.  Since
the caller does not know what the size of the result will be, it
provides a pointer to where the result is to be put.  The callee is
responsible for allocating the necessary space.  This means that we
need to do something like this:

.. code-block:: c

 *retval = GDKmalloc(size);
 // copy data into *retval

In the case of this function, calculating the needed space is easy,
although we need to do error checking as well:

.. code-block:: c

 *retval = GDKmalloc(strlen(*arg) + 1);
 if (*retval == NULL)
     throw(MAL, "revstr.revstr", MAL_MALLOC_FAIL);
 // reverse the string in *arg into *retval
 return MAL_SUCCEED;

In the actual algorithm we have also taken into account that strings
in MonetDB are always stored in the UTF-8 encoding.  In addition, our
implementation checks for the special value ``str_nil`` which is the C
representation of the SQL ``NULL`` value for strings (which in MAL is
called ``nil:str``).

Bulk Version
~~~~~~~~~~~~

The bulk version gets as input a pointer to a BAT identifier (a value
of type ``bat``).  It also returns a BAT identifier of a newly created
BAT through the output pointer.

It is important to realize that BATs are reference counted with two
reference counters.  There is a *logical* reference count (``lref``)
and a *physical* reference count (``ref``).  C code is only allowed to
look at the actual data when the *physical* reference count is greater
than zero and the BAT is loaded into (virtual) memory.  A newly
created BAT has a physical reference count of one and a logical
reference count of zero.  Before returning a new BAT, the physical
reference count must be converted to a logical reference count.

When a C function is called by the MAL interpreter, it passes a
*logical* reference to any BAT parameters.  This means that the C
function must first make sure that the BAT gets a *physical* reference
and is loaded into memory.  We start with doing just that.  We do that
by calling ``BATdescriptor`` which increments the physical reference
count and loads the BAT into memory (if it wasn't there already):

.. code-block:: c

 BAT *b;
 b = BATdescriptor(*arg);
 if (b == NULL)
     throw(MAL, "batrevstr.revstr", RUNTIME_OBJECT_MISSING);

When we're done with this BAT, we will need to decrement the physical
reference count again.  We do that by calling ``BBPunfix``:

.. code-block:: c

 BBPunfix(b->batCacheid);

Note that ``b->batCacheid`` is equal to ``*arg``.

We need to create the result BAT ourselves.  We know the type, and we
know that the size of the BAT will be the same as the input BAT.
Hence we can use this code:

.. code-block:: c

 bn = COLnew(b->hseqbase, TYPE_str, BATcount(b), TRANSIENT);

The arguments of ``COLnew`` are the head seqbase, the type of the
column, the initial size (in number of entries) of the to-be-allocated
BAT, and ``TRANSIENT`` to indicate that this BAT is temporary.
``COLnew`` guarantees that there is space for at least the specified
number of elements, or it returns ``NULL``.  Since we call
``BUNappend`` to add entries to the BAT, we're actually not concerned
about the size of the new BAT (``BUNappend`` takes care of resizing if
necessary), but from an efficiency point of view, it's better to
create the BAT with the required size (growing a BAT can be
expensive).  We must set the head sequence base of the new BAT to be
the same as that of the input BAT.

Note that for variable-sized types (such as the strings we are dealing
with here), ``BUNappend`` can still fail due to not enough memory,
even though we supposedly allocated enough.  The strings have to be
stored somewhere, and ``COLnew`` has no way of knowing how large the
total area for the strings must be, so ``BUNappend`` may have to grow
the memory area for the strings, and that can fail.

Iterating through the source BAT is done using a standard mechanism:

.. code-block:: c

 BATiter bi;
 BUN p, q;
 bi = bat_iterator(b);
 BATloop(b, p, q) {
     ...
 }
 bat_iterator_end(&bi);

``BATloop`` is a macro that translates to a C ``for`` loop, using the
first argument to set the bounds, the second argument to iterate
through the BAT, and the third argument as the loop limit.  The second
argument can be used inside the body as an argument to
e.g. ``BUNtail``.

We make use of a BAT iterator (type ``BATiter``) which is initialized
with a call to ``bat_iterator(b)``.  Since this call increments a
reference count, we need to match the initialization with a call to
``bat_iterator_end(&bi)``.  The calls to ``bat_iterator`` and
``bat_iterator_end`` cannot fail.  The need for ``bat_iterator_end`` is
new since the Jul2021 release (11.41.X).

The body of the loop first retrieves the current value from the
column:

.. code-block:: c

 src = (const char *) BUNtail(bi, p);

We then use this string in the same way as in the scalar function.
The reversed string in ``dst`` is appended to the result BAT:

.. code-block:: c

 BUNappend(bn, dst, false);

The third argument to ``BUNappend`` must always be ``false``.

``BUNappend`` returns ``GDK_SUCCEED`` for success or ``GDK_FAIL`` for
failure.  ``BUNappend`` is marked in the include file as a function of
which the result *must* be checked.  It is a good convention to always
check whether the result is (not) equal to ``GDK_SUCCEED`` so that if in
the future different errors are returned, the code keeps working.

BAT Properties
--------------

MonetDB makes extensive use of a number of property flags that can be
set on BATs.  It is crucial that these property flags don't lie.  When
the server is started with the ``-d10`` option, the server checks
these property flags and exits with a failed assertion when a flag is
set incorrectly (or the server issues a warning when it was built with
assertions disabled).

Property flags are Boolean flags, i.e. they are either true (set) or
false (not set).  When a property flag is not set, it means that
either the property doesn't hold or it is unknown whether the property
holds.

The property flags are

``tsorted``
  the column is sorted in ascending order

``trevsorted``
  the column is sorted in descending order

``tkey``
  all values in the column are distinct

``tnonil``
  there are no nil values in the column

``tnil``
  there are nil values in the column (this property isn't used
  internally)

In addition to these flags there are a few other properties:

``tnosorted``
  the location that proofs the column is *not* sorted

``tnorevsorted``
  the location that proofs the column is *not* reverse sorted

``tnokey[0]`` and ``tnokey[1]``
  two locations that together proof *not* all values are distinct

The property flags ``tsorted`` and ``trevsorted`` may both be set at
the same time.  When they are, it implies that all values are equal to
each other.

When the ``tsorted`` property is unset, the ``tnosorted`` property is
a position in the BAT where the previous value is not less than or
equal to the position itself.  If the ``tnosorted`` value is 0, we
truly don't know whether the BAT is sorted.  Similarly for the
``trevsorted`` and ``tnorevsorted`` properties.  The two ``tnokey``
values are either both 0 (we really don't know), or two distinct
locations whose values are equal.  The ``tno``... values *must* be
zero if the corresponding property is set, they may be zero if the
corresponding property in unset.

Note that most of the properties are true for an empty column, hence
when ``COLnew`` returns, all property flags except for ``nil`` are set
(there are no nils in an empty column).  This means that as soon as
you start adding data to a column, you must deal with the property
flags.  The simplest solution is to just clear all properties:

.. code-block:: c

  bn->tsorted = bn->trevsorted = false;
  bn->tkey = false;
  bn->tnil = false;
  bn->tnonil = false;
  bn->tnosorted = bn->tnorevsorted = 0;
  bn->tnokey[0] = bn->tnokey[1] = 0;

Note also that the function ``BUNappend`` maintains the property flags
as best it can.  That is why in the example we didn't need to do
anything with the property flags.

Returning the Result
--------------------

When the return BAT has been created and filled, it needs to be
returned.  This is done by assigning the BAT id to the space the result
pointer points to.  In addition, we need to deal with the reference
counting mentioned before.  The input BAT's reference that we got from
``BATdescriptor`` should be released with ``BBPunfix`` and one physical
reference to the new BAT needs to be converted to one logical reference.
This is done by a call to BBPkeepref.  And finally, the function should
return and indicate that no errors occurred.  This is done by returning
the value ``MAL_SUCCEED`` (which is just a ``NULL`` pointer).

.. code-block:: c

  BBPunfix(b->batCacheid);
  BBPkeepref(bn->batCacheid);
  *retval = bn->batCacheid;
  return MAL_SUCCEED;

Informing the Server
--------------------

So far we have created the necessary C code that can be called by the
interpreter when the appropriate SQL query is executed.  However, we
still need to tell the server that this code actually exists.  We have
already hinted at this, but here we will finish that part.

Once the ``.so`` or ``.ddl`` file has been created and installed, the
server needs to be told about it.  This is done be calling the server
with an extra argument:

.. code-block:: sh

  mserver5 --loadmodule=reverse ...

where ``...`` represents any other arguments not covered by this
tutorial.

When the server gets this ``--loadmodule`` argument, it loads the
library.  And here we use a trick that is available in dynamically
loaded libraries.  We tell the system to automatically execute some code
when the library is loaded:

.. code-block:: c

  #include "mal_import.h"
  #include "sql_import.h"
  #ifdef _MSC_VER
  #undef read
  #pragma section(".CRT$XCU",read)
  #endif
  LIB_STARTUP_FUNC(init_reverse)
  {
	  mal_module("revstr", NULL, reverse_init_funcs);
	  sql_register("revstr", reverse_sql);
  }

The ``LIB_STARTUP_FUNC`` macro is defined in one of the include files.
It has an argument which is a name that should be unique.  The
convention is ``init_`` followed by the name of the module.  This macro
is the start of a function definition, so it is followed by the body of
the function that is executed when the library is loaded.

The function calls two functions.  The first, ``mal_module``, registers
the MAL functions that we have defined.  The arguments are the name of
the module, an array of elements of type ``mel_atom`` (not used here),
and an array of elements of type ``mel_func`` which contains our MAL
functions.

The second function, ``sql_register`` registers the SQL query that needs
to be executed to enter the SQL function into the catalog.

Makefile
--------

To bring all of this together, we have a ``Makefile``.  It uses the
``pkgconf`` command to find the location of the MonetDB installation and
the arguments needed to compile the module.  (If you don't have
``pkgconf``, you may be able to replace it with ``pkg-config``.)  This
Makefile works on Fedora Linux if you have the package
``MonetDB-SQL-server5-devel`` with all its dependencies installed
(available starting in the Oct2020-SP2 bugfix release), and on
Debian/Ubuntu if you have the package ``monetdb5-sql-dev`` with all its
dependencies installed (available starting in the Oct2020-SP2 bugfix
release).  The file may need to be changed for other systems.  Note that
even in the Oct2020-SP5 release there are some include files missing.

A Note About Names
------------------

The name *BAT* originally stood for *Binary Association Table*.  This
suggests, correctly, that there were actually two values per row in a
BAT that were associated.  The two values where the *head* and *tail*
values of, what was called, a *Binary UNit* or *BUN*.  The head and
tail columns of a BAT had independent types, but it turned out that
most of the time we used the type *oid* (*Object IDentifier*) for the
head column, and the values were, most of the time, consecutive.
Since then we have made sure that the head values were always of type
oid and consecutive, which meant that we could do away with the
complete head column.  The only thing that we still needed (and still
need) is the first value of the old head column.  This value is called
*hseqbase* (head sequence base).  There are still many vestiges of the
old head and tail columns, especially when accessing values in what
used to be the tail column and now is the only column.  So we have
functions such as ``BUNtail`` that have nothing to do anymore with a
tail.  Also, the term BUN was repurposed to being the name we have
given to the index of the C array that holds the values of what used
to be the tail column.  For more information see the blog post
`MonetDB goes headless`__.

__ https://www.monetdb.org/blog/monetdb-goes-headless
author	Sjoerd Mullender <sjoerd@acm.org>
date	Thu, 24 Mar 2022 18:37:39 +0100 (2022-03-24)
parents	68263b10998e
children	02895996506d