
--- a/README.rst	Thu Jun 10 16:20:54 2021 +0200
+++ b/README.rst	Thu Jun 10 16:21:46 2021 +0200
@@ -68,7 +68,9 @@
 The tutorial in the ``regexp`` subdirectory contains an example of a
 FILTER FUNCTION.  These functions are used to filter values in a
 column.  The simplest example of a filter is (given a table ``t`` that
-has an integer column ``c``)::
+has an integer column ``c``):
+
+.. code-block:: sql

   SELECT * FROM t WHERE c = 0;
--- a/regexp/README.rst	Thu Jun 10 16:20:54 2021 +0200
+++ b/regexp/README.rst	Thu Jun 10 16:21:46 2021 +0200
@@ -22,7 +22,9 @@

 We want to create a function or set of functions that allows us to
 filter a column based on whether a regular expression matches.  When
-we're done, we will be able to issue a query such as this::
+we're done, we will be able to issue a query such as this:
+
+.. code-block:: sql

   SELECT name FROM t WHERE [name] rematch ['^mue?ller$'];

@@ -34,7 +36,9 @@

 Note that this query is similar to queries using the LIKE predicate,
 although regular expressions are more powerful than the patterns
-allowed by LIKE.  The following three queries are equivalent::
+allowed by LIKE.  The following three queries are equivalent:
+
+.. code-block:: sql

   SELECT name FROM t WHERE name LIKE '%foo%';
   SELECT name FROM t WHERE [name] sys."like" ['%foo%'];
@@ -51,14 +55,18 @@
 both sides of the filter function.  Given two tables with STRING
 columns where one table contains values and the other regular
 expressions, we can issue queries such as the following (these two
-queries are equivalent)::
+queries are equivalent):
+
+.. code-block:: sql

   SELECT t1.value FROM t1 JOIN t2 ON [t1.value] sys.rematch [t2.pattern];
   SELECT t1.value FROM t1, t2 WHERE [t1.value] sys.rematch [t2.pattern];

 The function ``rematch`` that we will create, and also the function
 ``like`` that already exists are FILTER functions.  We can also use
-the filter function for simple (scalar) queries, such as::
+the filter function for simple (scalar) queries, such as:
+
+.. code-block:: sql

   SELECT ['some string'] sys."like" ['%foo%'];

@@ -71,7 +79,9 @@
 It can be useful to pass some extra flags to the matching operator to
 tell it to do case insensitive matching, or always match the whole
 string vs. finding a matching substring.  We will be able to call this
-variant as follows::
+variant as follows:
+
+.. code-block:: sql

   SELECT ['some string'] sys.rematch ['.*foo.*', 'i'];
   SELECT name FROM t WHERE [name] sys.rematch ['.*foo.*', 'i'];
@@ -89,7 +99,9 @@
 SQL
 ...

-The SQL interface is simple::
+The SQL interface is simple:
+
+.. code-block:: sql

   CREATE FILTER FUNCTION rematch(val STRING, pat STRING)
          EXTERNAL NAME regexp.rematch;
@@ -112,7 +124,9 @@
 This statement will normally be executed once when the database is
 created, after which it is part of the SQL catalog.  To accomplish
 this we need to store the SQL query in a C string (also see the
-*reverse* tutorial)::
+*reverse* tutorial):
+
+.. code-block:: c

   static char regexp_sql[] = "CREATE FILTER FUNCTION rematch(val STRING, pat STRING) "
 	  "EXTERNAL NAME regexp.rematch; "
@@ -135,7 +149,9 @@
 with a single value for each input value.  See the *reverse* tutorial.

 The interface looks like this.  First the variant without the
-``flags`` argument::
+``flags`` argument:
+
+.. code-block::

   module regexp;

@@ -158,7 +174,9 @@
   address regexpmatchbulk
   comment "Return a BAT with true for match and false for no match";

-The variant with the ``flags`` argument looks like this::
+The variant with the ``flags`` argument looks like this:
+
+.. code-block::

   module regexp;

@@ -182,7 +200,9 @@
   comment "Return a BAT with true for match and false for no match";

 We encode these MAL commands in a C array (again, see the *reverse*
-tutorial)::
+tutorial):
+
+.. code-block:: c

   static mel_func regexp_init_funcs[] = {
 	  command("regexp", "rematch", regexpmatch, false,
@@ -223,7 +243,7 @@

 __ http://pcre.org/

-::
+.. code-block:: c

    pcre *re;        /* compiled regular expression */
    pcre_extra *sd;  /* studied regular expression */
@@ -243,7 +263,9 @@
 With this information, we can implement the first function easily.
 The first function is a simple scalar version, i.e. a version that
 works on a single value.  We also give the version with flags
-argument.  Since they are very similar, they share all code::
+argument.  Since they are very similar, they share all code:
+
+.. code-block:: c

   static char *
   regexpmatch(bit *ret, const char **val, const char **pat)
@@ -258,7 +280,9 @@
   }

 The function ``do_match`` does all the work and is in essence the
-above code (in real code we need to add error handling)::
+above code (in real code we need to add error handling):
+
+.. code-block:: c

   static char *
   do_match(bit *ret, const char *val, const char *pat, const char *flags)
@@ -287,7 +311,9 @@
 ``````

 The C interface of the two select functions (with and without flags)
-is as follows::
+is as follows:
+
+.. code-block:: c

   static char *regexpmatchselect(bat *ret, const bat *bid, const bat *sid,
           const char **pat, const bit *anti);
@@ -320,7 +346,9 @@
 There are a set of macros and functions that make using candidate
 lists very easy.  First an iterator structure is initialized, and then
 this structure is used to iterate through the candidate list.  The
-relevant code looks like this::
+relevant code looks like this:
+
+.. code-block:: c

   struct canditer ci;
   canditer_init(&ci, b, s); /* s may be NULL for no candidate list */
@@ -349,7 +377,9 @@
 whether or not the property holds.

 The two C functions referenced above are so similar that they share
-all code::
+all code:
+
+.. code-block:: c

   static char *
   regexpmatchselect(bat *ret, const bat *bid, const bat *sid,
@@ -366,7 +396,9 @@
       return do_select(ret, *bid, sid ? *sid : 0, *pat, *flags, *anti);
   }

-The function ``do_select`` does all the work::
+The function ``do_select`` does all the work:
+
+.. code-block:: c

   static char *
   do_select(bat *ret, bat bid, bat sid, const char *pat,
@@ -376,7 +408,9 @@
   }

 First we check whether any of the string input arguments is NIL.  If
-they are, there are no matches and we're done quickly::
+they are, there are no matches and we're done quickly:
+
+.. code-block:: c

   if (strNil(pat) || strNil(flags)) {
       /* no matches when the pattern or the flags is NIL
@@ -389,14 +423,18 @@
   }

 After this we convert the ``flags`` string to the ``options`` value
-that the PCRE library wants::
+that the PCRE library wants:
+
+.. code-block:: c

   options = parseflags(flags);

 Then we load the two input BATs.  The parameters ``bid`` and ``sid``
 are BAT IDs, but we need BAT descriptors.  To get from one to the
 other we use the function ``BATdescriptor``.  Once we have a BAT
-descriptor, we need to free it again using ``BBPunfix``::
+descriptor, we need to free it again using ``BBPunfix``:
+
+.. code-block:: c

   s = NULL;
   b = BATdescriptor(bid);
@@ -408,20 +446,26 @@
 that there is enough space for this many rows, so if we ask for enough
 capacity to store the maximum potential result, we don't need to check
 during insertion.  We also set up a pointer to the start of the data
-area of the BAT::
+area of the BAT:
+
+.. code-block:: c

   bn = COLnew(0, TYPE_oid, ci.ncand, TRANSIENT);
   outp = (oid *) Tloc(bn, 0);

 Since we're going to use the search pattern many times (well,
 depending on the inputs, of course), we invest extra time to study the
-pattern::
+pattern:
+
+.. code-block:: c

   re = pcre_compile(pat, options, &err, &pos, NULL);
   sd = pcre_study(re, 0, &err);

 We then set up an auxiliary variable to help us iterate over the
-input::
+input:
+
+.. code-block:: c

   bi = bat_iterator(b);

@@ -439,7 +483,9 @@
 nil, and if not whether the value matches the regular expression.  If
 there is a match we add the ID of the value to the output.  Here,
 match takes the ``anti`` variable into account.  The code is as
-follows::
+follows:
+
+.. code-block:: c

   for (BUN i = 0; i < ci.ncand; i++) {
       oid o = canditer_next(&ci);
@@ -456,7 +502,9 @@
       }
   }

-Now we can release all resources::
+Now we can release all resources:
+
+.. code-block:: c

   if (s) BBPunfix(s->batCacheid);
   BBPunfix(b->batCacheid);
@@ -464,7 +512,9 @@
   pcre_free(re);

 Finally, we need to fix up the result BAT and return it.  As is, the
-BAT is still empty, so we set the size::
+BAT is still empty, so we set the size:
+
+.. code-block:: c

   BATsetcount(bn, (BUN) (outp - (oid *) Tloc(bn, 0)));

@@ -512,7 +562,9 @@
   values are distinct.  Must both be ``0`` if ``tkey == true``.

 Because of the way we created the output we know a number of these
-properties::
+properties:
+
+.. code-block:: c

   bn->tsorted = true;
   bn->tkey = true;
@@ -525,7 +577,9 @@
 We also know that the column is dense if there is at most a single row
 in it, but we can easily check whether the column is dense even if
 there are more rows: we just need to look at the difference between
-the first and last rows and compare that with the number of rows::
+the first and last rows and compare that with the number of rows:
+
+.. code-block:: c

   bn->tseqbase = oid_nil;
   if (BATcount(bn) > 1) {
@@ -534,7 +588,9 @@
           bn->tseqbase = outp[0];
   }

-Now we can return::
+Now we can return:
+
+.. code-block:: c

   *ret = bn->batCacheid;
   BBPkeepref(*ret);
@@ -544,7 +600,9 @@
 ````

 The C interface of the two join functions (with and without flags) is
-as follows::
+as follows:
+
+.. code-block:: c

   char *regexpmatchjoin(bat *lres, bat *rres, const bat *lid, const bat *rid,
           const bat *sl, const bat *sr,
--- a/reverse/README.rst	Thu Jun 10 16:20:54 2021 +0200
+++ b/reverse/README.rst	Thu Jun 10 16:21:46 2021 +0200
@@ -28,7 +28,9 @@
 actual implementation.

 We want to create a function that allows us to write something like
-this::
+this:
+
+.. code-block:: sql

  SELECT reverse(strcol) FROM table;

@@ -39,12 +41,16 @@
 --------------

 We will first create an interface to do a simple one-value-at-a-time
-(*scalar*) operation::
+(*scalar*) operation:
+
+.. code-block:: sql

  SELECT reverse('string');

 The SQL catalog will need to be extended with a definition of the
-``reverse`` function as follows::
+``reverse`` function as follows:
+
+.. code-block:: sql

  CREATE FUNCTION reverse(src STRING) RETURNS STRING
         EXTERNAL NAME reverse.reverse;
@@ -59,7 +65,9 @@
 created, after which it is part of the SQL catalog.  How this is
 accomplished exactly we will leave until later in this tutorial.  For
 now let it suffice to note that the SQL query is encoded as a C string
-and stored in the variable ``reverse_sql``::
+and stored in the variable ``reverse_sql``:
+
+.. code-block:: c

   static char reverse_sql[] = "CREATE FUNCTION reverse(src STRING)"
           " RETURNS STRING EXTERNAL NAME reverse.reverse;";
@@ -71,7 +79,9 @@
 command that is defined in MAL, so now that we have the SQL interface,
 we need to create the MAL interface.

-The MAL interface of the function looks like this::
+The MAL interface of the function looks like this:
+
+.. code-block::

  module reverse;

@@ -84,7 +94,9 @@
 produces a column as opposed to a function that works on a single
 value and produces a single value) has the same name but is located in
 a module with the string ``bat`` prepended.  So, the bulk version of
-the ``reverse.reverse`` function can also be created::
+the ``reverse.reverse`` function can also be created:
+
+.. code-block::

  module batreverse;

@@ -93,7 +105,9 @@
  comment "Reverse a column of strings";

 This MAL code also needs to be encoded in the C source.  This is done as
-follows::
+follows:
+
+.. code-block:: c

   static mel_func reverse_init_funcs[] = {
 	  command("reverse", "reverse", UDFreverse, false,
@@ -136,7 +150,9 @@
 Now we come to the actual implementation of the feature.

 The MAL interfaces of the scalar and bulk versions of the ``reverse``
-function translate to the following C interfaces::
+function translate to the following C interfaces:
+
+.. code-block:: c

  static char *UDFreverse(char **retval, const char **arg);
  static char *UDFBATreverse(bat *retval, const bat *arg);
@@ -154,7 +170,9 @@
 three or more arguments, the first is ``MAL``, the second is the name
 of the MAL function, the third is a ``printf`` format string, and
 remaining arguments are values that are used by the format string.  A
-minimal example is::
+minimal example is:
+
+.. code-block:: c

   static char *UDFreverse(char **retval, const char **arg)
   {
@@ -170,7 +188,9 @@
 to indicate that the arguments will not be altered.

 The MonetDB code usually uses the C type ``str`` which is defined to
-be ``char *``, so you could define the functions also as::
+be ``char *``, so you could define the functions also as:
+
+.. code-block:: c

  static str UDFreverse(str *retval, const str *arg);
  static str UDFBATreverse(bat *retval, const bat *arg);
@@ -203,13 +223,17 @@
 the caller does not know what the size of the result will be, it
 provides a pointer to where the result is to be put.  The callee is
 responsible for allocating the necessary space.  This means that we
-need to do something like this::
+need to do something like this:
+
+.. code-block:: c

  *retval = GDKmalloc(size);
  // copy data into *retval

 In the case of this function, calculating the needed space is easy,
-although we need to do error checking as well::
+although we need to do error checking as well:
+
+.. code-block:: c

  *retval = GDKmalloc(strlen(*arg) + 1);
  if (*retval == NULL)
@@ -244,7 +268,9 @@
 function must first make sure that the BAT gets a *physical* reference
 and is loaded into memory.  We start with doing just that.  We do that
 by calling ``BATdescriptor`` which increments the physical reference
-count and loads the BAT into memory (if it wasn't there already)::
+count and loads the BAT into memory (if it wasn't there already):
+
+.. code-block:: c

  BAT *b;
  b = BATdescriptor(*arg);
@@ -252,7 +278,9 @@
      throw(MAL, "batreverse.reverse", RUNTIME_OBJECT_MISSING);

 When we're done with this BAT, we will need to decrement the physical
-reference count again.  We do that by calling ``BBPunfix``::
+reference count again.  We do that by calling ``BBPunfix``:
+
+.. code-block:: c

  BBPunfix(b->batCacheid);

@@ -260,7 +288,9 @@

 We need to create the result BAT ourselves.  We know the type, and we
 know that the size of the BAT will be the same as the input BAT.
-Hence we can use this code::
+Hence we can use this code:
+
+.. code-block:: c

  bn = COLnew(b->hseqbase, TYPE_str, BATcount(b), TRANSIENT);

@@ -283,7 +313,9 @@
 total area for the strings must be, so ``BUNappend`` may have to grow
 the memory area for the strings, and that can fail.

-Iterating through the source BAT is done using a standard mechanism::
+Iterating through the source BAT is done using a standard mechanism:
+
+.. code-block:: c

  BATiter bi;
  BUN p, q;
@@ -299,12 +331,16 @@
 e.g. ``BUNtail``.

 The body of the loop first retrieves the current value from the
-column::
+column:
+
+.. code-block:: c

  src = (const char *) BUNtail(bi, p);

 We then use this string in the same way as in the scalar function.
-The reversed string in ``dst`` is appended to the result BAT::
+The reversed string in ``dst`` is appended to the result BAT:
+
+.. code-block:: c

  BUNappend(bn, dst, false);

@@ -378,7 +414,9 @@
 when ``COLnew`` returns, all property flags except for ``nil`` are set
 (there are no nils in an empty column).  This means that as soon as
 you start adding data to a column, you must deal with the property
-flags.  The simplest solution is to just clear all properties::
+flags.  The simplest solution is to just clear all properties:
+
+.. code-block:: c

   bn->tsorted = bn->trevsorted = false;
   bn->tkey = false;
@@ -401,7 +439,9 @@

 Once the ``.so`` or ``.dll`` file has been created and installed, the
 server needs to be told about it.  This is done by calling the server
-with an extra argument::
+with an extra argument:
+
+.. code-block:: sh

   mserver5 --loadmodule=regexp ...

@@ -411,7 +451,9 @@
 When the server gets this ``--loadmodule`` argument, it loads the
 library.  And here we use a trick that is available in dynamically
 loaded libraries.  We tell the system to automatically execute some code
-when the library is loaded::
+when the library is loaded:
+
+.. code-block:: c

   #include "mal_import.h"
   #include "sql_import.h"