The MonetDB team at CWI/MonetDB BV is pleased to announce the Apr2019 feature release of the MonetDB suite of programs. More information about MonetDB can be found on our website at https://www.monetdb.org/. For details on this release, please see the release notes at https://www.monetdb.org/Downloads/ReleaseNotes. As usual, the download location is https://dev.monetdb.org/downloads/. Apr 2019 feature release (11.33.3) MonetDB5 Server * Changed the way blobs are ordered. Before, shorter blobs came before longer, now the contents is compared first. * Removed function blob.tostring() since it cannot guarantee that the resulting string is properly encoded in UTF-8. * Removed the type "sqlblob", changed the external format of the type "blob" to be like what "sqlblob" used to be. In other words, the "blob" type is now SQL-compatible. * The MAL functions algebra.sort and algebra.firstn now have a new, extra second-to-last argument, nilslast:bit, which indicates where NIL values are to be sorted: at the beginning, or at the end. The old behavior is when for algebra.sort, the value of nilslast is equal to the value or the reverse:bit parameter, for algebra.firstn when the value of nilslast is opposite to the value of the asc:bit parameter. Build Environment * We now use the C11 atomic operations if available, although libatomic_ops is still used if present and not disabled on the configure command line. * We now also build the Python 3 integration packages for Fedora, Debian, Ubuntu, and Windows (not for Debian 8, Ubuntu 14.04, or EPEL 6 and 7 which are too old). SELinux Support * There is one tunable parameter, mserver5_can_read_home, which can be set using "setsebool -P mserver5_can_read_home=true" to allow an mserver5 process started by monetdbd under the control of systemd to read files in users' home directories. Merovingian * Added a new database property named `profilerbeatfreq`. Its value, <freq>, should be an integer and, if it is set the command `profilerstart` will call stethoscope with this value as the argument to -b. This will instruct stethoscope to collect heartbeat events from the MonetDB server every <freq> milliseconds. Please note that no type checking is done, and if <freq> is not an integer, stethoscope will silently ignore it. Client Package * The mclient program can now be quit by typing quit or exit on the command-line. This makes it easier for novice users to quit the program if they do not know that the standard command to quit is \q. Mapi Library * Removed function mapi_cache_shuffle. * Removed function mapi_stream_query. Stream Library * The functions mnstr_write{Sht,Int,Lng,Hge} and their Array variants now swap bytes of the written values if the stream is set to swap bytes (i.e. big-endian on a little-endian machine and v.v.). * Removed defines ST_ASCII, ST_BIN, ST_READ, ST_WRITE from stream.h. * Changed function mnstr_set_byteorder(stream, char) to mnstr_set_bigendian(stream, bool) where the second argument should be `true' for specifying that the stream is bigendian. This sets an internal flag whether or not to swap bytes which can be retrieved with mnstr_get_swapbytes(stream). * Changed function `int mnstr_byteorder(stream)' to `bool mnstr_get_swapbytes(stream)'; it now returns whether bytes are swapped. * Changed function `int mnstr_type(stream)' to `bool mnstr_isbinary(stream)'. MonetDB Common * During processing, we now try to maintain the smallest and largest values that occur in a BAT. This information is not saved across server restarts. * Added an extra argument of type `bool' to the atom to string and string to atom methods to indicate whether the string is for internal use (casting the value to or from a string) of external use (to be printed). * Implemented a nilslast option for BATfirstn. If set, NILs come last in the ordering that BATfirstn simulates, so non-NIL values are preferentially returned. The old behavior can be obtained by setting nilslast to !asc(ending). * Implemented a nilslast option for BATsort. This option should be equal to the reverse option for stable sort (it is not implemented for stable sort), but can be different from reverse for non-stable sort. The functions BATsort and GDKqsort have extra parameters, the function GDKqsort_rev has been removed (superseded by GDKqsort with the new `reverse' parameter). * The BUNtail, BUNtvar, BUNtloc, and BUNtpos macros (and Tloc and Tpos) now return a `void *' instead of a `char *'. * Function PROPdestroy now takes a BAT* parameter. * A whole bunch of functions now take a bool argument instead of an int argument. Some functions now return a bool instead of an int. In all these cases, the int was used as a Boolean value. SQL Frontend * Implemented two-argument least/greatest functions. If one argument is NULL, the functions return the other value, otherwise they return the least/greatest of the two arguments. * Implemented SQL2003 natural logarithm function: ln(num_expr). * Function octet_length now also accepts a BLOB argument and returns the length of the BLOB (i.e. the same as length when called with a BLOB argument). * Implemented length(blob) and comparison between blobs. * Implemented a two argument function log which calculates the logarithm of the first argument using the second argument as base. * Next to the functions log (natural logarithm) and log10 (base 10 logarithm), there is now also log2 (base 2 logarithm). * Removed unused global SQL variable "history". * Allow usage of custom expressions in GROUP BY and PARTITION BY clauses. The same expressions can be used in the projection phase as long as it is exactly the same used in the grouping clause (textual lookup is performed). As an example the query: SELECT col1*2 FROM t1 GROUP BY col1*2; is correct, while SELECT sin(col1+5) FROM t1 GROUP BY col1*2; is wrong. * Allow usage of WITH clauses in insert, update, delete and merge statements. * Implemented merge statements from SQL:2003 standard. Using a source relation R, a target table T is updating depending on the result of the merge. In case of a match, the table T's row is either updated or deleted with R's row. In a non-match case, R's row is inserted into T. The grammar is the follows: > MERGE INTO target [ [AS] ident ] USING source ON search_condition { WHEN MATCHED [ AND search_condition ] THEN { UPDATE SET assignment_list | DELETE } } | { WHEN NOT MATCHED [ AND search_condition ] THEN INSERT [ column_list ] [ { DEFAULT VALUES | VALUES row_values } ] } * Added possibility to change the schema of a table with the syntax: > ALTER TABLE [ IF EXISTS ] qname SET SCHEMA ident * Added optional alias option for target table in update and delete statements: (e.g. UPDATE target AS alias SET col1=1 WHERE alias.col2 < 5) * Improved and extended storagemodel functionality. Changes include: * Added views sys."tablestorage" and sys."schemastorage" for easy and quick assesment of aggregated storage usage per table or per schema. * Excluded listing system tables in views sys."storage", sys."tablestorage" and sys."schemastorage". You can still use sys."storage"() to query storage of system tables and columns. * In procedure sys.storagemodelinit() when populating table sys.storagemodelinput it now excludes system tables as system tables are not useful to be modeled for storagesize by application users. It now also computes and populates the atomwidth column more correctly for variable size data types (strings, blobs) when the table has representative data in those columns. * System function sys.columnsize() has been corrected for types: tinyint, real, date, time, timetz, sec_interval, month_interval, decimal, uuid, mbr, char, clob, json, url, blob, geometry and geometrya. For variable size data types (varchar, char, clob, json, url, blob, geometry, geometrya) it now returns the columnsize excluding the variable heapsize. The heapsize is retrievable via sys.heapsize(). * System function sys.heapsize() has been corrected for all variable size data types: varchar, char, clob, json, url, blob, geometry and geometrya. * System function sys.imprintsize() has been corrected. The imprintsize depends on the width (1 or 2 or 4 or 8 or 16 bytes) of the data type. Also instead of 12% it now uses 20%, which is a better estimate. * System function sys.storagemodel() has been removed as it outputs the same data as view sys.storagemodel. Use view sys.storagemodel instead. * Corrected views sys.storagemodel and sys.tablestoragemodel by returning a computed orderidxsize when the count has been changed in the sys.storagemodelinput table. Also the views now return data ordered by schema, table and column names. * Extended view sys.tablestoragemodel with column: "storages". Besides columns also keys (primary, foreign and unique keys) and indexes (ordered, imprints) use storage, so the "storages" count can be higher than the number of columns per table. * Corrected the data type of columns "schema", "table", "column", "type", "mode" and location in table sys.storagemodelinput and functions sys."storage"(), sys."storage"(sname), sys."storage"(sname, tname) and sys."storage"(sname, tname, cname) from string into varchar(1024). Consequently also the views based on the table or functions will be reporting varchar(1024) as column meta data instead of clob for those columns. This allows faster querying and reporting by generic SQL programs which treat clob querying different from varchar columns. * Extended windowing functions catalog with SQL standard semantics. Standard aggregation functions can now be used in windowing functions: (avg,max,min,sum,prod,count). Other windowing specific functions were also implemented: (percent_rank,cume_dist,ntile, lag,lead,first_value,last_value,nth_value). * The standard frame specification was implemented for aggregation functions as well as first_value, last_value and nth_value functions. The available frames are rows, range and groups. Bounds can be unbounded (partition limit), the current row, a fixed number of rows (constant), or variable (column as input). (e.g SELECT COUNT(col1) OVER (PARTITION BY col2 ORDER BY col3 RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM t1;). * Added WINDOW keyword which is optionally provided after the FROM clause, with window specifications used in the projection (e.g SELECT SUM(col1) OVER w1, LAST_VALUE(col2) OVER w2 FROM t1 WINDOW w1 AS (ROWS BETWEEN 5 PRECEDING AND 0 FOLLOWING), w2 AS (w1);). * Our previous partitioning implementation didn’t impose order in the input. With this reexamination, partitioning now imposes ascending order by default, thus pairing with the industry standard implementation. * Implemented X'...' style binary string literals. * Implemented U&'...' Unicode character string literals and U&"..." Unicode delimited identifiers, including UESCAPE. For the string literals, you can have U&'...' '...' '...' UESCAPE '...' where the escape must be as single character and the other '...' strings are also Unicode character string literals. For now, these latter strings also undergo C-style backslash interpretation. * Implemented PostgreSQL-like E'...' strings. The strings can contain C-style backslash escapes. The old format strings '...' currently still also accept C-style escapes, but that feature will be removed in a future release. * Implemented the NULLS FIRST and NULLS LAST option to ORDER BY. The default is NULLS FIRST for ASC(ending) and NULLS LAST for DESC(ending). * Added possibility to rename SQL schemas, tables and columns with ALTER statements: > ALTER SCHEMA [ IF EXISTS ] ident RENAME TO ident > ALTER TABLE [ IF EXISTS ] qname RENAME TO ident > ALTER TABLE [ IF EXISTS ] qname RENAME [ COLUMN ] ident TO ident Also added optional IF EXISTS clause to other existing ALTER TABLE statements. * The sql SAMPLE syntax is extended to include an optional integral SEED parameter that sets the seed for the internal random number generator in the sample algorithm. When the seed is set, the user can obtain a reproducible sample set from a static dataset with the same sample query. The new syntax is SELECT ... FROM ... WHERE ... SAMPLE <expr> [SEED <integer>] * Implemented a version of COPY INTO that reads/writes files from/to the client instead of doing it in the server. This has the advantage that COPY INTO is then no longer restricted to only the "super user" monetdb, nor only to absolute file names. The syntax to have the server communicate with the client for file content is COPY INTO table FROM file ON CLIENT ...; and COPY query INTO file ON CLIENT ...;. This also works for COPY BINARY INTO. There is also the possibility to specify that files are to be read/written by the server by using ON SERVER. This is also the default when ON CLIENT or ON SERVER is not specified. * The sys.functions table now has an extra column called "system" of type BOOLEAN whose value is TRUE for system functions (i.e. functions that should not be dumped). The table sys.systemfunctions has been changed to a view and is now officially deprecated. * Extended merge tables with partitioning using a predicate: > CREATE MERGE TABLE [ IF NOT EXISTS ] table_name (... columns ...) [ PARTITION BY { RANGE | VALUES } { ON '(' column_name ')' | USING '(' expression ')' } ] The partitioning can occur by range or list of values using one of the table's columns or an expression. The domain of the partitioning scheme will be covered by each child table added with alter statements depending if the table is partitioned by range or list of values: > ALTER TABLE merge_table_name { ADD | SET } TABLE child_table_name AS PARTITION IN '(' expression [ ',' ... ] ')' [ WITH NULL ] > ALTER TABLE merge_table_name { ADD | SET } TABLE child_table_name AS PARTITION BETWEEN { RANGE MINVALUE | expression } AND { RANGE MAXVALUE | expression } [ WITH NULL ] > ALTER TABLE merge_table_name { ADD | SET } TABLE child_table_name AS PARTITION WITH NULL When updating a single partition range/list of values, the SET keyword should be used instead of ADD. INSERT, UPDATE and DELETE statements are possible on partitioned tables with corresponding validation on each partition domain. The system tables sys.table_partitions, sys.range_partitions and sys.value_partitions were added to store meta-information about each existing partition. Bug Fixes * 2403: stream: http read support in streams * 2416: GDK: file/dir creation mask * 2478: SQL: rename columns * 2496: SQL: implement greatest / least * 3384: Auxiliary window functions * 3416: log() SQL function only supports natural logarithm * 3448: Make lbatomic_ops optional if installed * 3520: Add support for scalar function LN(num) * 3530: sigabort on window aggr function * 3567: Add support for: ALTER TABLE [schema1.]oldtablename RENAME [TO] [schema2.]newtablename * 3743: DELETE FROM merge table not supported. * 3832: Cannot use expressions in GROUP BY clause * 3946: Expose C log2 function to SQL * 4056: mapi_timeout is not implemented * 6181: comments cause error with return in function * 6326: Eliminate unused UNION tables * 6347: Mserver returns with "memory exhausted in" error if query size exceeds 2.5MB * 6475: Remove unused SQL global variable "history" * 6588: Table aliasing are not supported in SQL update queries * 6591: Name mangling in the SQL catalog results in ambiguity * 6608: Sqlitelogictest error message in group by with coalesce * 6609: Rename Schema in MonetDB * 6636: sys.queue() extension: allow DBA to see all running queries * 6647: Add suport to Python 3 on Windows * 6669: COPY [xxx RECORDS] INTO foo FROM STDIN ... doesn't work without specifying nr of to be copied records * 6671: Error when running user function on merge table with remote part * 6674: R UDF with Date type fails to convert RAPI * 6676: Max data length of 2048 for column sys._tables.query is too small for the actual data (2811 for view sys.ids) * 6678: Binding NULL parameter to parametrized query results in syntax error on execution * 6680: Copy cannot open CSV file if systemctl enabled * 6684: Inserting into a replica table crashes the server * 6685: adding a view to a merge table gives unexpected behaviour * 6690: Unable to fetch empty string with ODBC driver * 6696: Re-use of MAL explain plans with complex type coercions