Strictly speaking, database users do not need to be bothered by the technicalities of the underlying systems. For that reason, interfaces have been invented so that users can simply use a database system to process their SQL workload. However, just like when one encounters a good dish, one often wants to know how it was made. So, here it is, MonetDB’s recipe to “cook” SQL queries and relational data into delicious result sets to serve user applications piping hot.
The figure here illustrates the implementation architecture of MonetDB to execute an SQL query:
Starting from the top, a query first goes through the SQL Parser and SQL Compiler. The result of these steps is a logical query execution plan expressed as “Relational Algebra” operators. Users can view this plan by prefixing their queries with the keyword PLAN
, which instructs MonetDB to compile the query but not execute it.
The logical plan is optimised using the information available at the SQL layer and translated into a physical execution plan, “MAL Programme”, by the MAL Generator. MAL stands for MonetDB Assembly Language, MonetDB’s internal language to denote how a query should be executed. MAL only allows assignments to make the statements easy to parse and analyse.
The initial MAL programme is iteratively rewritten by an army of MAL Optimisers until there are no more optimisation opportunities or until it runs out of rewriting quota. Users can view the final result of the MAL Optimisers by prefixing their queries with the keyword EXPLAIN
, which instructs MonetDB to return the MAL plan without executing it.
The MAL Interpreter interprets the optimised MAL programme and executes it in MonetDB’s kernel (called the “GDK Kernel”), which directly operates on BATs. BAT stands for Binary Association Table, the MonetDB’s internal binary data structure to store the data of the relational tables. BATs can reside both in-memory or on-disk (as memory-mapped files).
Finally, the resulting tuples are sent back to the MonetDB client in a “Resultset”.
The SQL Compiler and MAL Optimisers deploy well-known rewriting rules (e.g. join order, selection push-down or push-up, parallelisation, and dead code/common expression/constant elimination) to reduce the intermediate sizes and processing time. They do not rely on any cost-model or pre-computed statistics.
The middle layer (marked by the dashed box) is a sequence of specialised optimisers that morph the logical plan received from the SQL compiler into a physical parallel execution plan expressed in MAL statements.
The bottom layer (under the dashed box) contains the implementation of the relational operators (in the MAL statements). Each operator takes as input the resident intermediates produced by operators executed earlier or the persistent data on disk.
As an example, given two tables R and S containing cities and their respective two-letters country codes, and a query joining the two tables on their ID columns:
sql>SELECT * FROM r;
+------+--------+
| id | val |
+======+========+
| 1 | London |
| 2 | Paris |
| 3 | Berlin |
+------+--------+
sql>SELECT * FROM s;
+------+------+
| id | val |
+======+======+
| 1 | GB |
| 2 | FR |
| 3 | DE |
+------+------+
sql>SELECT r.val, s.val FROM r, s WHERE r.id = s.id;
+--------+------+
| val | val |
+========+======+
| London | GB |
| Paris | FR |
| Berlin | DE |
+--------+------+
The logical execution plan for this JOIN query is easy to read:
sql>PLAN SELECT r.val, s.val FROM r, s WHERE r.id = s.id;
+------------------------------------------------------------------------+
| rel |
+========================================================================+
| project ( |
| | join ( |
| | | table("sys"."r") [ "r"."id" NOT NULL UNIQUE HASHCOL , "r"."val" ], |
| | | table("sys"."s") [ "s"."id" NOT NULL UNIQUE HASHCOL , "s"."val" ] |
| | ) [ ("r"."id" NOT NULL HASHCOL ) = ("s"."id" NOT NULL HASHCOL ) ] |
| ) [ "r"."val", "s"."val" ] |
+------------------------------------------------------------------------+
Going from the innermost to the outermost operators, this plan show a typical workflow: first identify the tables and their columns used by the query; then the JOIN operator and the condition; finally a PROJECT operator for the SELECTed columns. The logical plans are often still easy enough for SQL users to read, while their corresponding physical plans are often much more complex, as shown in the figure below:
sql>EXPLAIN SELECT r.val, s.val FROM r, s WHERE r.id = s.id;
+------------------------------------------------------------------------------------------------------------------------------------------------+
| mal |
+================================================================================================================================================+
| function user.main():void; |
| X_1:void := querylog.define("explain select r.val, s.val from r, s where r.id = s.id;":str, "default_pipe":str, 32:int); |
| barrier X_160:bit := language.dataflow(); |
| X_4:int := sql.mvc(); |
| C_109:bat[:oid] := sql.tid(X_4:int, "sys":str, "r":str, 0:int, 3:int); |
| X_114:bat[:int] := sql.bind(X_4:int, "sys":str, "r":str, "id":str, 0:int, 0:int, 3:int); |
| X_118:bat[:str] := sql.bind(X_4:int, "sys":str, "r":str, "val":str, 0:int, 0:int, 3:int); |
| X_122:bat[:int] := algebra.projection(C_109:bat[:oid], X_114:bat[:int]); |
| X_49:bat[:str] := bat.pack("sys.r":str, "sys.s":str); |
| X_50:bat[:str] := bat.pack("val":str, "val":str); |
| X_51:bat[:str] := bat.pack("clob":str, "clob":str); |
| X_52:bat[:int] := bat.pack(0:int, 0:int); |
| C_110:bat[:oid] := sql.tid(X_4:int, "sys":str, "r":str, 1:int, 3:int); |
| X_116:bat[:int] := sql.bind(X_4:int, "sys":str, "r":str, "id":str, 0:int, 1:int, 3:int); |
| X_119:bat[:str] := sql.bind(X_4:int, "sys":str, "r":str, "val":str, 0:int, 1:int, 3:int); |
| X_123:bat[:int] := algebra.projection(C_110:bat[:oid], X_116:bat[:int]); |
| C_112:bat[:oid] := sql.tid(X_4:int, "sys":str, "r":str, 2:int, 3:int); |
| X_117:bat[:int] := sql.bind(X_4:int, "sys":str, "r":str, "id":str, 0:int, 2:int, 3:int); |
| X_120:bat[:str] := sql.bind(X_4:int, "sys":str, "r":str, "val":str, 0:int, 2:int, 3:int); |
| C_20:bat[:oid] := sql.tid(X_4:int, "sys":str, "s":str); |
| X_22:bat[:int] := sql.bind(X_4:int, "sys":str, "s":str, "id":str, 0:int); |
| X_28:bat[:str] := sql.bind(X_4:int, "sys":str, "s":str, "val":str, 0:int); |
| X_124:bat[:int] := algebra.projection(C_112:bat[:oid], X_117:bat[:int]); |
| X_36:bat[:int] := algebra.projection(C_20:bat[:oid], X_22:bat[:int]); |
| X_37:bat[:str] := algebra.projection(C_20:bat[:oid], X_28:bat[:str]); |
| X_162:void := language.pass(C_20:bat[:oid]); |
| (X_128:bat[:oid], X_129:bat[:oid]) := algebra.join(X_122:bat[:int], X_36:bat[:int], nil:BAT, nil:BAT, false:bit, nil:lng); |
| (X_130:bat[:oid], X_131:bat[:oid]) := algebra.join(X_123:bat[:int], X_36:bat[:int], nil:BAT, nil:BAT, false:bit, nil:lng); |
| (X_132:bat[:oid], X_133:bat[:oid]) := algebra.join(X_124:bat[:int], X_36:bat[:int], nil:BAT, nil:BAT, false:bit, nil:lng); |
| X_163:void := language.pass(X_36:bat[:int]); |
| X_134:bat[:str] := algebra.projectionpath(X_128:bat[:oid], C_109:bat[:oid], X_118:bat[:str]); |
| X_164:void := language.pass(C_109:bat[:oid]); |
| X_135:bat[:str] := algebra.projectionpath(X_130:bat[:oid], C_110:bat[:oid], X_119:bat[:str]); |
| X_165:void := language.pass(C_110:bat[:oid]); |
| X_136:bat[:str] := algebra.projectionpath(X_132:bat[:oid], C_112:bat[:oid], X_120:bat[:str]); |
| X_166:void := language.pass(C_112:bat[:oid]); |
| X_137:bat[:str] := algebra.projection(X_129:bat[:oid], X_37:bat[:str]); |
| X_138:bat[:str] := algebra.projection(X_131:bat[:oid], X_37:bat[:str]); |
| X_139:bat[:str] := algebra.projection(X_133:bat[:oid], X_37:bat[:str]); |
| X_167:void := language.pass(X_37:bat[:str]); |
| X_152:bat[:str] := mat.packIncrement(X_134:bat[:str], 3:int); |
| X_153:bat[:str] := mat.packIncrement(X_152:bat[:str], X_135:bat[:str]); |
| X_45:bat[:str] := mat.packIncrement(X_153:bat[:str], X_136:bat[:str]); |
| X_155:bat[:str] := mat.packIncrement(X_137:bat[:str], 3:int); |
| X_156:bat[:str] := mat.packIncrement(X_155:bat[:str], X_138:bat[:str]); |
| X_47:bat[:str] := mat.packIncrement(X_156:bat[:str], X_139:bat[:str]); |
| exit X_160:bit; |
| X_48:int := sql.resultSet(X_49:bat[:str], X_50:bat[:str], X_51:bat[:str], X_52:bat[:int], X_52:bat[:int], X_45:bat[:str], X_47:bat[:str]); |
| end user.main; |
| # optimizer.inline(0:int, 3:lng) |
| # optimizer.remap(0:int, 2:lng) |
| # optimizer.costModel(1:int, 2:lng) |
| # optimizer.coercions(0:int, 5:lng) |
| # optimizer.aliases(0:int, 1:lng) |
| # optimizer.evaluate(0:int, 19:lng) |
| # optimizer.emptybind(4:int, 13:lng) |
| # optimizer.deadcode(6:int, 12:lng) |
| # optimizer.pushselect(0:int, 18:lng) |
| # optimizer.aliases(4:int, 9:lng) |
| # optimizer.for(0:int, 8:lng) |
| # optimizer.dict(0:int, 6:lng) |
| # optimizer.mitosis(3:int, 43:lng) |
| # optimizer.mergetable(5:int, 81:lng) |
| # optimizer.bincopyfrom(0:int, 1:lng) |
| # optimizer.aliases(0:int, 1:lng) |
| # optimizer.constants(3:int, 13:lng) |
| # optimizer.commonTerms(1:int, 17:lng) |
| # optimizer.projectionpath(3:int, 17:lng) |
| # optimizer.deadcode(4:int, 11:lng) |
| # optimizer.matpack(2:int, 11:lng) |
| # optimizer.reorder(1:int, 12:lng) |
| # optimizer.dataflow(1:int, 33:lng) |
| # optimizer.querylog(0:int, 1:lng) |
| # optimizer.multiplex(0:int, 1:lng) |
| # optimizer.generator(0:int, 2:lng) |
| # optimizer.candidates(1:int, 2:lng) |
| # optimizer.deadcode(0:int, 7:lng) |
| # optimizer.postfix(0:int, 11:lng) |
| # optimizer.wlc(0:int, 1:lng) |
| # optimizer.garbageCollector(1:int, 26:lng) |
| # optimizer.profiler(0:int, 0:lng) |
| # optimizer.total(32:int, 460:lng) |
+------------------------------------------------------------------------------------------------------------------------------------------------+
Here I forced MonetDB to generate a parallelised physical plan for two worker threads (e.g. hence, two algebra.join
operators). Otherwise, MonetDB will automatically decide that a sequential execution is sufficient for such small tables.
I’ll refrain from going into the details of this physical plan since they are mainly read by the database developers and they differ significantly even for the same query under different configurations. However, SQL users might find the list of optimsers interesting:
optimizer.constants(3:int, 12:lng)
means that the constant eliminator has replaced three constants in 12 millisecondsdeadcode
eliminater; while other optimisers have a fixed position in the series of optimisers, e.g. the garbageCollector
.Ok, folks, that’s all for today. Until the next blog,