The MonetDB team at the Centrum Wiskunde & Informatica (CWI) has been pioneering column-store technology since 1993. Its work rests on the intuition that a simple database kernel and a novel software stack are essential for a breakthrough in query processing speed. At a time when joins were considered one of the prime performance hindrances, main memory was a scarce resource, and cost-based multi-purpose optimizers were the norm, the MonetDB team relied on extensive join processing, fully materialized intermediate relational tables in virtual memory, and a three-layered software stack focused on strategic, tactical, and operational optimization decisions.
The Dark Ages [1979-1992]
The development of the MonetDB software family goes back as far as the early eighties, when the first relational kernel, called Troll, was delivered to a larger audience. It was deployed at some 1,000 sites worldwide and became part of a CASE (computer-aided software engineering) tool until the beginning of the nineties. None of its code has survived, but the ideas and experience on how to obtain a fast kernel through simplification and explicit materialization originated in this period. The second half of the eighties was spent building the first distributed main-memory database system in the context of the national Prisma project. A fully functional system with 100 processors and a then-generous 1 GB of main memory showed the way to developing database technology from a different perspective.
The Early Days [1993-1995]
Immediately after the Prisma project was declared dead, a new database kernel based on Binary Association Tables (BATs) was laid out. This storage engine became accessible through a simple scripting language, called MIL, intended as a compilation target for queries. The initial target application domain was better support for scientific databases, with their then-archaic file structures. The focus soon shifted to a more urgent, emerging area.
The Data Distilleries Era [1996-2003]
Data-mining projects running since 1993 called for better database support. This culminated in the spin-off company Data Distilleries, which based its analytical customer relationship management suite on the power of the early MonetDB implementations. In the following years, many technical innovations went hand in hand with a strong industrial maturing of the code base. Data Distilleries became a subsidiary of SPSS in 2003, which in turn was acquired by IBM in 2009.
The Open-Source Challenge [2003-2007]
Moving MonetDB Version 4 into the open-source domain required a large number of extensions to the code base. It became of utmost importance to provide a mature implementation of the SQL:2003 standard and the most common application programming interfaces (PHP, JDBC, Python, Perl, ODBC, Ruby). The result of this activity was the first official open-source release in 2004. An XQuery front-end was developed with partners and released in 2005.
The Open-Source Product [2008-2010]
MonetDB/SQL is the result of a multi-year effort to improve the software stack as a basis for data warehouses. A CWI spin-off company was set up to support its market uptake, to provide a foundation for quality assurance (QA), and to support user-requested development activities that are hard to justify in a research institute.
The Renovation Phase [2011-2013]
After a decade of growing code, a major cleanup was started. The MonetDB 4 kernel and its XQuery components were frozen. This cleared the way for a rewrite of the complete kernel, yielding a leaner code base. In particular, the columnar approach already effective in the MonetDB 5 and SQL layers was pushed down into the kernel. The resulting changes led to a different API.
International Recognition [2014, 2016, 2017]
Its pioneering role has been internationally recognized with the prestigious ACM SIGMOD Edgar F. Codd Innovation Award, the ACM SIGMOD Systems Award, and the ACM Fellow Award for M.L. Kersten.
Reaching out [2013- ]
The database services company MonetDB Solutions was established to facilitate commercial use of the system by third parties in their products and services. Research on the platform continues as before, venturing in various directions, such as better support for arrays and geospatial data, and adaptive access to file repositories for scientific data management applications.
A new software ecosystem [2014- ]
Opening the MonetDB database kernel to host language interpreters such as R and Python, together with the embedded version of MonetDB, makes it a leading platform for data science.
Embedded database processing [2016- ]
An embeddable version of MonetDB has been released. It loads as a library into R, Python, Java and C programs, where it can serve as an alternative to SQLite.
Hybrid transaction/analytical processing (HTAP) [2021- ]
In the Jul2021 release, the storage and transaction layers underwent major changes. The goal of these changes is robust performance under inserts, updates, and deletes, and lower transaction startup costs, allowing faster processing of small updates.
We hope that many may benefit from our investments, whether for research, hobby, or business. Don’t be shy about sharing your experience on social media.