
Column store features

When your database grows into millions of records spread over many tables, and business intelligence or science becomes the prevalent application domain, a column-store database management system is called for. Unlike traditional row-stores, such as MySQL and PostgreSQL, a column-store provides a modern and scalable solution without calling for substantial hardware investments.

MonetDB has pioneered column-store solutions for high-performance data warehouses for business intelligence and eScience since 1993. It achieves its goal through innovations at all layers of a DBMS, e.g. a storage model based on vertical fragmentation, a modern CPU-tuned query execution architecture, automatic and adaptive indices, run-time query optimization, and a modular software architecture. It is based on the SQL 2003 standard with full support for foreign keys, joins, views, triggers, and stored procedures. It is fully ACID compliant and supports a rich spectrum of programming interfaces (JDBC, ODBC, PHP, Python, RoR, C/C++, Perl).
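To make the idea of vertical fragmentation concrete, here is a minimal sketch in plain Python. It is not MonetDB code: the sample rows, the column names, and the helper `fragment_vertically` are all hypothetical and only illustrate how a table can be stored as one array per attribute.

```python
from array import array

# Hypothetical sample data: three rows of (id, name, price).
rows = [
    (1, "apple", 0.50),
    (2, "pear", 0.75),
    (3, "plum", 0.20),
]


def fragment_vertically(rows):
    """Split row tuples into one array per attribute (illustrative only)."""
    ids, names, prices = zip(*rows)
    return {
        "id": array("q", ids),        # 64-bit signed integers
        "name": list(names),          # variable-length strings kept in a list
        "price": array("d", prices),  # 64-bit floats
    }


columns = fragment_vertically(rows)

# An aggregate over one attribute reads only that column.
total_price = sum(columns["price"])

# A row is reconstructed by reading the same position in every column.
second_row = tuple(columns[name][1] for name in ("id", "name", "price"))
```

In this layout a query that needs only `price` never reads `id` or `name`, which is the property the column-store design relies on.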

MonetDB is distributed as a source tarball, as packages for installation, and as binary installers on a variety of platforms. The latest release has been tested on Linux (Fedora, Red Hat Enterprise Linux, Debian, Ubuntu, Gentoo), Mac OS, Windows 7, Windows Server 2012, and Windows 10. A regular release schedule ensures that the latest functional improvements reach the community quickly.

MonetDB is the focus of database research pushing the technology envelope in many areas. Its three-level software stack, comprising a SQL front-end, tactical optimizers, and a columnar abstract-machine kernel, provides a flexible environment that can be customized in many different ways. A rich collection of linked-in libraries provides functionality for temporal data types, geometry data types, math routines, JSON, URL and UUID data types, and User Defined Functions (UDFs) written in Python, R or C/C++. In-depth information on the technical innovations in the design and implementation of MonetDB can be found in our science library.

Last, but not least, the MonetDB system is distributed under a liberal open-source license. It allows you to modify and extend it in any way you like and subsequently redistribute it in open- and closed-source products. Bug fixes and functional enhancements to the MonetDB code base are highly appreciated.

In a nutshell, the MonetDB system exhibits the following features:

A column-store database kernel. MonetDB is built on the canonical representation of database relations as columns, a.k.a. arrays. They are sizeable entities (up to gigabytes) swapped into memory by the operating system.
A high-performance system. MonetDB excels in applications where the database hot-set (the part actually touched) can be largely held in main memory, or where a few columns of a broad relational table are sufficient to handle individual requests. Further exploitation of cache-conscious algorithms proved the validity of these design decisions.
A multi-core power engine. MonetDB is designed for multi-core parallel execution on desktops to reduce response time for complex query processing. Several techniques for distributed processing have been explored but, as many have found out, there is no silver bullet to improve parallel processing performance. For simple data-parallel problems a map-reduce scheme suffices, but more complex cases call for careful database design and (partial) replication.
A versatile algebraic database kernel. MonetDB is designed to accommodate different query languages through its proprietary algebraic language, called the MonetDB Assembly Language (MAL). It paves the route from a declarative expression received from a query compiler up to and including the distributed processing protocols needed to steer execution of the individual database servers. The primary front-end being distributed is a SQL-to-MAL compiler.
A size for all. The maximal database size supported by MonetDB depends on the underlying processing platform, e.g., a 32- or 64-bit operating system, and on the storage device, e.g., the file system and disk RAID setup. The number of columns per table is practically unlimited. Each column is mapped onto a file, whose size limit is dictated by the operating system and hardware platform. The number of concurrent user threads is a configuration parameter.
An extensible platform. MonetDB has been strongly influenced by scientific experiments to understand the interplay between algorithms and application requirements. This has turned MonetDB into an extensible database system with hooks at all levels in the software stack. It allows for extension of the optimizer pipeline with domain-specific rules, of the bulk operations in the kernel with domain-specific algorithms, and for traditional encapsulation of operations taken from existing science libraries.
A broad application scope. MonetDB supports a broad palette of application domains by hooking up externally supplied libraries, e.g. pcre, raptor, libxml and geos. Several external file formats are encapsulated into data vaults, which create a symbiosis and natural bridge between database processing and the legacy file-based processing prevalent in some science domains.
An open-source solution. MonetDB has been developed over many years of research at CWI, whose charter ensures that results are easily accessible to others. The MonetDB forum and mailing list are the access point to the development team. Turn-key extensions, high-end technical consultancy and joint-venture projects can be accommodated through the MonetDB Solutions company.
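The feature list above describes columns as arrays that are each mapped onto a file and swapped into memory by the operating system. The following sketch shows that general technique with Python's standard `mmap` module. It makes no claim about MonetDB's actual on-disk format: the file name `price.col`, the 64-bit integer layout, and the helpers `write_column` and `sum_column` are assumptions made for illustration.

```python
import mmap
import os
import tempfile
from array import array


def write_column(path, values):
    """Store a column of 64-bit integers as one contiguous binary file."""
    with open(path, "wb") as f:
        array("q", values).tofile(f)


def sum_column(path):
    """Memory-map a non-empty column file and sum its values.

    The operating system pages the file in on demand, so only the
    parts that are actually read need to occupy main memory.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Reinterpret the raw bytes as native 64-bit signed integers.
            with memoryview(mapped) as raw, raw.cast("q") as view:
                return sum(view)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "price.col")  # hypothetical file name
    write_column(path, range(1, 1001))
    column_bytes = os.path.getsize(path)
    column_total = sum_column(path)
```

Because each column lives in its own file, the size of a single column is bounded by what the file system and address space allow, which is why the limits mentioned above depend on the operating system and hardware platform.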