[Monetdb-developers] Partitioning
Hello, I've been following the developments of MonetDB off and on. It seems to me that there is a real emphasis on XML-related functionality, and less so for BI-related functionality. I wanted to ask if there are any plans to allow tables to be partitioned by some condition. This would have a couple of benefits. A table could be broken out into several BAT files, one for each condition met. That would allow you to potentially overcome the 2 GB BAT file limitation on 32 bit systems. And of course, when adding intelligence into the optimizer for this, it could examine the conditions of queries and potentially eliminate entire BAT files to scan. For example, assume you table tab with a column named sometype, and used sometype for partitioning. Sometype has 10 possible values. Physically, MonetDB would create 10 different BAT files for each column, corresponding to each possible value. When a query is executed with a condition like "sometype = 1", the optimizer is smart enough to know it need only use 1 set of the bat files, and not all 10, significantly reducing the amount of data that needs to be examined. If you are familiar with PostgreSQL, they introduced something like this called Constraint Exclusion Partitioning. Each partition is treated as a subtable of a main parent table. You are required to insert directly into the proper subtable, but querying on the main parent table will determine which subtables are required to be examined in processing the query. Their scheme is not entirely convenient with loading up data, but it is quite flexible in setting up arbitrary conditions on the subtables. Are there any such plans to do something similar? At what state is this in? I believe I saw something about Partitioning on the roadmap several months ago. The MonetDB home page mentions OLAP, and it would seem to me that a feature like this is critical if MonetDB really wants to handle Business Intelligence type of queries. Thanks! -- moredata@fastmail.net -- http://www.fastmail.fm - Faster than the air-speed velocity of an unladen european swallow
Hello, Thank you for your interest in MonetDB source and its future. The answer to the question is simple. Indeed, partitioned tables are a good mechanism to control scalability problems on small scale systems. And experimental code has been written to assess the consequences on the core. One of the blocking reasons to include it is the lack of an extendable optimizer infrastructure in the current release. However, this has changed with the MonetDB 5 alpha release, which is only available as source tarball. The draft documentation may be of interest to you and shows the required functionality, http://homepages.cwi.nl/~mk/MonetDB/The-MAL-Optimizer.html#The-MAL-Optimizer Furthermore, we are finishing a project extending MonetDB Version 5 with the ability to deal with distributed materialized views in SQL. A big step towards partitioned AND distributed query processing. This code will certainly become available for a larger audiance and it plays a crucial role in managing a multi TB database under construction at our place. regards, Martin moredata@fastmail.net wrote:
Hello,
I've been following the developments of MonetDB off and on. It seems to me that there is a real emphasis on XML-related functionality, and less so for BI-related functionality.
I wanted to ask if there are any plans to allow tables to be partitioned by some condition. This would have a couple of benefits. A table could be broken out into several BAT files, one for each condition met. That would allow you to potentially overcome the 2 GB BAT file limitation on 32 bit systems. And of course, when adding intelligence into the optimizer for this, it could examine the conditions of queries and potentially eliminate entire BAT files to scan.
For example, assume you table tab with a column named sometype, and used sometype for partitioning. Sometype has 10 possible values. Physically, MonetDB would create 10 different BAT files for each column, corresponding to each possible value. When a query is executed with a condition like "sometype = 1", the optimizer is smart enough to know it need only use 1 set of the bat files, and not all 10, significantly reducing the amount of data that needs to be examined.
If you are familiar with PostgreSQL, they introduced something like this called Constraint Exclusion Partitioning. Each partition is treated as a subtable of a main parent table. You are required to insert directly into the proper subtable, but querying on the main parent table will determine which subtables are required to be examined in processing the query. Their scheme is not entirely convenient with loading up data, but it is quite flexible in setting up arbitrary conditions on the subtables.
Are there any such plans to do something similar? At what state is this in? I believe I saw something about Partitioning on the roadmap several months ago. The MonetDB home page mentions OLAP, and it would seem to me that a feature like this is critical if MonetDB really wants to handle Business Intelligence type of queries.
Thanks!
participants (2)
-
Martin Kersten
-
moredataļ¼ fastmail.net