As of MonetDB Jun2020 release, we have added support for ROLLUP
, CUBE
and GROUPING SETS
features
defined by the SQL:1999 standard in the GROUP BY
clause. These keywords allow one to perform multiple grouping
set combinations in a single statement, containing partial and total computed results.
At the same time, the GROUPING
aggregate, which identifies the grouping columns used in the output,
has also been implemented.
The grammar is as follows:
group_by_clause:
GROUP BY group_by_element [',' group_by_element ... ]
group_by_element:
identifier
| ROLLUP '(' identifier [',' identifier ... ] ')'
| CUBE '(' identifier [',' identifier ... ] ')'
| GROUPING SETS '(' group_by_element [',' group_by_element ... ] ')'
| '(' ')'
The expression '(' ')'
means a grouping without columns, i.e. a global aggregate.
In practice, whenever ROLLUP, CUBE and GROUPING SETS are used inside the GROUP BY clause, the incoming data is split for the generated grouping combinations, then each grouping is computed, then the results are returned. This can be seen as multiple GROUP BY clauses being generated, computed, then merged with UNION statements. Depending on the grouping keyword used, the generated grouping combination is different.
Nested and mixed groupings:
The GROUPING aggregate has also been added to identify the grouping columns used for each row computed in the output. This aggregate takes as input identifiers for the group columns used in the GROUP BY clause. The output is bitmask indicating which of the arguments were not used in the output grouping set. The last grouping element corresponds to the least-significant bit.
Below, we list example queries with the added functionalities.
CREATE TABLE categories (id INTEGER, description TEXT);
CREATE TABLE sections (id INTEGER, description TEXT);
CREATE TABLE products (id INTEGER, categoryid int, sectionid INTEGER,
description TEXT, price DECIMAL(6,2));
CREATE TABLE sales (productid INTEGER, sale_day DATE, units INTEGER);
INSERT INTO categories VALUES (1, 'fresh food'), (2, 'dry food'), (3, 'drinks');
INSERT INTO sections VALUES (1, 'front'), (2, 'side'), (3, 'back');
INSERT INTO products VALUES
(1, 1, 1, 'apples', 1.5), (2, 1, 2, 'melons', 4.0),
(3, 2, 2, 'peanuts', 2.0), (4, 3, 1, 'water', 1.0),
(5, 3, 3, 'wine', 5.0), (6, 2, 3, 'walnuts', 1.5);
INSERT INTO sales VALUES
(1, date '2020-03-01', 10), (2, date '2020-03-01', 3),
(4, date '2020-03-01', 4), (1, date '2020-03-02', 6),
(4, date '2020-03-02', 5), (5, date '2020-03-02', 2),
(1, date '2020-03-03', 7), (3, date '2020-03-03', 4),
(2, date '2020-03-03', 3), (5, date '2020-03-03', 1),
(6, date '2020-03-03', 1);
-- In the first example we compute the total sales per product per day, then
-- the total sales per product, and finally the overall total using the
-- ROLLUP construct.
-- If a column was not used for the aggregate, it gets a NULL value by default.
-- In our query, we use COALESCE to give them a meaningful name.
SELECT COALESCE(products.description, 'all_products') AS description,
COALESCE(sale_day, 'all_days') AS sale_day, totals.total
FROM (
SELECT productid, sale_day, SUM(units * price) AS total
FROM products
LEFT JOIN sales ON sales.productid = products.id
GROUP BY ROLLUP(productid, sale_day)
) AS totals
LEFT JOIN products ON products.id = totals.productid
ORDER BY sale_day NULLS LAST, productid NULLS LAST;
description | sale_day | total |
---|---|---|
apples | 2020-03-01 | 15.0000 |
melons | 2020-03-01 | 12.0000 |
water | 2020-03-01 | 4.0000 |
apples | 2020-03-02 | 9.0000 |
water | 2020-03-02 | 5.0000 |
wine | 2020-03-02 | 10.0000 |
apples | 2020-03-03 | 10.5000 |
melons | 2020-03-03 | 12.0000 |
peanuts | 2020-03-03 | 8.0000 |
wine | 2020-03-03 | 5.0000 |
walnuts | 2020-03-03 | 1.0000 |
apples | all_days | 34.5000 |
melons | all_days | 24.0000 |
peanuts | all_days | 8.0000 |
water | all_days | 9.0000 |
wine | all_days | 15.0000 |
walnuts | all_days | 1.5000 |
all_products | all_days | 92.0000 |
-- In the following query, we compute the total sales by category, section and
-- global. The left joins in the outer query are used to get the description.
SELECT COALESCE(sections.description, 'all_sections') AS section,
COALESCE(categories.description, 'all_categories') AS category,
totals.total
FROM (
SELECT categoryid, sectionid, SUM(units * price) AS total
FROM products
LEFT JOIN sales ON sales.productid = products.id
GROUP BY GROUPING SETS ((categoryid), (sectionid), ())
) AS totals
LEFT JOIN categories ON totals.categoryid = categories.id
LEFT JOIN sections ON totals.sectionid = sections.id;
description | sale_day | total |
---|---|---|
front | all_categories | 43.5000 |
side | all_categories | 32.0000 |
back | all_categories | 16.5000 |
all_sections | fresh food | 58.5000 |
all_sections | drinks | 24.0000 |
all_sections | dry food | 9.5000 |
all_sections | all_categories | 92.0000 |
-- We can use the GROUPING aggregate to identify which of the columns were used
-- in the grouping, then filter the result based on this information. In the
-- next example we create a view with the computed results of the previous
-- example and then we filter the category exclusive aggregates.
CREATE VIEW sales_totals AS
SELECT GROUPING(categoryid) AS category_aggregates,
GROUPING(sectionid) AS section_aggregates,
categoryid, sectionid, SUM(units * price) AS total
FROM products
LEFT JOIN sales ON sales.productid = products.id
GROUP BY GROUPING SETS ((categoryid), (sectionid), ());
SELECT description, total
FROM sales_totals
LEFT JOIN categories ON sales_totals.categoryid = categories.id
WHERE category_aggregates = 0;
description | total |
---|---|
fresh food | 58.5000 |
drinks | 24.0000 |
dry food | 9.5000 |
Some implementation notes are left here:
The grouping combinations extension was developed by Pedro Ferreira, a software developer at MonetDB Solutions.