Window Functions

SQL provides a method to aggregate over a series of related tuples. They are called window functions and always come with an OVER() clause which determines how tuples are split up over the window functions.

The PARTITION BY clause within OVER divides the rows into groups that share the same values of the PARTITION BY expression(s). For each row, the window function is computed over all rows participating in the group. The order within a partition can be used as well.

window_function_spec:
    window_function OVER   { window_name | '(' window_specification ')' }

window_function:
      RANK '(' ')'
    | DENSE_RANK '(' ')'
    | PERCENT_RANK '(' ')'
    | ROW_NUMBER '(' ')'
    | CUME_DIST '(' ')'
    | FIRST_VALUE '(' query_expression ')'
    | LAST_VALUE '(' query_expression ')'
    | NTH_VALUE '(' query_expression ',' numeric_expression ')'
    | NTILE '(' query_expression ')'
    | LEAD '(' query_expression [ ',' query_expression [ ',' query_expression ] ] ')'
    | LAG '(' query_expression [ ',' query_expression [ ',' query_expression ] ] ')'
    | aggregate_function

aggregate_function:
      COUNT '(' '*' ')'
    | COUNT '(' [ DISTINCT ] query_expression ')'
    | MAX '(' query_expression ')'
    | MIN '(' query_expression ')'
    | SUM '(' [ DISTINCT ] query_expression ')'
    | PROD '(' [ DISTINCT ] query_expression ')'
    | AVG '(' query_expression ')'
    | extra_statistic_aggregate_function

window_specification:
    [ window_name ]   [ PARTITION BY column_ref   [ ',' ... ] ]   [ ORDER BY sort_spec ]   [ window_frame ]

window_frame:
    { ROWS | RANGE | GROUPS }   { window_frame_start | BETWEEN window_bound AND window_bound }
        [ EXCLUDE { CURRENT ROW | GROUP | TIES | NO OTHERS }   ]

window_frame_start:
     UNBOUNDED PRECEDING
   | value PRECEDING
   | CURRENT ROW

window_bound:
    window_frame_start
   | UNBOUNDED FOLLOWING
   | value FOLLOWING

Supported Window Functions:

RANK() : BIGINT - Returns the rank number within a partition, starting at 1.
DENSE_RANK() : BIGINT - Returns the rank of the current row without gaps, it counts peer groups.
PERCENT_RANK() : DOUBLE - Calculates the relative rank of the current row: (rank() - 1) / (rows in partition - 1).
ROW_NUMBER() : BIGINT - Returns the position of the tuple currently in the result set, starting at 1.
CUME_DIST() : DOUBLE - Calculates the cumulative distribution: number of rows preceding or peer with current row / rows in partition.
FIRST_VALUE(input A) : A - Returns input value at first row of the window frame.
LAST_VALUE(input A) : A - Returns input value at last row of the window frame.
NTH_VALUE(input A, nth BIGINT) : A - Returns input value at “nth” row of the window frame. If there is no “nth” row in the window frame, then NULL is returned.
NTILE(nbuckets BIGINT) : BIGINT - Enumerates rows from 1 in each partition, dividing it in the most equal way possible.
LAG(input A [, offset BIGINT [, default_value A ] ]) : A - Returns input value at row “offset” before the current row in the partition. If the offset row does not exist, then the “default_value” is output. If omitted, “offset” defaults to 1 and “default_value” to NULL.
LEAD(input A [, offset BIGINT [, default_value A ] ]) : A - Returns input value at row “offset” after the current row in the partition. If the offset row does not exist, then the “default_value” is output. If omitted, “offset” defaults to 1 and “default_value” to NULL.
MIN(input A) : A
MAX(input A) : A
COUNT(*) : BIGINT
COUNT(input A) : BIGINT
SUM(input A) : A
PROD(input A) : A
AVG(input A) : DOUBLE

From release Jun2020 (11.37.7) all the statistics aggregate functions are also supported.

The supported frames are:

    ROWS - Frames are calculated on physical offsets of input rows.
    RANGE - Result frames are calculated on value differences from input rows (used with a custom PRECEDING or FOLLOWING bound requires an ORDER BY clause).
    GROUPS - Groups of equal row values are used to calculate result frames (requires an ORDER BY clause).

Examples

create table ranktest (id int, k varchar(3));
insert into ranktest values (1, 'a'),(2, 'b'),(3, 'c'),(4, 'd');
insert into ranktest values (1, 'a'),(2, 'b'),(3, 'c'),(4, 'd');

select ROW_NUMBER() over () as foo from ranktest;
select ROW_NUMBER() over (PARTITION BY id) as foo, id from ranktest;
select ROW_NUMBER() over (PARTITION BY id ORDER BY id) as foo, id from ranktest;
select ROW_NUMBER() over (ORDER BY id) as foo, id from ranktest;

select RANK() over () as foo from ranktest;
select RANK() over (PARTITION BY id) as foo, id from ranktest;
select RANK() over (PARTITION BY id ORDER BY id) as foo, id from ranktest;
select RANK() over (ORDER BY id) as foo, id from ranktest;

select RANK() over () as foo, id, k from ranktest;
select RANK() over (PARTITION BY id) as foo, id, k from ranktest;
select RANK() over (PARTITION BY id ORDER BY id, k) as foo, id, k from ranktest;
select RANK() over (ORDER BY id, k) as foo, id, k from ranktest;

select DENSE_RANK() over () as foo, id, k from ranktest order by k;
select DENSE_RANK() over (PARTITION BY id) as foo, id, k from ranktest order by k;
select DENSE_RANK() over (PARTITION BY id ORDER BY id, k) as foo, id, k from ranktest order by k;
select DENSE_RANK() over (ORDER BY id, k) as foo, id, k from ranktest order by k;
drop table ranktest;

It is possible to define a new user-defined window function for which the implementation is specified externally using MAL and C:

CREATE OR REPLACE WINDOW stddev(val bigint) RETURNS double EXTERNAL NAME "sql"."stdevp";