Statistic Functions

The SQL implementation provides the well-known standard SQL aggregate functions COUNT(*|...), COUNT(DISTINCT ...), SUM(...), AVG(...), MIN(...) and MAX(...) over scalar types/expressions and groupings. In addition, a few important statistical aggregate functions: MEDIAN, QUANTILE, STDDEV, VAR, correlation CORR, COVAR_SAMP and COVAR_POP are available.

extra_statistical_aggregate_functions:
      sys.median '(' scalar_expr ')'
    | sys.median_avg '(' scalar_expr ')'
    | sys.quantile '(' scalar_expr ',' scalar_expr ')'
    | sys.quantile_avg '(' scalar_expr ',' scalar_expr ')'
    | sys.stddev_samp '(' scalar_expr ')'
    | sys.stddev_pop '(' scalar_expr ')'
    | sys.var_samp '(' scalar_expr ')'
    | sys.var_pop '(' scalar_expr ')'
    | sys.corr '(' scalar_expr ',' scalar_expr ')'
    | sys.covar_samp '(' scalar_expr ',' scalar_expr ')'
    | sys.covar_pop '(' scalar_expr ',' scalar_expr ')'

For sys.quantile the percentile argument is a float value between 0.0 and 1.0. sys.median(<expr>) is equivalent to sys.quantile(<expr>, 0.5).

sys.stddev_samp computes the cumulative sample standard deviation and returns the square root of the sample variance. sys.stddev_pop computes the population standard deviation and returns the square root of the population variance. Both functions take as an argument any numeric datatype. Likewise, sys.var_samp and sys.var_pop functions return the sample variance (/n-1) of a set of numbers and the biased variance (/n) of a set of numbers, respectively.

The aggregate functions sys.median_avg and sys.quantile_avg return the interpolated value if the median/quantile doesn't fall exactly on a particular row. These functions always return a value of type DOUBLE and only work for numeric types (various width integers, decimal and floating point).

Examples

create schema aggr_tst;
set schema aggr_tst;
create table tc (c real);
insert into tc values (1), (2), (3), (4), (5), (9);
select * from tc;
select count(*) countstar, COUNT(c) count, COUNT(DISTINCT c) countdistinct
     , SUM(c) sum, AVG(c) average, PROD(c) product
     , MIN(c) minimum, MAX(c) maximum
     , sys.MEDIAN(c) median
     , sys.MEDIAN_AVG(c) median_avg
     , sys.QUANTILE(c, 0.5) quantile
     , sys.QUANTILE_AVG(c, 0.5) quantile_avg
     , sys.STDDEV_SAMP(c) stddev_samp
     , sys.STDDEV_POP(c) stddev_pop
     , sys.VAR_SAMP(c) var_samp
     , sys.VAR_POP(c) var_pop
     , sys.CORR(c, c+1) corr
     , sys.COVAR_SAMP(c, c*2) covar_samp
     , sys.COVAR_POP(c, c*2) covar_pop
 from tc;
drop table tc;
set schema sys;
drop schema aggr_tst;

Tip: To view all the available aggregate functions in your MonetDB server use query:

SELECT * FROM sys.functions WHERE type = 3 ORDER BY name;