Statistic Functions

Statistic Functions mk Tue, 11/19/2013 - 08:52

The SQL implementation provides the well-known standard SQL aggregate functions COUNT(*|...), COUNT(DISTINCT ...), SUM(...), AVG(...), MIN(...) and MAX(...) over scalar types/expressions and groupings. In addition, a few important statistical aggregate functions: MEDIAN, QUANTILE, STDDEV, VAR and correlation CORR are available. From release Jun2020 (11.37.7) also COVAR_SAMP and COVAR_POP are available.

extra_statistical_aggregate_functions:
      sys.median '(' scalar_expr ')'
    | sys.median_avg '(' scalar_expr ')'
    | sys.quantile '(' scalar_expr ',' scalar_expr ')'
    | sys.quantile_avg '(' scalar_expr ',' scalar_expr ')'
    | sys.stddev_samp '(' scalar_expr ')'
    | sys.stddev_pop '(' scalar_expr ')'
    | sys.var_samp '(' scalar_expr ')'
    | sys.var_pop '(' scalar_expr ')'
    | sys.corr '(' scalar_expr ',' scalar_expr ')'
    | sys.covar_samp '(' scalar_expr ',' scalar_expr ')'
    | sys.covar_pop '(' scalar_expr ',' scalar_expr ')'

For sys.quantile the percentile argument is a float value between 0.0 and 1.0. sys.median(<expr>) is equivalent to sys.quantile(<expr>, 0.5).
sys.stddev_samp computes the cumulative sample standard deviation and returns the square root of the sample variance. sys.stddev_pop computes the population standard deviation and returns the square root of the population variance. Both functions take as an argument any numeric datatype.
Likewise, sys.var_samp and sys.var_pop functions return the sample variance (/n-1) of a set of numbers and the biased variance (/n) of a set of numbers, respectively.

Note: The aggregate functions sys.median_avg and sys.quantile_avg are added in Nov2019 (11.35.3) release. They return the interpolated value if the median/quantile doesn't fall exactly on a particular row. These functions always return a value of type DOUBLE and only work for numeric types (various width integers, decimal and floating point).

Usage example:

   create schema aggr_tst;
   set schema aggr_tst;
   create table tc (c real);
   insert into tc values (1), (2), (3), (4), (5), (9);
   select * from tc;
   select count(*) countstar, COUNT(c) count, COUNT(DISTINCT c) countdistinct
        , SUM(c) sum, AVG(c) average, PROD(c) product
        , MIN(c) minimum, MAX(c) maximum
        , sys.MEDIAN(c) median
        , sys.MEDIAN_AVG(c) median_avg
        , sys.QUANTILE(c, 0.5) quantile
        , sys.QUANTILE_AVG(c, 0.5) quantile_avg
        , sys.STDDEV_SAMP(c) stddev_samp
        , sys.STDDEV_POP(c) stddev_pop
        , sys.VAR_SAMP(c) var_samp
        , sys.VAR_POP(c) var_pop
        , sys.CORR(c, c+1) corr
        , sys.COVAR_SAMP(c, c*2) covar_samp
        , sys.COVAR_POP(c, c*2) covar_pop
    from tc;
   drop table tc;
   set schema sys;
   drop schema aggr_tst;

 

Tip: To view all the available aggregate functions in your MonetDB server use query:

SELECT * FROM sys.functions where type = 3;