The analyze statement will analyze and update column data statistics of persisted column(s) of table(s) in a specific schema.
The statistics information is used by the SQL query planner and optimizers, so it is recommended to (re)run the ANALYZE statement on tables after many table inserts/updates/deletes are done.
analyze-statement:
ANALYZE schemaname [ '.' tablename [ '(' columnname [, columnname ] [, ... ] ')' ] ]
The system administrator or owner of a table can update statistics for
a) all tables in a schema (specify only the schemaname) or
b) all columns of a specific table (specify schemaname and tablename) or
c) a list of specific columns of one table.
ANALYZE sch2;
ANALYZE sch2."mytable";
ANALYZE "sch2"."mytable" ("col1", col2, "col9");
Analyze will access and profile all data of selected columns and update the statistics. This may take some time when the tables are very large and/or you analyze many tables in a schema.
You can only update statistics of persisted tables (with column data stored on disk), so not for views, temporary or remote tables.
Besides the analyze statement you may also call any of the sys.analyze(...)
system procedures. The available variants are:
procedure sys.analyze();
procedure sys.analyze(sname varchar(1024));
procedure sys.analyze(sname varchar(1024), tname varchar(1024));
procedure sys.analyze(sname varchar(1024), tname varchar(1024), cname varchar(1024));
Examples of using sys.analyze(...)
procedures:
CALL analyze('sch2');
CALL analyze('sch2', 'mytable');
CALL analyze('sch2', 'mytable', 'col1');
CALL analyze('sch2', 'mytable', 'col2');
CALL sys.analyze('sch2', 'mytable', 'col9');
CALL sys.analyze();
The column data statistics information can be queried from system view: sys.statistics. This view shows statistics information of columns of user tables only, so excludes system tables for convenience.
Instead of the view you can also query any of the following table producing functions:sys.statistics()
orsys.statistics('myschema')
orsys.statistics('myschema','mytable')
orsys.statistics('myschema','mytable','mycolumn')
.
The last 3 functions with input arguments are faster than querying the system view.
These 4 table producing functions will also return statistics information on columns of system tables.
-- create a test table
CREATE TABLE test.store (id INT, product VARCHAR(32), price DECIMAL(8,2));
INSERT INTO test.store VALUES (1, 'apple', 1.2), (2, 'orange', 2.0), (3, 'banana', 1.5);
-- Retrieve statistics for all user tables in the database.
SELECT * FROM sys.statistics ORDER BY column_id;
/*
+-----------+--------+-------+---------+---------+-------+-------+--------+-------+--------+--------+--------+-----------+
| column_id | schema | table | column | type | width | count | unique | nils | minval | maxval | sorted | revsorted |
+===========+========+=======+=========+=========+=======+=======+========+=======+========+========+========+===========+
| 8412 | test | store | id | int | 4 | 3 | false | true | 1 | 3 | false | false |
| 8413 | test | store | product | varchar | 1 | 3 | false | true | apple | orange | false | false |
| 8414 | test | store | price | decimal | 4 | 3 | false | true | 120 | 200 | false | false |
+-----------+--------+-------+---------+---------+-------+-------+--------+-------+--------+--------+--------+-----------+
*/
-- Analyze every column from table 'test.store' and update the column statistics
ANALYZE test.store;
-- Retrieve statistics for table 'store' in schema 'test'
SELECT * FROM sys.statistics
WHERE "schema" = 'test'
AND "table" = 'store'
ORDER BY column_id;
-- Or alternatively use the faster table returning function:
SELECT * FROM sys.statistics('test', 'store');
/*
+-----------+--------+-------+---------+---------+-------+-------+--------+-------+--------+--------+--------+-----------+
| column_id | schema | table | column | type | width | count | unique | nils | minval | maxval | sorted | revsorted |
+===========+========+=======+=========+=========+=======+=======+========+=======+========+========+========+===========+
| 8412 | test | store | id | int | 4 | 3 | true | false | 1 | 3 | true | false |
| 8413 | test | store | product | varchar | 1 | 3 | true | false | apple | orange | false | false |
| 8414 | test | store | price | decimal | 4 | 3 | true | false | 120 | 200 | false | false |
+-----------+--------+-------+---------+---------+-------+-------+--------+-------+--------+--------+--------+-----------+
*/
-- Notice that the values of "unique", "nils" and "sorted" have been changed after running the analyze statement.
-- Meanwhile more inserts are performed...
INSERT INTO test.store VALUES (4, 'pear', 1.2);
INSERT INTO test.store VALUES (5, 'kiwi', 1.9);
-- Analyze columns 'product' and 'price' from 'test.store'
ANALYZE test.store ("product", "price");
-- Retrieve statistics for columns 'product' and 'price'
SELECT * FROM sys.statistics('test', 'store', 'product')
UNION ALL
SELECT * FROM sys.statistics('test', 'store', 'price');
/*
+-----------+--------+-------+---------+---------+-------+-------+--------+-------+--------+--------+--------+-----------+
| column_id | schema | table | column | type | width | count | unique | nils | minval | maxval | sorted | revsorted |
+===========+========+=======+=========+=========+=======+=======+========+=======+========+========+========+===========+
| 8413 | test | store | product | varchar | 1 | 5 | true | false | apple | pear | false | false |
| 8414 | test | store | price | decimal | 4 | 5 | false | false | 120 | 200 | false | false |
+-----------+--------+-------+---------+---------+-------+-------+--------+-------+--------+--------+--------+-----------+
*/
TRUNCATE TABLE test.store;
SELECT * FROM sys.statistics('test', 'store');
/*
+-----------+--------+-------+---------+---------+-------+-------+--------+-------+--------+--------+--------+-----------+
| column_id | schema | table | column | type | width | count | unique | nils | minval | maxval | sorted | revsorted |
+===========+========+=======+=========+=========+=======+=======+========+=======+========+========+========+===========+
| 8412 | test | store | id | int | 4 | 0 | false | false | null | null | true | true |
| 8413 | test | store | product | varchar | 1 | 0 | false | false | null | null | true | true |
| 8414 | test | store | price | decimal | 4 | 0 | false | false | null | null | true | true |
+-----------+--------+-------+---------+---------+-------+-------+--------+-------+--------+--------+--------+-----------+
*/
Since release Jan2022, the performance of the ANALYZE statement has been improved.
The previous updatable system table sys.statistics
has been replaced by table producing functions and
a system view (with added and changed columns) and is now constructed internally when queried.
Also the previous supported ANALYZE statement options: SAMPLE n
and MINMAX
are no longer needed and now silently ignored.