The storage footprint for any given database schema can be obtained by inspecting
the table producing function storage()
.
To illustrate below we see the storage characteristics of the lineitem table in
TPCH SF-10. The column width for variable length strings, e.g. l_shipinstruct
and l_comment
,
represents the average length found using a sample over the underlying column storage.
sql>select * from storagemodel() where "table" = 'lineitem';
FIX picture
To estimate the storage footprint for a variation of this scheme, we first construct
a relation with the model input. Thereafter it can be updated to reflect the expected
database size and varwidth properties.
The footprint then becomes available as table producing function storagemodel().
Also be aware that dictionary encoding may have taken place, which leads to a much
less storage footprint (see l_shipinstruct
).
Dictionary encoding is currently a runtime feature, it depends on insertion
sequence and the dictionary size. It may lead to an overshoot in the estimated size.
Therefore, the size of varchar columns should be taken with a grain of salt.
sql>call storagemodelinit();
sql>update storagemodelinput set count = 1000000 where "table"='lineitem';
sql>update storagemodelinput set "distinct" = 1000 where "table"='lineitem' and "type"='varchar';
sql>update storagemodelinput set "distinct" = 330000 where "table"='lineitem' and "column"='l_comment';
sql>select * from storagemodel() where "table" = 'lineitem';
When data updates are done on columns, the changes are first stored in so-called delta structures and later merged (in bulk) into the column storages. For analysis, monitoring and debugging purposes we provide three table producing functions:
sys.deltas("schema" string, "table" string, "column" string)
sys.deltas("schema" string, "table" string)
sys.deltas("schema" string)
Some examples to query these table producing functions:
SELECT * FROM sys.deltas('sys','statistics','minval');
SELECT id, cleared, immutable, inserted, updates, deletes, level
FROM sys.deltas('sys','statistics');
SELECT * FROM sys.deltas('tmp');
SELECT s.name as "schemanm", t.name as "tablenm", c.name as "columnnm", d.*
FROM sys.schemas s
JOIN sys.tables t ON s.id = t.schema_id
JOIN sys.columns c ON t.id = c.table_id
JOIN sys.deltas('sys') d ON c.id = d.id;