Hi all,
I'm trying to figure out how much storage MonetDB (Feb2013-SP2) uses
when populated with my data on a fresh database.
Here is my table definition: create table genomics.grch36 (site
bigint, base char(1));
I'm populating the table via COPY INTO.
The source file size is 40GB and contains more than 3 billion rows.
Following the suggestions on
http://www.monetdb.org/Documentation/Userguide/diskspace,
I get following result for the SQL query:
sql>select * from storage() where "table" = 'grch36';
+----------+--------+--------+--------+----------+------------+-----------+-------------+----------+---------+--------+
| schema | table | column | type | location | count |
typewidth | columnsize | heapsize | indices | sorted |
+==========+========+========+========+==========+============+===========+=============+==========+=========+========+
| genomics | grch36 | site | bigint |
12/1252 | 3137459420 | 8 | 25099675360 | 0 | 0
| false |
| genomics | grch36 | base | char |
12/1247 | 3137459420 | 1 | 6274918840 | 131072 | 0
| false |
+----------+--------+--------+--------+----------+------------+-----------+-------------+----------+---------+--------+
I assume that 'columnsize' refers to the storage demand. I'm I
right? Based on the numbers given, I assume a storage need of round
about 31 GB (without any compression).
Calling 'du -h' on my database directory confirms this assumption.
4,0K ./bat/14
44K ./bat/10
108K ./bat/07
476K ./bat/01
30G ./bat/12
256K ./bat/02
4,0K ./bat/03
4,0K ./bat/HC
4,0K ./bat/16
4,0K ./bat/17
12K ./bat/13
4,0K ./bat/LEFTOVERS
28K ./bat/BACKUP
4,0K ./bat/15
4,0K ./bat/11
12K ./bat/04
30G ./bat
8,0K ./sql_logs/sql
12K ./sql_logs
4,0K ./box
30G .
But why is the 'columnsize' of the 'base' column something about 6GB
whereas 3 billion times 1 Byte (for a 1 char) is 3GB?
For the 'site' column this calculation works!
Another question that comes into my mind:
The number of distinct values in the 'base' column is 5.
Thus, assuming a ideal dictionary encoding 3 Bit would be sufficient
to encode all values.
Since MonetDB uses dictionary compression, can it utilize this fact?
Regards,
Sebastian