
Hi all, I'm trying to figure out how much storage MonetDB (Feb2013-SP2) uses when populated with my data on a fresh database. Here is my table definition: create table genomics.grch36 (site bigint, base char(1)); I'm populating the table via COPY INTO. The source file size is 40GB and contains more than 3 billion rows. Following the suggestions on http://www.monetdb.org/Documentation/Userguide/diskspace, I get following result for the SQL query: sql>select * from storage() where "table" = 'grch36'; +----------+--------+--------+--------+----------+------------+-----------+-------------+----------+---------+--------+ | schema | table | column | type | location | count | typewidth | columnsize | heapsize | indices | sorted | +==========+========+========+========+==========+============+===========+=============+==========+=========+========+ | genomics | grch36 | site | bigint | 12/1252 | 3137459420 | 8 | 25099675360 | 0 | 0 | false | | genomics | grch36 | base | char | 12/1247 | 3137459420 | 1 | 6274918840 | 131072 | 0 | false | +----------+--------+--------+--------+----------+------------+-----------+-------------+----------+---------+--------+ I assume that 'columnsize' refers to the storage demand. I'm I right? Based on the numbers given, I assume a storage need of round about 31 GB (without any compression). Calling 'du -h' on my database directory confirms this assumption. 4,0K ./bat/14 44K ./bat/10 108K ./bat/07 476K ./bat/01 30G ./bat/12 256K ./bat/02 4,0K ./bat/03 4,0K ./bat/HC 4,0K ./bat/16 4,0K ./bat/17 12K ./bat/13 4,0K ./bat/LEFTOVERS 28K ./bat/BACKUP 4,0K ./bat/15 4,0K ./bat/11 12K ./bat/04 30G ./bat 8,0K ./sql_logs/sql 12K ./sql_logs 4,0K ./box 30G . But why is the 'columnsize' of the 'base' column something about 6GB whereas 3 billion times 1 Byte (for a 1 char) is 3GB? For the 'site' column this calculation works! Another question that comes into my mind: The number of distinct values in the 'base' column is 5. Thus, assuming a ideal dictionary encoding 3 Bit would be sufficient to encode all values. Since MonetDB uses dictionary compression, can it utilize this fact? Regards, Sebastian