
I work for a company called Adclick in Portugal, and we are trying out MonetDB
and benchmarking it. We are doing it on AWS (Amazon Web Services), where I/O is
very slow, so we didn't expect great numbers, but we were very surprised when
great numbers appeared. Next we will benchmark on a server with SSD disks. I'm
really enjoying MonetDB; very good work, developing team. I think if the
MonetDB team works on master-slave replication, real deletes, more RPMs
for CentOS, and an AMI for the Amazon Cloud, MonetDB can become a real top
commercial product (I have already tried so many).
I have a question regarding memory:
"MonetDB excessively uses main memory for processing, but does not require
that all data fit in the available physical memory. To handle dataset that
exceed the available physical memory, MonetDB does not (only) rely on the
available swap space, but (also) uses memory-mapped files to exploit disk
storage beyond the swap space as virtual memory."
"For example, while bulk-loading data (preferably via a COPY INTO
statements from a (possibly compressed) CSV file), MonetDB need to have all
columns of the table that is currently being loaded "active", i.e.,
accessable in the address space. However, during loading, parts of the data
are continuously written to the persisten files on disk, i.e., the whole
table does not have to fit into main memory. E.g., loading a 100 GB table
works fine on a system with 8 GB RAM and 16 GB swap -- provided there is
sufficient free disk space."
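
For illustration, the kind of COPY INTO load the documentation describes would
look roughly like this (table, column and file names are placeholders, not our
real schema):

  CREATE TABLE sends (
      senddate DATE,
      campaign INT,
      clicks   BIGINT
  );

  -- load one (gzip-compressed) CSV chunk of up to 5 million rows;
  -- the record count is only a hint that lets MonetDB pre-allocate space
  COPY 5000000 RECORDS INTO sends
  FROM '/data/chunks/sends_part01.csv.gz'
  USING DELIMITERS ',', '\n'
  NULL AS '';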
Bulk-loading is not a problem since we split the CSV into chunks of 2-5M rows,
but I'm more concerned about queries. Our data size will be around 100-150 GB
(~1B rows) and we are thinking of a 32 GB memory server. I think 32 GB of
memory will be more than enough for any query below:
Our queries will be
select ..., sum(...) where senddate>X and senddate
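
(Spelled out against the placeholder table above, the shape of those queries
would be roughly the following, where the date literals just stand in for the
X and Y bounds and the grouping columns are an assumption:

  SELECT campaign, SUM(clicks)
  FROM sends
  WHERE senddate > '2013-01-01'
    AND senddate < '2013-02-01'
  GROUP BY campaign;

i.e. a sum over a senddate range, grouped by a few columns.)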