I work for Adclick, a company in Portugal, and we are trying out MonetDB and benchmarking it. We are running it on AWS (Amazon Web Services), where I/O performance is quite low, so we didn't expect great numbers, but we were very surprised by how good they were. Next we will benchmark on a server with SSD disks. I'm really enjoying MonetDB; the development team has done very good work. I think that if the MonetDB team works on master-slave replication, real deletes (!), more RPMs for CentOS, and an AMI for the Amazon cloud, MonetDB can become a real top commercial product (I have already tried so many).
I have a question regarding memory:
"MonetDB excessively uses main memory for processing, but does not
require that all data fit in the available physical memory. To handle
datasets that exceed the available physical memory, MonetDB does not
(only) rely on the available swap space, but (also) uses memory-mapped
files to exploit disk storage beyond the swap space as virtual memory."
"For example, while bulk-loading data (preferably via a COPY INTO
statement from a (possibly compressed) CSV file), MonetDB needs to have
all columns of the table that is currently being loaded "active", i.e.,
accessible in the address space. However, during loading, parts of the
data are continuously written to the persistent files on disk, i.e., the
whole table does not have to fit into main memory. E.g., loading a 100
GB table works fine on a system with 8 GB RAM and 16 GB swap -- provided
there is sufficient free disk space."
Bulk-loading is not a problem since we split the CSV into chunks of 2-5M, but I'm more concerned about queries. Our data size will be around 100-150GB (~1B rows), and we are thinking of a server with 32GB of memory. I think 32GB of memory will be more than enough for any of the queries below:
Our queries will be of the form select ..., sum(...) where senddate>X and senddate<Y group by K, [Z]
or select ..., sum(...) where somecolumn = T and senddate>X and senddate<Y group by K, [Z]
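As a rough back-of-envelope check (a sketch under stated assumptions, not MonetDB's actual memory accounting): because MonetDB stores each column separately, a query like the ones above mainly needs the columns it references rather than the whole 100-150GB table. The column widths and the name "measure" for the summed column are hypothetical; the estimate assumes 1B rows, fixed-width uncompressed columns, and ignores intermediate results and group-by hash tables.

```python
# Back-of-envelope estimate of the raw column data one query touches.
# Assumptions (hypothetical, for illustration only): 1 billion rows,
# fixed-width uncompressed columns, intermediate results not counted.
ROWS = 1_000_000_000
GIB = 1024 ** 3

# Hypothetical column widths in bytes for the columns the queries reference.
column_widths = {
    "senddate": 4,    # assumed DATE stored in 4 bytes
    "somecolumn": 4,  # assumed INT filter column
    "K": 4,           # assumed INT group-by key
    "Z": 4,           # assumed INT optional group-by key
    "measure": 8,     # hypothetical BIGINT/DOUBLE column inside sum(...)
}

def touched_gib(columns):
    """Return the raw size in GiB of the given columns over all rows."""
    return sum(column_widths[c] * ROWS for c in columns) / GIB

q1 = touched_gib(["senddate", "K", "Z", "measure"])
q2 = touched_gib(["senddate", "somecolumn", "K", "Z", "measure"])
print(f"query 1 touches ~{q1:.1f} GiB, query 2 touches ~{q2:.1f} GiB")
```

With these assumed widths the touched columns alone come to roughly 18.6 and 22.4 GiB before any intermediates, which is the figure to compare against 32GB; the real numbers depend on the actual column types and on how selective the senddate range is.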
We will have just one table and some concurrency, but not a lot: we expect 50-200 queries per minute (depending on the time of day), so 3-5 queries may run at the same time. Will MonetDB be able to handle this with 32GB of memory without swapping?
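The 3-5 concurrent queries estimate can be cross-checked with Little's law (L = λ·W): the average number of queries in flight equals the arrival rate times the average query latency. The latencies below are hypothetical placeholders, not measurements.

```python
# Little's law: average queries in flight = arrival rate * average latency.
# The latencies used below are hypothetical, not MonetDB measurements.

def avg_in_flight(queries_per_minute, avg_latency_seconds):
    """Average number of concurrently running queries (Little's law)."""
    arrival_rate_per_second = queries_per_minute / 60.0
    return arrival_rate_per_second * avg_latency_seconds

for qpm in (50, 200):
    for latency in (0.5, 1.0, 1.5):
        print(f"{qpm} queries/min, {latency}s each -> "
              f"{avg_in_flight(qpm, latency):.2f} queries in flight")
```

At the 200 queries/minute peak, 3-5 simultaneous queries corresponds to an average latency of roughly 1-1.5 s; if queries take longer, the average concurrency (and the memory needed for per-query intermediates) grows proportionally, and short bursts can exceed the average.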
I watched http://www.youtube.com/watch?v=yrLd-3lnZ58, and in our case, with our senddate>X and senddate<Y queries, it looks like concurrent queries will not be able to share data between them.
Of course I will test concurrency before putting it into production, but
a rough answer would be great. Thanks a lot in advance.