
I work for a company, Adclick, in Portugal, and we are trying out MonetDB and
benchmarking it. We are doing it at AWS (Amazon Web Services), where I/O is
very slow, so we didn't expect great numbers, but we were very surprised when
great numbers appeared. Next we will benchmark on a server with SSD disks. I'm
really enjoying MonetDB; very good work, developing team. I think if the
MonetDB team works on master-slave replication, on real deletes, on more RPMs
for CentOS, and on an AMI for the Amazon cloud, MonetDB can really become a top
commercial product (I have already tried so many).
I have a question regarding memory:
"MonetDB excessively uses main memory for processing, but does not require
that all data fit in the available physical memory. To handle dataset that
exceed the available physical memory, MonetDB does not (only) rely on the
available swap space, but (also) uses memory-mapped files to exploit disk
storage beyond the swap space as virtual memory."
"For example, while bulk-loading data (preferably via a COPY INTO
statements from a (possibly compressed) CSV file), MonetDB need to have all
columns of the table that is currently being loaded "active", i.e.,
accessable in the address space. However, during loading, parts of the data
are continuously written to the persisten files on disk, i.e., the whole
table does not have to fit into main memory. E.g., loading a 100 GB table
works fine on a system with 8 GB RAM and 16 GB swap -- provided there is
sufficient free disk space."
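As a minimal sketch, a chunked load of this kind could look like the following
(the table name events, the file path, and the record count are illustrative,
not from this thread):

    -- illustrative: load one gzipped CSV chunk into a hypothetical table
    COPY 5000000 RECORDS INTO events
    FROM '/data/chunks/events_000.csv.gz'
    USING DELIMITERS ',', '\n', '"'
    NULL AS '';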
Bulk-loading is not a problem since we split the CSV into chunks of 2-5M rows,
but I'm more concerned about queries. Our data size will be around 100-150 GB,
~1B rows, and we are thinking of a 32 GB memory server. I think 32 GB of
memory will be more than enough for any query below.
Our queries will be
select ..., sum(...) where senddate > X and senddate < Y
Or
select ..., sum(...) where somecolumn = T and senddate > X and senddate < Y
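Fleshed out, such a query would look like this (the table name events and the
measure column clicks are illustrative; senddate and somecolumn are the
columns mentioned above):

    -- illustrative: aggregate over a senddate range
    SELECT somecolumn, SUM(clicks)
    FROM events
    WHERE senddate > DATE '2012-01-01'
      AND senddate < DATE '2012-02-01'
    GROUP BY somecolumn;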

Dear Eduardo,
Thank you for the appraisal, it is very much appreciated.
On 11/4/12 1:13 PM, Eduardo Oliveira wrote:
> I work for a company, Adclick, in Portugal, and we are trying out MonetDB and benchmarking it. We are doing it at AWS (Amazon Web Services), where I/O is very slow, so we didn't expect great numbers, but we were very surprised when great numbers appeared. Next we will benchmark on a server with SSD disks. I'm really enjoying MonetDB; very good work, developing team. I think if the MonetDB team works on master-slave replication, on real deletes, on more RPMs for CentOS, and on an AMI for the Amazon cloud, MonetDB can really become a top commercial product (I have already tried so many).

Yes, but that is not the prime focus of the research group; it may become part of the road-map in a commercial setting.
> I have a question regarding memory:
> "MonetDB extensively uses main memory for processing, but does not require that all data fit in the available physical memory. To handle datasets that exceed the available physical memory, MonetDB does not (only) rely on the available swap space, but (also) uses memory-mapped files to exploit disk storage beyond the swap space as virtual memory."
Indeed.
"For example, while bulk-loading data (preferably via a COPY INTO statements from a (possibly compressed) CSV file), MonetDB need to have all columns of the table that is currently being loaded "active", i.e., accessable in the address space. However, during loading, parts of the
data are continuously written to the persisten files on disk, i.e., the If you used the LOCKED version, otherwise it will be written also to the SQL log afterwards. whole table does not have to fit into main memory. E.g., loading a 100 GB table works fine on a system with 8 GB RAM and 16 GB swap -- provided there is sufficient free disk space." indeed.
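A minimal sketch of such a LOCKED load (table and file names are illustrative;
LOCKED requires that no other client accesses the table during the load):

    -- illustrative: bypass the SQL log during bulk load
    COPY INTO events
    FROM '/data/chunks/events_000.csv'
    USING DELIMITERS ',', '\n', '"'
    LOCKED;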
> Bulk-loading is not a problem since we split the CSV into chunks of 2-5M rows, but I'm more concerned about queries. Our data size will be around 100-150 GB, ~1B rows, and we are thinking of a 32 GB memory server. I think 32 GB of memory will be more than enough for any query below.

With 32 GB RAM, MonetDB would easily keep four 1B-row date columns in memory. If there is no locality of column access, then of course MonetDB has to read the columns needed from disk at some point. If those happen to be sorted, then most selections do not cause a major performance degradation.
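The back-of-the-envelope arithmetic, assuming MonetDB's 4-byte DATE
representation:

    1,000,000,000 rows x 4 bytes = ~4 GB per date column
    4 date columns x ~4 GB = ~16 GB, comfortably within 32 GB of RAM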
> Our queries will be
> select ..., sum(...) where senddate > X and senddate < Y
> Or
> select ..., sum(...) where somecolumn = T and senddate > X and senddate < Y
> We will have just one table, and we will have some concurrency, not a lot; we expect 50-200 queries per minute (depending on the time of day), so 3-5 queries can run at the same time. Will MonetDB be able to handle this with 32 GB of memory without swapping?

That depends on the total footprint. If the columns on which selection takes place (comfortably) fit in memory, disk is only accessed for the projection columns by direct seeks.
> I watched http://www.youtube.com/watch?v=yrLd-3lnZ58 and wonder: in our case, with our queries on senddate > X and senddate < Y, would concurrent scans help?
Concurrent scans are relevant only when you stream from disk. Their effect in a memory setting remains to be seen.
> Of course I will test concurrency before putting this into production, but a rough answer up front would be great. Thanks a lot in advance.
Keep us posted. Thanks, Martin
--
Eduardo Oliveira / IT
Email: eduardo.oliveira@adclick.pt
Web: www.adclickint.com
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list