On Wednesday 2009 July 22 12:25:23 pm Martin Kersten wrote:
Charles Samuels wrote:
Hi,
I have read previously that MonetDB is intended to be used only when the data can fit in memory. I'm not sure how to interpret this statement: does it mean that the entire database must fit in memory, or only that the data being operated on when selected from the database must fit in memory?
MonetDB does *not* limit your database to fit in main memory.
However, it uses main memory techniques to address the elements in tables loaded for processing. This means that the hot-set accessed during query execution is limited when you run on 32-bit machines.
Can you elaborate a little bit? Does hot-set mean that a select on the database can only span the chunk that can fit in a 32-bit process's address space? (This seems like a perfectly reasonable limitation, I'm just making sure).
We want to store *huge* amounts of data, ultimately several TBs. The actual data being selected at any given time will only be several hundred MBs, however. Is this a reasonable usage of MonetDB?
The largest database we use internally to experiment with is a 6TB astronomical database. It contains two large tables, one with 400M rows of 500 columns and another with 20B rows and 6 columns. Each table is larger than 1TB.
That's really great to hear :)
This runs on a Linux dual quadcore with 64G RAM and lots of disk space. Of course, at some point you will notice IO behavior ;)
For what kind of operations do you need that amount of memory? If you were to do a select on just a few thousand rows on that same mega-database, would it be happy with more "mundane" amounts of memory?
But most importantly, I have noticed that all the row ID indexes are "int" - why the 32 bit limitation? This just seems like a symptom of the above problem; that the entire database must fit in memory.
You can use the system with both 32-bit and 64-bit oids.
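To make the oid width concrete, here is a rough sketch of how many rows each width could address. It assumes oids are plain unsigned integers with every bit pattern usable, which is an assumption of this sketch and not something stated in the thread; MonetDB may well reserve some values, so the real limit could be lower. The helper names are hypothetical.

```python
def max_rows(oid_bits):
    """Upper bound on distinct row ids for an unsigned oid of this width.

    Assumption: every bit pattern is a valid oid. A real system may
    reserve values (for example a nil marker), lowering the bound.
    """
    return 2 ** oid_bits


def fits(rows, oid_bits):
    """True if a table with `rows` rows can be addressed with this oid width."""
    return rows <= max_rows(oid_bits)


# Row counts of the two tables mentioned earlier in the thread.
WIDE_TABLE_ROWS = 400_000_000      # 400M rows, 500 columns
LONG_TABLE_ROWS = 20_000_000_000   # 20B rows, 6 columns

if __name__ == "__main__":
    for bits in (32, 64):
        print(bits, fits(WIDE_TABLE_ROWS, bits), fits(LONG_TABLE_ROWS, bits))
```

Under that assumption the 400M-row table fits within 32-bit oids, while the 20B-row table exceeds 2^32 and would need 64-bit oids.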
Ah, I previously had a bit of a brain failure. When reading Mapi.h, I noticed that "mapi_get_row_count" returned an int. That function counts the rows returned by a select, and obviously you wouldn't want a select to return 2^32 rows.
Thanks Martin (and also Lefteris),
Charles
--
Charles Samuels
On Wed, 22 Jul 2009, Charles Samuels wrote:
This runs on a Linux dual quadcore with 64G RAM and lots of disk space. Of course, at some point you will notice IO behavior ;)
For what kind of operations do you need that amount of memory? If you were to do a select on just a few thousand rows on that same mega-database, would it be happy with more "mundane" amounts of memory?
There are several reasons you would want more memory with big databases. The most trivial one is avoiding disk I/O: your OS will fill its block cache with data it has already read, which makes your scans faster. Since Monet uses memory mapping, any part of the mapped regions that can actually be held in memory (just as your swap space is read into memory when it's used) saves you a great deal of time. Memory mapping, seen as a way to have more memory than you actually have because it is backed by disk, clearly benefits when the amount of disk interaction is limited.

Monet also has more 'tricks', like storing computed results. Wouldn't it be good to remember the results of costly operations instead of doing them over and over again?

So will it work without this amount of memory? Yes, if the required intermediate results can be written to disk at a reasonable speed (otherwise that becomes a bottleneck) and can be read back at a reasonable speed. For some people disk I/O is never an issue; for 6 disks of 1TB it is.

Stefan
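As a small illustration of the memory-mapping idea described above, the sketch below maps a flat file of 64-bit integers and reads a single row from it. This is not MonetDB's storage format or code; the file layout and the function names (`write_column`, `read_row`, `demo`) are assumptions made up for this example. The point is only that the file is addressed like memory, and the OS loads the pages that are actually touched.

```python
import mmap
import os
import struct
import tempfile

VALUE_SIZE = 8  # bytes per value: one signed 64-bit little-endian integer


def write_column(path, values):
    """Write the values to a flat file, one fixed-width integer per row."""
    with open(path, "wb") as f:
        for v in values:
            f.write(struct.pack("<q", v))


def read_row(path, row):
    """Read one row through a read-only memory mapping of the whole file.

    The mapping covers the entire file, but the OS only needs to bring
    in the page(s) containing the requested offset.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            return struct.unpack_from("<q", m, row * VALUE_SIZE)[0]


def demo():
    """Create a temporary column file and fetch one value from it."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "column.bin")
        write_column(path, range(100_000))
        return read_row(path, 12_345)


if __name__ == "__main__":
    print(demo())
```

Repeated reads of the same region are served from the OS page cache when there is enough free memory, which is the effect described above: more RAM means fewer trips to disk.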
participants (2): Charles Samuels, Stefan de Konink