On Wednesday 2009 July 22 12:25:23 pm Martin Kersten wrote:
Charles Samuels wrote:
Hi,
I have read previously that MonetDB is intended to be used only when the data can fit in memory. I'm not sure how to interpret this statement - does this mean that the entire database must fit in memory, or does it mean that only the data being operated on when selected from the database must fit in memory?
MonetDB does *not* limit your database to fit in main memory.
However, it uses main memory techniques to address the elements in tables loaded for processing. This means that the hot-set accessed during query execution is limited when you run on 32-bit machines.
Can you elaborate a little bit? Does hot-set mean that a select on the database can only span the chunk that can fit in a 32-bit process's address space? (This seems like a perfectly reasonable limitation, I'm just making sure).
We want to store *huge* amounts of data, ultimately several TBs. The actual data being selected at any given time will only be several hundred MBs, however. Is this a reasonable usage of MonetDB?
The largest database we use internally to experiment with is a 6TB astronomical database. It contains two large tables, one with 400M rows of 500 columns and another with 20B rows of 6 columns. Both tables are larger than 1TB each.
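For a rough sense of scale, the figures quoted above can be turned into per-value and per-column lower bounds. This is only back-of-the-envelope arithmetic on the numbers in this message: it assumes a decimal terabyte (10^12 bytes) and uses "larger than 1TB" as a floor, and the function names are made up for illustration. It says nothing about how MonetDB actually lays out or loads the data.

```python
# Back-of-the-envelope arithmetic on the table sizes quoted in this thread.
# Assumption: 1 TB = 10**12 bytes; each table is "larger than 1TB", so the
# results are lower bounds on averages, not measured values.
TB = 10**12


def min_avg_bytes_per_value(rows, columns, table_bytes=TB):
    """Lower bound on the average width of one stored value, in bytes."""
    return table_bytes / (rows * columns)


def min_avg_column_bytes(columns, table_bytes=TB):
    """Lower bound on the average size of one whole column, in bytes."""
    return table_bytes / columns


# (rows, columns) as stated in the message; the labels are hypothetical.
tables = {
    "wide table": (400_000_000, 500),
    "long table": (20_000_000_000, 6),
}

for name, (rows, columns) in tables.items():
    per_value = min_avg_bytes_per_value(rows, columns)
    per_column_gb = min_avg_column_bytes(columns) / 10**9
    print(f"{name}: >= {per_value:.1f} bytes/value, "
          f">= {per_column_gb:.1f} GB per column on average")
```

Under these assumptions a single column of the wide table averages at least about 2 GB, and a single column of the long table at least about 167 GB.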
That's really great to hear :)
This runs on a Linux dual quad-core with 64 GB RAM and lots of disk space. Of course, at some point you will notice IO behavior ;)
For what kind of operations do you need that amount of memory? If you were to do a select on just a few thousand rows on that same mega-database, would it be happy with more "mundane" amounts of memory?
But most importantly, I have noticed that all the row ID indexes are "int" - why the 32-bit limitation? This just seems like a symptom of the above problem; that the entire database must fit in memory.
You can use the system with both 32- and 64-bit oids.
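To see why the oid width matters for the tables mentioned earlier in this thread, here is a minimal sketch that checks whether a row count can be addressed with an identifier of a given width. The helper name `fits_in_oid` is hypothetical, and whether MonetDB treats oids as signed or reserves particular values is not stated here, so both the signed and unsigned cases are shown as assumptions.

```python
# Sketch: can `row_count` rows be numbered with an identifier of `oid_bits` bits?
# Assumption: one bit is lost to the sign when `signed` is True; the actual
# MonetDB oid representation is not described in this thread.
def fits_in_oid(row_count, oid_bits, signed=True):
    usable_bits = oid_bits - 1 if signed else oid_bits
    return row_count < 2 ** usable_bits


# Row counts quoted earlier in the thread.
for rows in (400_000_000, 20_000_000_000):
    for bits in (32, 64):
        print(f"{rows:>14,} rows, {bits}-bit oid: "
              f"signed={fits_in_oid(rows, bits)}, "
              f"unsigned={fits_in_oid(rows, bits, signed=False)}")
```

The 400M-row table fits under either width, while the 20B-row table exceeds any 32-bit identifier and needs 64 bits.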
Ah, I previously had a bit of a brain failure. When reading Mapi.h, I noticed that "mapi_get_row_count" returned an int - obviously you wouldn't want select to return 2^32 rows, which is what this function counts the rows of.
Thanks Martin (and also Lefteris),
Charles
--
Charles Samuels