[Monetdb-developers] Dealing with large tables
Dear Monet developers,

Once again, here I am, writing for help ;-) We are working with a 'large' table with 250 million rows and 11 columns. This takes up a lot of memory, as you can imagine: a single column of 250 million 'ints' requires almost a gigabyte, and there are several of those; in fact, I think the entire table takes up a bit over 6GB in memory (some columns are strings and can be compressed). The problem is that we have a machine with 8GB, and every time we try to do something with that table, Monet (4.16.2) crashes. For example, I've been trying to delete rows from the table where two columns have the same value:

var var_0 := [=](col1,col2).select(true).mirror();
col1.delete(var_0); commit();
col2.delete(var_0); commit();
col3.delete(var_0); commit();
...

After deleting from a few of those columns, Monet dies:

!ERROR: BATSIGcrash: Mserver internal error (Segmentation fault), please restart.
!ERROR: (One potential cause could be that your disk might be full...)

Do you know if there is a fix for this? In principle, Monet should be able to unload the BATs it has already deleted from, to release memory. Also, would doing a semijoin() instead of a delete() help? (I thought a semijoin would need to create a new BAT, thereby using even more memory!) And, in general: is Monet 5 capable of accessing BATs that don't entirely fit in memory? Would the new version fix this problem? Maybe it's time for us to adopt it...

Thanks again for all your help, and regards from Amherst,

-- Agustin
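The sizes quoted above can be sanity-checked with some back-of-envelope arithmetic (a Python sketch, not MonetDB code; the 4-byte int width and the one-extra-word-per-row hash-index cost are round-figure assumptions):

```python
# Back-of-envelope check of the column sizes discussed in the thread.
# Assumptions (hypothetical round figures): 32-bit ints, and roughly
# one extra 4-byte entry per row for a hash index.
ROWS = 250_000_000

int_col_gib = ROWS * 4 / 2**30          # one 32-bit int column
print(f"one int column:  {int_col_gib:.2f} GiB")   # ~0.93 GiB

with_hash_gib = ROWS * (4 + 4) / 2**30  # column plus hash entries
print(f"with hash index: {with_hash_gib:.2f} GiB")  # ~1.86 GiB
```

This matches the "almost a gigabyte" per int column mentioned above, and shows how an indexed column can cost roughly double.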
Agustin Schapira wrote:
Dear Monet developers,
Once again, here I am, writing for help ;-)
We are working with a 'large' table, with 250 million rows, 11 columns. This takes up a lot of memory space, as you can imagine: a single column of 250 million 'ints' requires almost a gigabyte,

and I would say 1GB is a minimum. What if you are on a 64-bit machine? And also, if this table requires a hash index, then add another 1GB per table.
there are several of those; in fact, I think that the entire table takes up a bit over 6GB in memory (some columns are strings, and can be compressed). The problem is that we have a machine with 8GB, and every time we try to do something with that table, Monet (4.16.2) crashes. For example, I've been trying to delete rows from the table where two columns have the same value:
var var_0 := [=](col1,col2).select(true).mirror();
Hard to deduce what's happening.
col1.delete(var_0); commit();
col2.delete(var_0); commit();
col3.delete(var_0); commit();
...

I would first check whether the commit is getting in the way. Do you need to commit here?
After deleting from a few of those columns, Monet dies:
!ERROR: BATSIGcrash: Mserver internal error (Segmentation fault), please restart.
!ERROR: (One potential cause could be that your disk might be full...)
Do you know if there is a fix for this? In principle, Monet should be able to unload the BATs it has already deleted from, to release memory. Also, would doing a semijoin() instead of a delete() help? (I thought a semijoin would need to create a new BAT, thereby using even more memory!) And, in general: is Monet 5 capable of accessing BATs that don't entirely fit in memory? Would the new version fix this problem? Maybe it's time for us to adopt it...
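The delete-vs-semijoin trade-off asked about here can be illustrated with a small sketch (plain Python lists, not MonetDB internals; `to_drop` and the compaction loop are purely illustrative): a delete can compact a column in place, while a semijoin-style filter materializes a new column, so the old and new versions briefly coexist in memory.

```python
# Illustrative only: contrast in-place compaction (delete-style)
# with building a filtered copy (semijoin-style).
col = list(range(10))
to_drop = {2, 5, 7}           # rows to remove (hypothetical)

# Delete-style: overwrite surviving values in place, then truncate;
# no second full-size column is allocated.
write = 0
for value in col:
    if value not in to_drop:
        col[write] = value
        write += 1
del col[write:]
print(col)                    # [0, 1, 3, 4, 6, 8, 9]

# Semijoin-style: build a fresh result while the source still exists,
# transiently paying for both copies at once.
source = list(range(10))
kept = [v for v in source if v not in to_drop]
print(kept)                   # [0, 1, 3, 4, 6, 8, 9]
```

Both end with the same rows; the difference is the peak memory held while getting there.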
Thanks again for all your help, and regards from Amherst,
-- Agustin
Martin,
Once again, here I am, writing for help ;-) We are working with a 'large' table, with 250 million rows, 11 columns. This takes up a lot of memory space, as you can imagine: a single column of 250 million 'ints' requires almost a gigabyte,

and I would say 1GB is a minimum. What if you are on a 64-bit machine? And also, if this table requires a hash index, then add another 1GB per table.
It is taking 1GB per BAT. It's running on a 64-bit machine, but Monet is compiled with 32-bit oids, and the column has 'ints', not 'words'...
Hard to deduce what's happening.
col1.delete(var_0); commit();
col2.delete(var_0); commit();
col3.delete(var_0); commit();

I would first check whether the commit is getting in the way. Do you need to commit here?
No, the commit() is not necessary. In fact, the original code didn't have it; I just thought that maybe calling commit() would help Monet free up BATs that it no longer needed. Thanks for your answer; I'll keep trying,

-- A
Agustin,
Agustin Schapira wrote:
Dear Monet developers,
And, in general: is Monet 5 capable of accessing BATs that don't entirely fit in memory? Would the new version fix this problem? Maybe it's time for us to adopt it...
I'm not a developer, but I am working with MonetDB5 and have tables with a similar number of rows (250M+) and columns to what you described. My database is currently over 14GB in size and works fine on my servers with 8-9GB RAM. I'm sure you could work with less RAM, but I haven't tried. I'm planning to quadruple the database over the next month. I can say that MonetDB4 didn't handle things nearly as well in terms of memory management or stability.
participants (3)
-
Agustin Schapira
-
Colin Foss
-
Martin Kersten