Re: [Monetdb-developers] Monetdb-developers Digest, Vol 30, Issue 10
Hi Stefan (the non-German .de one)
Maybe a very strange idea, but it could be a better solution: if these operations are mmap'ed by default, they would be taken care of by a disk-based memory extension, and hence be independent of RAM/swap while keeping the pros of both.
You are right. In fact, this is what MonetDB is supposed to do. Memory is organized in contiguous Heap objects. This holds not only for BATs but also for hash tables. At some point, I did add the possibility to use real memory-mapped files for their allocation, as a last resort. The problem there was that the Heap() API did not have a filename parameter, so a HEAPextend() would not know what file to use. In order not to break the API, I decided to initialize the filename fields at BAT creation time (this is why you may sometimes see .hhash and .thash files in a just-stopped MonetDB repository, even though hash tables are not persistent).

With hindsight, this API decision was bad judgement, because whether real file mapping is used now depends on whether the filename was initialized at BAT or hash table creation. This should be the case, but it would be my first question to address in a debugging session. I also recall that in the case of the Skyserver TB dataset, key checking was disabled due to similar problems.

Of course, even if we found and fixed this bug, life would not be rosy for you: you would be doing random access (key uniqueness checking) into a 14GB heap that will never fit your 2GB RAM. That goes back to your question whether it is in fact reasonable to materialize 14GB. Questions like these led to a redesign that is much more careful with memory consumption (X100), whose spin-off development regrettably means that my time for such debugging sessions is currently limited.

Peter

PS The likely cause for the problem is a shortage of swap space. Increasing your swap file size (by a few tens of GB) should be a workaround for the problem.
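To make that dependency concrete, here is a minimal sketch (not the actual GDK code; the struct layout, names, and signatures are simplified assumptions, and it assumes a named heap has been file-backed since creation) of how an extend operation can only fall back to a real memory-mapped file when the heap knows its backing filename:

    /* Minimal sketch, NOT the actual GDK code: struct layout,
     * names, and signatures are simplified assumptions. */
    #include <fcntl.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <unistd.h>

    typedef struct {
        char  *base;          /* start of the region           */
        size_t size;          /* current size in bytes         */
        char   filename[128]; /* backing file; "" = anonymous  */
    } Heap;

    static int heap_extend(Heap *h, size_t newsize)
    {
        if (h->filename[0] == '\0') {
            /* anonymous heap: growth is bound by RAM + swap */
            char *p = realloc(h->base, newsize);
            if (p == NULL)
                return -1;
            h->base = p;
        } else {
            /* named heap: back it with a real file, so growth is
             * bound by disk space and paged in by the OS on demand */
            int fd = open(h->filename, O_RDWR | O_CREAT, 0600);
            if (fd < 0)
                return -1;
            if (ftruncate(fd, (off_t) newsize) < 0) {
                close(fd);
                return -1;
            }
            if (h->base != NULL)
                munmap(h->base, h->size);  /* the file keeps the data */
            h->base = mmap(NULL, newsize, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
            close(fd);
            if (h->base == MAP_FAILED)
                return -1;
        }
        h->size = newsize;
        return 0;
    }

If the filename field was never initialized (the bad-judgement case above), only the realloc branch is reachable, so the heap competes for RAM and swap instead of being paged from disk.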
Peter Boncz wrote:
Hi Stefan (the non-German .de one)
[...] it would be my first question to address in a debugging session.
I also recall that in the case of the Skyserver TB dataset, key checking was disabled due to similar problems.
No. The key checking was initially turned off because the hash table (32-bit) could not cope well with the 13,000,000,000 tuples in the table, causing lengthy hash collision chains. It took a day... to handle the request. Work is in progress to speed this up using a straightforward clustering on the built-in hash. The preliminary code has been checked in.
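For scale, a back-of-the-envelope check (the bucket counts are illustrative assumptions, not MonetDB's actual sizing) of what 13,000,000,000 tuples do to a 32-bit hash:

    /* Back-of-the-envelope check; bucket counts are
     * illustrative assumptions, not MonetDB's sizing. */
    #include <stdio.h>

    int main(void)
    {
        double tuples = 13e9;

        /* Even a perfect 32-bit hash has only 2^32 distinct values,
         * so by pigeonhole every bucket holds ~3 tuples on average. */
        printf("2^32 buckets: avg chain %.1f\n", tuples / 4294967296.0);

        /* A smaller table is far worse: with 2^27 buckets the average
         * chain is ~97 entries, and every insert walks one chain. */
        printf("2^27 buckets: avg chain %.1f\n", tuples / 134217728.0);
        return 0;
    }

Clustering on the built-in hash first presumably partitions the tuples so that each partition's table stays small and its collision chains stay short.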
PS The likely cause for the problem is a shortage of swap space. Increasing your swap file size (by a few tens of GB) should be a workaround for the problem.
The swap space set aside for Skyserver was 128 GB.
Peter Boncz wrote:
With hindsight, this API decision was bad judgement, because whether real file mapping is used now depends on whether the filename was initialized at BAT or hash table creation. This should be the case, but it would be my first question to address in a debugging session.
If someone wants to try, and has time, I can compile it with debugging enabled again.
I also recall that in the case of the Skyserver TB dataset, key checking was disabled due to similar problems.
Foreign key checking doesn't seem to be the problem on the smaller 7GB tables for now.
Of course, even if we found and fixed this bug, life would not be rosy for you: you would be doing random access (key uniqueness checking) into a 14GB heap that will never fit your 2GB RAM. That goes back to your question whether it is in fact reasonable to materialize 14GB.
I presume the number of strings effectively stored in each string hash table is small: the number of OSM users, and the variance in keys used, is under 100. And the big table shouldn't have many duplicates and is not often used as a lookup anyway.
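To put the quoted 14GB-heap warning in numbers, a rough model (the seek time and probe count are assumptions, not measurements):

    /* Rough model of random key-uniqueness probes into a heap that
     * exceeds RAM; seek time and probe count are assumptions. */
    #include <stdio.h>

    int main(void)
    {
        double heap_gb = 14.0;  /* string heap size (from the thread) */
        double ram_gb  =  2.0;  /* cacheable fraction ~ available RAM */
        double seek_ms =  8.0;  /* one random disk read (assumed)     */
        double probes  =  1e8;  /* inserted keys to check (assumed)   */

        /* With uniformly random probes, a page is resident with
         * probability ram/heap, so ~86% of probes hit the disk. */
        double miss = 1.0 - ram_gb / heap_gb;
        printf("page-miss rate: %.0f%%\n", 100.0 * miss);
        printf("disk time: %.0f hours\n",
               probes * miss * seek_ms / 1000.0 / 3600.0);
        return 0;
    }

Even with generous swap, it is the random access pattern rather than the address-space limit that dominates in such a model.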
PS The likely cause for the problem is a shortage of swap space. Increasing your swap file size (by a few tens of GB) should be a workaround for the problem.
I'll give it a shot. Linux seems to be fine with files as swap too, so that shouldn't be a problem :)

Stefan
participants (3)
- Martin Kersten
- Peter Boncz
- Stefan de Konink