Peter Boncz wrote:
With hindsight, this API decision was bad judgement, because whether real file mapping is used now depends on whether the filename was initialized at BAT or hash table creation. That should be the case, but it is the first thing I would check in a debugging session.
If someone wants to try, and has time, I can compile it with debugging on again.
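To make the file-mapping point concrete, here is a minimal sketch in Python, not MonetDB code. The helper name `map_heap` and its parameters are hypothetical. It only shows how a missing filename silently turns a file-backed mapping into an anonymous one. Dirty pages of an anonymous mapping can only be evicted to swap, which would be consistent with the swap shortage suspected later in this thread.

```python
import mmap


def map_heap(size, filename=None):
    """Map `size` bytes; file-backed if `filename` is set, anonymous otherwise.

    Hypothetical helper for illustration only, not MonetDB's API.
    """
    if filename is None:
        # Anonymous mapping: dirty pages can only be evicted to swap.
        return mmap.mmap(-1, size)
    # File-backed shared mapping: dirty pages are written back to the file.
    # Note: "w+b" creates the file or truncates an existing one.
    with open(filename, "w+b") as f:
        f.truncate(size)  # extend the file to the mapped length
        return mmap.mmap(f.fileno(), size)
```

Both calls return an object with the same interface, so the caller cannot tell from the result alone which kind of mapping it got. That is why checking whether the filename was set is a sensible first debugging step.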
I also recall that in the case of the Skyserver TB dataset, key checking was disabled due to similar problems.
Foreign key checking doesn't seem to be the problem on the smaller 7GB tables for now.
Of course, even if we find and fix this bug, life will not be rosy for you: you would be doing random access (key uniqueness checking) into a 14GB heap that will never fit in your 2GB of RAM. That goes back to your question of whether it is in fact reasonable to materialize 14GB.
I presume what matters is the number of strings effectively stored in each string hash table: the number of OSM users in one, and fewer than 100 distinct keys in another. The big table shouldn't have many duplicates and is not often used for lookups anyway.
PS: The likely cause of the problem is a shortage of swap space. Increasing your swap file size (by a few tens of GB) should be a workaround.
I'll give it a shot. Linux seems to like files as swap too. So that shouldn't be a problem :)
Stefan