Re: [MonetDB-users] 64bit MonetDB, JDBC Insert via RJDBC, >300 million rows

Yue Sheng wrote:
> Martin,
> It almost worked...
> This is what I did and what happened:
> I have 322 files to insert into the database, totaling 650 million rows.
> I divided the file list into two; then, for each sub-list:
> (a) I insert the first file in the list with N set to 650 million rows; (b) all subsequent files have N set to the number of lines in *that* file.
> Once the first list is done, then:
> (c) I insert the first file in the second list with N set to 650 million rows; (d) all subsequent files have N set to the number of lines in *that* file.
> Then the same problem happened: it got stuck at file number 316.

OK. Using the 650M enables MonetDB to allocate enough space up front, so it does not have to fall back on guessing. Guessing is painful, because when a file of N records has been created and more space is needed, the system creates a file of size 1.3*N. This leads to memory fragmentation. In your case I would have been a little more spacious and used 700M as a start, because a miscalculation of 1 causes a lot of pain. Such advice is only needed in (a).
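The loading pattern discussed above (state the full expected row count, with headroom, only for the first file, and each file's own line count thereafter) can be sketched as follows. This is a minimal illustration, not the poster's actual script: the table name, file paths, and line counts are hypothetical, and the generated statements assume MonetDB's `COPY n RECORDS INTO ... FROM ...` bulk-load syntax.

```python
# Sketch of the loading pattern from the thread: the first COPY states
# the total expected row count (with headroom) so MonetDB pre-allocates
# once; every later COPY states only that file's own line count.
# File names, counts, and the table name are illustrative.

def copy_statements(files, line_counts, total_with_headroom):
    """Build one MonetDB COPY INTO statement per file."""
    stmts = []
    for i, (path, n) in enumerate(zip(files, line_counts)):
        # Only the very first file carries the big pre-allocation hint.
        records = total_with_headroom if i == 0 else n
        stmts.append(
            f"COPY {records} RECORDS INTO mytable "
            f"FROM '{path}' USING DELIMITERS ',', '\\n';"
        )
    return stmts

stmts = copy_statements(
    ["/data/part001.csv", "/data/part002.csv"],
    [2_000_000, 2_100_000],
    700_000_000,  # 650M rows plus headroom, per Martin's advice
)
for s in stmts:
    print(s)
```

Each statement would then be sent through the client of choice (the poster used RJDBC); only the first pays the pre-allocation cost.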
> Note: This is farther than previous tries, which all stopped in the region of file number 280 +/- a few.
> My observation: (i) at (a), the VSIZE went up to around 46GB, then after the first insert it dropped to around 36GB;
> (ii) at (c), the VSIZE went up to around 130GB, then after the first insert it dropped to around 45GB;

At (c) you tell the system to extend the existing BATs to prepare for another 650M rows, which means it allocates 2*36GB, plus room for the old copy, giving 108GB. OK, that fits. Then, during processing, some temporary BATs may be needed, e.g. to check integrity constraints after each file. Then it runs out of swap space.

> (iii) the "Free Memory", as reported by Activity Monitor, just before it failed at file number 316, dipped to as low as 7.5MB!

Yes, you are running out of swap space on your system. This should not have happened, because the system uses memory-mapped files; it may be an issue with MacOS, or related to a problem we fixed recently.
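Martin's back-of-the-envelope accounting for the VSIZE peak at step (c) can be spelled out with the figures from the observations above (all sizes in GB):

```python
# VSIZE accounting for step (c), per Martin's explanation.
# 36 GB is the observed post-insert size of the existing BATs.
existing = 36             # BATs already holding the first 650M rows
extension = 2 * existing  # extending for another 650M doubles the request
total = existing + extension
print(total)  # 108 -- close to the ~130 GB peak VSIZE observed
```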
> My question: (1) Why do we need to ramp N up to the total number of lines (it takes a long time to do that), only to have it drop down to 30GB-40GB right after the first insert and stay roughly there? Does it mean we're giving all the pre-allocated space back to the OS? Then should we always set N to the total number of lines? If so, it would take much, much longer to process all the files...

This might indicate that on MacOS, just like on Windows, memory-mapped files need to be written to disk. With a disk bandwidth of 50MB/sec that still takes several minutes.

> (2) How come RSIZE never goes above 4GB? (3) Does the sql log file size have some limit that we need to tweak?

No limit.

> (4) Has anyone successfully implemented the 64bit version of MonetDB and successfully inserted more than 1 billion rows?

Your platform may be the first, but Amherst has worked with Macs for years.

> (5) When you say "...The VSIZE of 44G is not too problematic, i am looking at queries letting it tumble between 20-80 GB....", what does it mean? Mine went up to as high as 135GB...

Explained above.
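Martin's "several minutes" estimate can be checked with the numbers he gives: flushing roughly 36GB of dirty memory-mapped pages at a disk bandwidth of 50MB/sec.

```python
# Time to write ~36 GB of mmapped data back to disk at 50 MB/s,
# using the figures from the thread (1 GB = 1024 MB here).
size_gb = 36
bandwidth_mb_per_s = 50
seconds = size_gb * 1024 / bandwidth_mb_per_s
minutes = seconds / 60
print(round(minutes, 1))  # ~12.3 minutes
```

That write-back cost is paid once per pre-allocation, which is why stating the full N only in the first COPY matters.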
regards, Martin

> Thanks, as always.

On Wed, Mar 18, 2009 at 06:51:47AM +0100, Martin Kersten wrote:
> Yue Sheng wrote:
>> Martin,
>> It almost worked...
>> This is what I did and what happened:
>> I have 322 files to insert into the database, totaling 650 million rows.
>> I divided the file list into two; then, for each sub-list:
>
> Why two lists, and how did you divide the files?
>
>> (a) I insert the first file in the list with N set to 650 million rows; (b) all subsequent files have N set to the number of lines in *that* file.
>
> Do (a) plus (b) load all 650M records, or less (say, about half)?
>
>> Once the first list is done, then:
>> (c) I insert the first file in the second list with N set to 650 million rows,
>
> Does this go into the same table as (a) & (b)? Are these 650M extra tuples, or just starting the second half? In the first case (i.e., loading 1.3B tuples into one table), you should set N to 1.3B in (a) and set N in (c) just as you do in (b) & (d).
>
>> (d) all subsequent files have N set to the number of lines in *that* file.
>> Then the same problem happened: it got stuck at file number 316.
>
> OK. Using the 650M enables MonetDB to allocate enough space up front, so it does not have to fall back on guessing. Guessing is painful, because when a file of N records has been created and more space is needed, the system creates a file of size 1.3*N. This leads to memory fragmentation.
> In your case I would have been a little more spacious and used 700M as a start, because a miscalculation of 1 causes a lot of pain. Such advice is only needed in (a).
>> Note: This is farther than previous tries, which all stopped in the region of file number 280 +/- a few.
>> My observation: (i) at (a), the VSIZE went up to around 46GB, then after the first insert it dropped to around 36GB;
>> (ii) at (c), the VSIZE went up to around 130GB, then after the first insert it dropped to around 45GB;
>
> At (c) you tell the system to extend the existing BATs to prepare for another 650M rows, which means it allocates 2*36GB, plus room for the old copy, giving 108GB. OK, that fits. Then, during processing, some temporary BATs may be needed, e.g. to check integrity constraints after each file. Then it runs out of swap space.
>
>> (iii) the "Free Memory", as reported by Activity Monitor, just before it failed at file number 316, dipped to as low as 7.5MB!
>
> Yes, you are running out of swap space on your system. This should not have happened, because the system uses memory-mapped files; it may be an issue with MacOS, or related to a problem we fixed recently.
>> My question: (1) Why do we need to ramp N up to the total number of lines (it takes a long time to do that), only to have it drop down to 30GB-40GB right after the first insert and stay roughly there? Does it mean we're giving all the pre-allocated space back to the OS? Then should we always set N to the total number of lines? If so, it would take much, much longer to process all the files...
>
> This might indicate that on MacOS, just like on Windows, memory-mapped files need to be written to disk. With a disk bandwidth of 50MB/sec that still takes several minutes.
>
>> (2) How come RSIZE never goes above 4GB? (3) Does the sql log file size have some limit that we need to tweak?
>
> No limit.
>
>> (4) Has anyone successfully implemented the 64bit version of MonetDB and successfully inserted more than 1 billion rows?
>
> Your platform may be the first, but Amherst has worked with Macs for years.
>
>> (5) When you say "...The VSIZE of 44G is not too problematic, i am looking at queries letting it tumble between 20-80 GB....", what does it mean? Mine went up to as high as 135GB...

I've seen MonetDB with >200GB VSIZE on 8GB machines, and currently see 1.2TB VSIZE on a 64GB machine --- VSIZE does not matter, unless you really need to access all this data "at the same time" ...

Stefan

> Explained above.
>
> regards, Martin
>
>> Thanks, as always.
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |
participants (2)
- Martin Kersten
- Stefan Manegold