Re: [MonetDB-users] 64bit MonetDB, JDBC Insert via RJDBC, >300 million rows

Yue Sheng wrote:
> Martin,
> It almost worked...
> This is what I did and what happened:
> I have 322 files to insert into the database, totaling 650 million rows.
> I divided the file list into two; then, for each sub-list:
> (a) I insert the first file in the list with N set to 650 million rows; (b) all subsequent files have N set to the number of lines in *that* file.
> Once the first list is done, then:
> (c) I insert the first file in the second list with N set to 650 million rows; (d) all subsequent files have N set to the number of lines in *that* file.
> Then the same problem happened: it got stuck at file number 316.

OK. Using the 650M enables MonetDB to allocate enough space up front, so it does not have to fall back on guessing. Guessing is painful, because when a file of N records has been created and more space is needed, the system creates a file of size 1.3*N. This leads to memory fragmentation. In your case I would have been a little more spacious and used 700M as a start, because a miscalculation of 1 causes a lot of pain. Such advice is only needed in (a).
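The loading pattern discussed above (state the full expected row count, with headroom, only for the first file, and each file's own line count thereafter) can be sketched as follows. This is a minimal illustration, not the poster's actual script: the table name, file paths, and line counts are hypothetical, and the generated statements assume MonetDB's `COPY n RECORDS INTO ... FROM ...` bulk-load syntax.

```python
# Sketch of the loading pattern from the thread: the first COPY states
# the total expected row count (with headroom) so MonetDB pre-allocates
# once; every later COPY states only that file's own line count.
# File names, counts, and the table name are illustrative.

def copy_statements(files, line_counts, total_with_headroom):
    """Build one MonetDB COPY INTO statement per file."""
    stmts = []
    for i, (path, n) in enumerate(zip(files, line_counts)):
        # Only the very first file carries the big pre-allocation hint.
        records = total_with_headroom if i == 0 else n
        stmts.append(
            f"COPY {records} RECORDS INTO mytable "
            f"FROM '{path}' USING DELIMITERS ',', '\\n';"
        )
    return stmts

stmts = copy_statements(
    ["/data/part001.csv", "/data/part002.csv"],
    [2_000_000, 2_100_000],
    700_000_000,  # 650M rows plus headroom, per Martin's advice
)
for s in stmts:
    print(s)
```

Each statement would then be sent through the client of choice (the poster used RJDBC); only the first pays the pre-allocation cost.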
> Note: This is farther than previous tries, which all stopped in the region of file number 280 +/- a few.
> My observation: (i) at (a), the VSIZE went up to around 46GB, then after the first insert it dropped to around 36GB;
> (ii) at (c), the VSIZE went up to around 130GB, then after the first insert it dropped to around 45GB;

At (c) you tell the system to extend the existing BATs to prepare for another 650M rows, which means it allocates 2*36GB, plus room for the old copy, giving 108GB. OK, that fits. Then, during processing, some temporary BATs may be needed, e.g. to check integrity constraints after each file. Then it runs out of swap space.

> (iii) the "Free Memory", as reported by Activity Monitor, just before it failed at file number 316, dipped to as low as 7.5MB!

Yes, you are running out of swap space on your system. This should not have happened, because the system uses memory-mapped files; it may be an issue with MacOS, or related to a problem we fixed recently.
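Martin's back-of-the-envelope accounting for the VSIZE peak at step (c) can be spelled out with the figures from the observations above (all sizes in GB):

```python
# VSIZE accounting for step (c), per Martin's explanation.
# 36 GB is the observed post-insert size of the existing BATs.
existing = 36             # BATs already holding the first 650M rows
extension = 2 * existing  # extending for another 650M doubles the request
total = existing + extension
print(total)  # 108 -- close to the ~130 GB peak VSIZE observed
```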
> My question: (1) Why do we need to ramp N up to the total number of lines (it takes a long time to do that), only to have it drop down to 30GB-40GB right after the first insert and stay roughly there? Does it mean we're giving all the pre-allocated space back to the OS? Then should we always set N to the total number of lines? If so, it would take much, much longer to process all the files...

This might indicate that on MacOS, just like on Windows, memory-mapped files need to be written to disk. With a disk bandwidth of 50MB/sec that still takes several minutes.

> (2) How come RSIZE never goes above 4GB? (3) Does the sql log file size have some limit that we need to tweak?

No limit.

> (4) Has anyone successfully implemented the 64bit version of MonetDB and successfully inserted more than 1 billion rows?

Your platform may be the first, but Amherst has worked with Macs for years.

> (5) When you say "...The VSIZE of 44G is not too problematic, i am looking at queries letting it tumble between 20-80 GB....", what does it mean? Mine went up to as high as 135GB...

Explained above.
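Martin's "several minutes" estimate can be checked with the numbers he gives: flushing roughly 36GB of dirty memory-mapped pages at a disk bandwidth of 50MB/sec.

```python
# Time to write ~36 GB of mmapped data back to disk at 50 MB/s,
# using the figures from the thread (1 GB = 1024 MB here).
size_gb = 36
bandwidth_mb_per_s = 50
seconds = size_gb * 1024 / bandwidth_mb_per_s
minutes = seconds / 60
print(round(minutes, 1))  # ~12.3 minutes
```

That write-back cost is paid once per pre-allocation, which is why stating the full N only in the first COPY matters.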
regards, Martin

> Thanks, as always.

On Wed, Mar 18, 2009 at 06:51:47AM +0100, Martin Kersten wrote:
> Yue Sheng wrote:
>> Martin,
>> It almost worked...
>> This is what I did and what happened:
>> I have 322 files to insert into the database, totaling 650 million rows.
>> I divided the file list into two; then, for each sub-list:
>
> Why two lists, and how did you divide the files?
>
>> (a) I insert the first file in the list with N set to 650 million rows; (b) all subsequent files have N set to the number of lines in *that* file.
>
> Do (a) plus (b) load all 650M records, or less (say, about half)?
>
>> Once the first list is done, then:
>> (c) I insert the first file in the second list with N set to 650 million rows,
>
> Does this go into the same table as (a) & (b)? Are these 650M extra tuples, or just starting the second half? In the first case (i.e., loading 1.3B tuples into one table), you should set N to 1.3B in (a) and set N in (c) just as you do in (b) & (d).
>
>> (d) all subsequent files have N set to the number of lines in *that* file.
>> Then the same problem happened: it got stuck at file number 316.
>
> OK. Using the 650M enables MonetDB to allocate enough space up front, so it does not have to fall back on guessing. Guessing is painful, because when a file of N records has been created and more space is needed, the system creates a file of size 1.3*N. This leads to memory fragmentation.
> In your case I would have been a little more spacious and used 700M as a start, because a miscalculation of 1 causes a lot of pain. Such advice is only needed in (a).
>> Note: This is farther than previous tries, which all stopped in the region of file number 280 +/- a few.
>> My observation: (i) at (a), the VSIZE went up to around 46GB, then after the first insert it dropped to around 36GB;
>> (ii) at (c), the VSIZE went up to around 130GB, then after the first insert it dropped to around 45GB;
>
> At (c) you tell the system to extend the existing BATs to prepare for another 650M rows, which means it allocates 2*36GB, plus room for the old copy, giving 108GB. OK, that fits. Then, during processing, some temporary BATs may be needed, e.g. to check integrity constraints after each file. Then it runs out of swap space.
>
>> (iii) the "Free Memory", as reported by Activity Monitor, just before it failed at file number 316, dipped to as low as 7.5MB!
>
> Yes, you are running out of swap space on your system. This should not have happened, because the system uses memory-mapped files; it may be an issue with MacOS, or related to a problem we fixed recently.
>> My question: (1) Why do we need to ramp N up to the total number of lines (it takes a long time to do that), only to have it drop down to 30GB-40GB right after the first insert and stay roughly there? Does it mean we're giving all the pre-allocated space back to the OS? Then should we always set N to the total number of lines? If so, it would take much, much longer to process all the files...
>
> This might indicate that on MacOS, just like on Windows, memory-mapped files need to be written to disk. With a disk bandwidth of 50MB/sec that still takes several minutes.
>
>> (2) How come RSIZE never goes above 4GB? (3) Does the sql log file size have some limit that we need to tweak?
>
> No limit.
>
>> (4) Has anyone successfully implemented the 64bit version of MonetDB and successfully inserted more than 1 billion rows?
>
> Your platform may be the first, but Amherst has worked with Macs for years.
>
>> (5) When you say "...The VSIZE of 44G is not too problematic, i am looking at queries letting it tumble between 20-80 GB....", what does it mean? Mine went up to as high as 135GB...

I've seen MonetDB with >200GB VSIZE on 8GB machines, and currently see 1.2TB VSIZE on a 64GB machine --- VSIZE does not matter, unless you really need to access all this data "at the same time" ...

Stefan

> Explained above.
>
> regards, Martin
>
>> Thanks, as always.
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4312       |
participants (2)
- Martin Kersten
- Stefan Manegold