[MonetDB-users] Importing large dataset (TPC-H SF100)

18 Jun 2007

      Sorry, I posted it to a wrong (-developers) list accidentally.

Hi,

I am trying to import the TPC-H dataset (SF100) to the database without 
success. The import method is the same as in the benchmark/tpch, except 
the number of expected records in the load script and the load script 
execution (executed line-by-line in the console). The machine is an dual 
core AMD64 (64bit OS), with 4G ram and 8 disk raid0 for storage.

The import process consume nearly all memory and nearly all the swap 
(please note that the number of expected records is specified after the 
COPY command). In fact I have to restart the server process after each 
table import to prevent the "swap to death" state.

However it seems that the lineitem table import will fail, no matter 
what I do. I have tried it with different client (eg.: mjclient with 
-Xbatching mode), sliced lineitem.tlb file, with the same result.

I have noticed that two or three hours after the issued lineitem COPY 
command the mserver5 process does not consume CPU any more. The attached 
strace shows me the following:

[pid  4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)
[pid  4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)
[pid  4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)

Any idea? Is there any way to import huge datasets (TB) in bulk mode to 
the database? For example the postgresql database has a feature that it 
can import the data without write ahead logging, nearly at disk speed.
The dataset can be sliced up per column, so a direct column copy would 
be possible.

Regards,
J.

James Laken

Colin Foss

James Laken

Martin Kersten

tags

participants (3)