[Monetdb-developers] Importing large dataset (TPC-H SF100)
Hi,

I am trying to import the TPC-H dataset (SF100) into the database, without success. The import method is the same as in benchmark/tpch, except for the number of expected records in the load script and the way the script is executed (line by line in the console). The machine is a dual-core AMD64 (64-bit OS) with 4 GB RAM and an 8-disk RAID0 for storage.

The import process consumes nearly all memory and nearly all swap (note that the number of expected records is specified after the COPY command). In fact, I have to restart the server process after each table import to prevent the "swap to death" state. However, the lineitem table import seems to fail no matter what I do. I have tried different clients (e.g. mjclient with -Xbatching mode) and a sliced lineitem.tbl file, with the same result. I have noticed that two or three hours after issuing the lineitem COPY command, the mserver5 process no longer consumes any CPU. The attached strace shows the following:

[pid 4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)
[pid 4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)
[pid 4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)

Any ideas? Is there a way to bulk-import huge (terabyte-scale) datasets into the database? For example, PostgreSQL can import data without write-ahead logging, at nearly disk speed. The dataset can be sliced up per column, so a direct column copy would be possible.

Regards,
J.
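For reference, a minimal sketch of the kind of COPY statement described above, in the style of the benchmark/tpch load scripts. The file paths are hypothetical and the record counts are the nominal TPC-H figures (lineitem at SF100 is roughly 600 million rows; check dbgen's actual output). Slicing the input just means pre-splitting lineitem.tbl and issuing one COPY per slice:

-- Full-table load: the expected record count is declared up front,
-- right after the COPY keyword. TPC-H .tbl rows end in '|' before
-- the newline, hence the '|\n' record delimiter.
COPY 600037902 RECORDS INTO lineitem
  FROM '/data/tpch/sf100/lineitem.tbl'
  USING DELIMITERS '|', '|\n';

-- Sliced load: the same statement repeated, once per chunk file,
-- each with its own (smaller) expected record count.
COPY 100000000 RECORDS INTO lineitem
  FROM '/data/tpch/sf100/lineitem.tbl.part00'
  USING DELIMITERS '|', '|\n';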