[MonetDB-users] Importing large dataset (TPC-H SF100)
Sorry, I posted this to the wrong (-developers) list accidentally.

Hi,

I am trying to import the TPC-H dataset (SF100) into the database, without success. The import method is the same as in benchmark/tpch, except for the number of expected records in the load script and the way the load script is executed (line by line in the console). The machine is a dual-core AMD64 (64-bit OS) with 4 GB of RAM and an 8-disk RAID0 for storage.

The import process consumes nearly all memory and nearly all swap (note that the number of expected records is specified after the COPY command). In fact, I have to restart the server process after each table import to prevent the "swap to death" state.

However, the lineitem table import seems to fail no matter what I do. I have tried it with a different client (e.g. mjclient with -Xbatching mode) and with a sliced lineitem.tbl file, with the same result. I have noticed that two or three hours after issuing the lineitem COPY command, the mserver5 process no longer consumes any CPU. The attached strace shows the following:

[pid 4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)
[pid 4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)
[pid 4843] select(6, [5], NULL, NULL, {0, 500}) = 0 (Timeout)

Any ideas? Is there any way to bulk-import huge (TB) datasets into the database? PostgreSQL, for example, can import data without write-ahead logging, at nearly disk speed. The dataset can be sliced up per column, so a direct column copy would be possible.

Regards, J.
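Each load step in my script looks roughly like this (the table, record count, path, and delimiters below are illustrative placeholders, not the exact values from the script):

  -- illustrative load step: the count must match the number of records in the file
  COPY 150000000 RECORDS INTO orders FROM '/data/tpch-sf100/orders.tbl'
     USING DELIMITERS '|', '|\n';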
James,
I've had success avoiding "swap-of-death" with COPY
INTO by specifying the number of records on each
statement.
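Roughly like this (table, path and delimiters are only placeholders; as far as I understand, giving the count up front lets the server allocate space for the whole load instead of growing as it goes):

  -- no record count given
  COPY INTO region FROM '/data/tpch-sf100/region.tbl' USING DELIMITERS '|', '|\n';

  -- expected record count given up front
  COPY 5 RECORDS INTO region FROM '/data/tpch-sf100/region.tbl' USING DELIMITERS '|', '|\n';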
I forgot to mention that the load script uses the following formula:

COPY N RECORDS INTO table FROM 'path/to/importfile' ...

where N is the number of records in the input file.

Regards, Zoltan
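For lineitem at SF100 that comes down to something like the following; the row count, paths and per-slice counts below are placeholders (use the actual number of records in each file), and the sliced variant is what I tried with the split lineitem.tbl:

  -- whole file: N is the number of records in lineitem.tbl (placeholder value shown)
  COPY 600000000 RECORDS INTO lineitem FROM '/data/tpch-sf100/lineitem.tbl'
     USING DELIMITERS '|', '|\n';

  -- sliced file: one COPY per slice, each with that slice's record count (placeholders)
  COPY 300000000 RECORDS INTO lineitem FROM '/data/tpch-sf100/lineitem.tbl.1'
     USING DELIMITERS '|', '|\n';
  COPY 300000000 RECORDS INTO lineitem FROM '/data/tpch-sf100/lineitem.tbl.2'
     USING DELIMITERS '|', '|\n';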
Dear All,

The problems reported on loading have our full attention. They are hard to reproduce, since it takes a long time to reach the point where it breaks. BUT we can (hopefully) reproduce the bug; it seems to be a memory overwrite. Due to a local science meeting and its priority (they pay for the development of MonetDB), the concerted frontal attack on the bug will start this coming Wednesday. Please stay with us.

regards, Martin
participants (3)
- Colin Foss
- James Laken
- Martin Kersten