Hi everybody,

I have a problem inserting a large amount of data into a MonetDB database using bulk import. I'm running the following commands in a loop:

    connection.execute("COPY 19000000 OFFSET 2 RECORDS INTO XXX FROM '" + csv + "' USING DELIMITERS ';','\n';")
    connection.commit()

where csv is a different CSV file in each round of the loop, but always containing 18000001 rows of data. To be sure that enough memory is allocated, I chose 19000000 in the execute command. I now have two questions:

1. Should the number of records (here 19000000) represent the number of lines per CSV file or the total number of lines in the final table (number of CSV files * 18 million)?

2. Can you think of any reason why MonetDB would stop reading one specific variable while continuing to read the others? Say my CSV has 8 columns and 18000000 rows with no missing values in the raw data. Up to row 16537472 all data is read in, but from there until line 18000000 variable 3 is missing, while variables 1-2 and 4-8 are perfectly fine. Can this be due to memory or hard-disk speed constraints? And why is no error message raised?

It would be great if someone could help me.

Thanks,
Thomas
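For reference, the loop above boils down to something like the following minimal sketch, assuming the pymonetdb Python client; the connection parameters, file paths, and database name are placeholders, not values taken from this thread. The record count and OFFSET are passed with each individual COPY statement, exactly as in the original loop.

    import pymonetdb

    # Placeholder connection settings; adjust to the actual server and database.
    connection = pymonetdb.connect(username='monetdb', password='monetdb',
                                   hostname='localhost', database='mydb')
    cursor = connection.cursor()

    # Placeholder paths; each file holds one header line plus 18000000 data rows.
    csv_files = ['/data/part01.csv', '/data/part02.csv']

    for csv in csv_files:
        # Mirrors the original post: 19000000 is the record count given with this
        # COPY statement, and OFFSET 2 starts reading at the second line of the file.
        cursor.execute("COPY 19000000 OFFSET 2 RECORDS INTO XXX FROM '" + csv +
                       "' USING DELIMITERS ';','\n';")
        connection.commit()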
Hi,

I ran into a similar problem maybe two weeks ago and was tactfully reminded to check that the DB is in maintenance mode. Otherwise the WAL will be too long to read (that's what crashed MonetDB for me).

In my case there was no need for any additional parameters, and I could load ten times 1B rows into ten tables without any problems. Admittedly, our machine has a lot of HDD and RAM.

Ralph
--
Ralph Holz
I8 - Network Architectures and Services
Technische Universität München
http://www.net.in.tum.de/de/mitarbeiter/holz/
Phone +49.89.289.18043
PGP: A805 D19C E23E 6BBB E0C4 86DC 520E 0C83 69B0 03EF
Hi Ralph,

First of all: I'm pretty new to MonetDB. So what is maintenance mode, where can I check or change it, and what does it do? I'm also running the whole operation on a 6-core AMD server with 32 GB RAM and a large HDD, so the machine should be sufficient.

I guess the difference to your case is that I want to have everything in one large table in the end. Currently I'm writing everything to the same destination XXX from my example. The data of the first .csv is properly written to the database, but with the second one I run into the described problems. However, if I read the same file separately into a separate database, there are no missing variables any more. Would it make more sense, from your point of view, to write each .csv into a separate DB and join them at a later point in time?

Best,
Thomas

Sent from Windows Mail
Hi,

Maintenance mode is entered with "monetdb lock": http://www.monetdb.org/Documentation/monetdb-man-page

As for your second question: no - 19 million rows is really not much. One of our tables holds 50 times that much. Just do a bulk load into one table.

Ralph
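For clarity, this is roughly what maintenance mode looks like with the monetdb control tool, assuming a monetdbd-managed farm and a hypothetical database name "mydb" (not taken from this thread):

    monetdb lock mydb        # put the database into maintenance mode
    # ... run the COPY INTO bulk loads here ...
    monetdb release mydb     # take it out of maintenance mode again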
Dear all,

Incremental bulk load from multiple files into one table is supposed to work just as fine as bulk loading from a single file. If it doesn't (in selected cases), we'd need access to all the information (and data) required to reproduce the problem in order to analyze and fix it.

You might also want to consider doing the bulk load in LOCKED mode; cf. http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto

You can also try bulk loading from multiple files in a single COPY INTO statement, giving a comma-separated list of input files; cf. http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto

Best,
Stefan
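To illustrate the two suggestions, a minimal sketch of a single COPY INTO statement that loads several files at once in LOCKED mode; the table name and file paths are placeholders, and the record count and OFFSET clauses from the original loop are left out here. The CopyInto page linked above describes when LOCKED mode is safe to use.

    -- placeholder table name and file paths; delimiters as in the original post
    COPY INTO XXX
    FROM '/data/part01.csv', '/data/part02.csv'
    USING DELIMITERS ';','\n'
    LOCKED;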
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)   |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)  |
Participants (4):
- Ralph Holz
- Stefan Manegold
- Thomas Johann
- tjohann87@googlemail.com