Hi everybody,

I have a problem inserting a large amount of data into a MonetDB database using bulk import. I'm running the following commands in a loop:

    connection.execute("COPY 19000000 OFFSET 2 RECORDS INTO XXX FROM '" + csv + "' USING DELIMITERS ';','\n';")
    connection.commit()

where csv is a different CSV file in each round of the loop, but always containing 18000001 rows of data. To be sure that enough memory is allocated, I chose 19000000 in the execute command. I now have two questions:

1. Should the number of records (here 19000000) represent the number of lines per CSV file or the total number of lines in the final table (number of CSV files * 18 million)?

2. Can you think of any reason why MonetDB would stop reading one specific variable while continuing to read the others? Say my CSV has 8 columns and 18000000 rows with no missing values in the raw data. Up to row 16537472 all data is read in, but from there until line 18000000 variable 3 is missing, while variables 1-2 and 4-8 are perfectly fine. Can this be due to memory or hard-disk speed constraints? And why is no error message raised?

It would be great if someone could help me.

Thanks,
Thomas
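For reference, the loop above boils down to something like the following minimal sketch, assuming the pymonetdb Python client; the connection parameters, file paths, and database name are placeholders, not values taken from this thread. The record count and OFFSET are passed with each individual COPY statement, exactly as in the original loop.

    import pymonetdb

    # Placeholder connection settings; adjust to the actual server and database.
    connection = pymonetdb.connect(username='monetdb', password='monetdb',
                                   hostname='localhost', database='mydb')
    cursor = connection.cursor()

    # Placeholder paths; each file holds one header line plus 18000000 data rows.
    csv_files = ['/data/part01.csv', '/data/part02.csv']

    for csv in csv_files:
        # Mirrors the original post: 19000000 is the record count given with this
        # COPY statement, and OFFSET 2 starts reading at the second line of the file.
        cursor.execute("COPY 19000000 OFFSET 2 RECORDS INTO XXX FROM '" + csv +
                       "' USING DELIMITERS ';','\n';")
        connection.commit()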
Hi,

I ran into a similar problem maybe two weeks ago and was tactfully reminded to check that the DB is in maintenance mode. Otherwise the WAL will be too long to read (that's what crashed MonetDB for me).

In my case there was no need for any additional parameters, and I could load ten times 1B rows into ten tables without any problems. Admittedly, our machine has a lot of HDD and RAM.

Ralph
--
Ralph Holz
I8 - Network Architectures and Services
Technische Universität München
http://www.net.in.tum.de/de/mitarbeiter/holz/
Phone +49.89.289.18043
PGP: A805 D19C E23E 6BBB E0C4 86DC 520E 0C83 69B0 03EF
Hi Ralph,

First of all: I'm pretty new to MonetDB. So what is maintenance mode, where can I check or change it, and what does it do? I'm also running the whole operation on a 6-core AMD server with 32 GB RAM and a large HDD, so the machine should be sufficient.

I guess the difference to your case is that I want to have everything in one large table in the end. Currently I'm writing everything to the same destination XXX from my example. The data of the first .csv is properly written to the database, but with the second one I run into the described problems. However, if I read the same file separately into a separate database, there are no missing variables any more. Would it make more sense, from your point of view, to write each .csv into a separate DB and join them at a later point in time?

Best,
Thomas

Sent from Windows Mail
Hi,

Maintenance mode is entered with "monetdb lock": http://www.monetdb.org/Documentation/monetdb-man-page

As for your second question: no - 19 million rows is really not much. One of our tables holds 50 times that much. Just do a bulk load into one table.

Ralph
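For clarity, this is roughly what maintenance mode looks like with the monetdb control tool, assuming a monetdbd-managed farm and a hypothetical database name "mydb" (not taken from this thread):

    monetdb lock mydb        # put the database into maintenance mode
    # ... run the COPY INTO bulk loads here ...
    monetdb release mydb     # take it out of maintenance mode again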
Dear all,

Incremental bulk load from multiple files into one table is supposed to work just as fine as bulk loading from a single file. If it doesn't (in selected cases), we'd need access to all the information (and data) required to reproduce the problem in order to analyze and fix it.

You might also want to consider doing the bulk load in LOCKED mode; cf. http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto

You can also try bulk loading from multiple files in a single COPY INTO statement, giving a comma-separated list of input files; cf. http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto

Best,
Stefan
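To illustrate the two suggestions, a minimal sketch of a single COPY INTO statement that loads several files at once in LOCKED mode; the table name and file paths are placeholders, and the record count and OFFSET clauses from the original loop are left out here. The CopyInto page linked above describes when LOCKED mode is safe to use.

    -- placeholder table name and file paths; delimiters as in the original post
    COPY INTO XXX
    FROM '/data/part01.csv', '/data/part02.csv'
    USING DELIMITERS ';','\n'
    LOCKED;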
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)   |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)  |
Participants (4):
- Ralph Holz
- Stefan Manegold
- Thomas Johann
- tjohann87@googlemail.com