Error: Failed to import - Leftover Data
Hi all: I am receiving an error "Failed to import table ... Leftover data ..." I am using a simple BASH script and the mclient tool to import my data from hundreds of CSV files. The fields are pipe | separated, text values are surrounded with double-quotes, and rows end with LF (\n). This is the command I am using to process the files - for csvfile in `ls | grep -i 2018.*\.csv$`; do mclient -d mydatabase -s "COPY INTO table FROM '/path/to/csv/files/$csvfile' USING DELIMITERS '|', '\n', '\"'"; done I have a .monetdb file setup in my home directory so as not to need the username and password. Thanks kindly, Adam Doherty
Hi Adam, the error suggest that (at least) one line in your CSV file contains more columns than the table you try to load the data into. The "..." after "Leftover data" should tell you what that "excess content" of that line is. Hence, searching for that "..." in your CVS file should help you to locate the problem in the CSV file. Best, Stefan ----- On May 22, 2018, at 5:06 PM, Doherty, Adam adam.doherty@esso.ca wrote:
Hi all:
I am receiving an error "Failed to import table ... Leftover data ..."
I am using a simple BASH script and the mclient tool to import my data from hundreds of CSV files. The fields are pipe | separated, text values are surrounded with double-quotes, and rows end with LF (\n).
This is the command I am using to process the files -
for csvfile in `ls | grep -i 2018.*\.csv$`; do mclient -d mydatabase -s "COPY INTO table FROM '/path/to/csv/files/$csvfile' USING DELIMITERS '|', '\n', '\"'"; done
I have a .monetdb file setup in my home directory so as not to need the username and password.
Thanks kindly,
Adam Doherty _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hi Stefan:
Thanks for the quick reply.
I ran the following on the first CSV file.
awk -F"|" 'NF != 18 {print NR,$0}' mycsvfile.csv
However none of the lines come back, and thus all lines are 18 fields.
When I run COPY INTO with BEST EFFORT, I am able import all but 5,428 records from that file.
Unfortunately, this is event data for an analytics dashboard, I can't have any records missing.
Is there anything else I can dig into to find out why Monet is rejecting the records?
Thanks,
Adam
-----Original Message-----
From: users-list [mailto:users-list-bounces+adam.doherty=esso.ca@monetdb.org] On Behalf Of Stefan Manegold
Sent: Tuesday, May 22, 2018 10:12
To: Communication channel for MonetDB users
Hi all:
I am receiving an error "Failed to import table ... Leftover data ..."
I am using a simple BASH script and the mclient tool to import my data from hundreds of CSV files. The fields are pipe | separated, text values are surrounded with double-quotes, and rows end with LF (\n).
This is the command I am using to process the files -
for csvfile in `ls | grep -i 2018.*\.csv$`; do mclient -d mydatabase -s "COPY INTO table FROM '/path/to/csv/files/$csvfile' USING DELIMITERS '|', '\n', '\"'"; done
I have a .monetdb file setup in my home directory so as not to need the username and password.
Thanks kindly,
Adam Doherty _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Adam, In my experience, the cause of this kind of problem is usually incorrect quoting in the CSV. One thing that could help you figure out what happens is the sys.rejects table, which records information about lines the CSV parser rejected sql>COPY INTO example FROM '/tmp/lala.csv' USING DELIMITERS '|','\n','"'; Failed to import table 'example', Leftover data 'b' sql>SELECT * FROM sys.rejects; +-------+-------+-------------------+-------+ | rowid | fldid | message | input | +=======+=======+===================+=======+ | 0 | 2 | Leftover data 'b' | 1|a | +-------+-------+-------------------+-------+ The CSV was: 1|a|b 2|c and the table: CREATE TABLE "sys"."example" ( "i" INTEGER, "s" CHARACTER LARGE OBJECT ); Please note that the contents of the sys.rejects table do not carry over to new client sessions. Best regards, Panos. On 22/05/18 19:43, Doherty, Adam wrote:
Hi Stefan:
Thanks for the quick reply. I ran the following on the first CSV file.
awk -F"|" 'NF != 18 {print NR,$0}' mycsvfile.csv
However none of the lines come back, and thus all lines are 18 fields.
When I run COPY INTO with BEST EFFORT, I am able import all but 5,428 records from that file. Unfortunately, this is event data for an analytics dashboard, I can't have any records missing.
Is there anything else I can dig into to find out why Monet is rejecting the records?
Thanks, Adam
-----Original Message----- From: users-list [mailto:users-list-bounces+adam.doherty=esso.ca@monetdb.org] On Behalf Of Stefan Manegold Sent: Tuesday, May 22, 2018 10:12 To: Communication channel for MonetDB users
Subject: Re: Error: Failed to import - Leftover Data Hi Adam,
the error suggest that (at least) one line in your CSV file contains more columns than the table you try to load the data into.
The "..." after "Leftover data" should tell you what that "excess content" of that line is. Hence, searching for that "..." in your CVS file should help you to locate the problem in the CSV file.
Best, Stefan
----- On May 22, 2018, at 5:06 PM, Doherty, Adam adam.doherty@esso.ca wrote:
Hi all:
I am receiving an error "Failed to import table ... Leftover data ..."
I am using a simple BASH script and the mclient tool to import my data from hundreds of CSV files. The fields are pipe | separated, text values are surrounded with double-quotes, and rows end with LF (\n).
This is the command I am using to process the files -
for csvfile in `ls | grep -i 2018.*\.csv$`; do mclient -d mydatabase -s "COPY INTO table FROM '/path/to/csv/files/$csvfile' USING DELIMITERS '|', '\n', '\"'"; done
I have a .monetdb file setup in my home directory so as not to need the username and password.
Thanks kindly,
Adam Doherty _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Hi Adam, in addition to what Panos suggests, also (incorrect) quoting might play a role, which is not necessarily detected by your simple "sanity check", as the following example reveals: ======== $ cat my.csv -------- 1|"xyz"|"q" 2|"a|b"|"z" 3|"qqq"|" "|"zzz"|"" ======== $ awk -F"|" 'NF != 3 {print NR,$0}' my.csv -------- 2 2|"a|b"|"z" 5 ======== $ mclient -s'create table t (a int, b string, c string);' operation successful $ echo '\d t' | mclient -i CREATE TABLE "sys"."t" ( "a" INTEGER, "b" CHARACTER LARGE OBJECT, "c" CHARACTER LARGE OBJECT ); ======== $ echo "copy into t from '/tmp/my.csv' delimiters '|','\n','\"'; SELECT * FROM sys.rejects;" | mclient -ei -------- Failed to import table 't', Leftover data '"zzz"|""' copy into t from '/tmp/my.csv' delimiters '|','\n','"'; SELECT * FROM sys.rejects; +-------+-------+--------------------------+----------+ | rowid | fldid | message | input | +=======+=======+==========================+==========+ | 2 | 3 | Leftover data '"zzz"|""' | 3|qqq| | : : : : : +-------+-------+--------------------------+----------+ 1 tuple ======== Moreover, MonetDB (unfortunately!) has a "peculiar", "non-consistent", "non-intuitive" (not to say erroneous) treatment of backslash "\" quotes/escapes (at least) when bulk-loading data (too complicated to explain here in detail). Hence, if your csv files contain backslashes ('\'), that might be another origin of the problem. If so, please feel free to share snippets that do contain backslashes ('\') and we can advise. Finally, MonetDB can (by design) only bulk-load from UTF-8 encoded csv files. Hence, please ensure that your csv files are indeed pure UTF-8 and do not contain any other "strange"/"alien" characters. Best, Stefan ----- On May 23, 2018, at 10:15 AM, Panagiotis Koutsourakis panagiotis.koutsourakis@monetdbsolutions.com wrote:
Hi Adam,
In my experience, the cause of this kind of problem is usually incorrect quoting in the CSV. One thing that could help you figure out what happens is the sys.rejects table, which records information about lines the CSV parser rejected
sql>COPY INTO example FROM '/tmp/lala.csv' USING DELIMITERS '|','\n','"'; Failed to import table 'example', Leftover data 'b'
sql>SELECT * FROM sys.rejects; +-------+-------+-------------------+-------+ | rowid | fldid | message | input | +=======+=======+===================+=======+ | 0 | 2 | Leftover data 'b' | 1|a | +-------+-------+-------------------+-------+
The CSV was: 1|a|b 2|c
and the table:
CREATE TABLE "sys"."example" ( "i" INTEGER, "s" CHARACTER LARGE OBJECT );
Please note that the contents of the sys.rejects table do not carry over to new client sessions.
Best regards, Panos.
On 22/05/18 19:43, Doherty, Adam wrote:
Hi Stefan:
Thanks for the quick reply. I ran the following on the first CSV file.
awk -F"|" 'NF != 18 {print NR,$0}' mycsvfile.csv
However none of the lines come back, and thus all lines are 18 fields.
When I run COPY INTO with BEST EFFORT, I am able import all but 5,428 records from that file. Unfortunately, this is event data for an analytics dashboard, I can't have any records missing.
Is there anything else I can dig into to find out why Monet is rejecting the records?
Thanks, Adam
-----Original Message----- From: users-list [mailto:users-list-bounces+adam.doherty=esso.ca@monetdb.org] On Behalf Of Stefan Manegold Sent: Tuesday, May 22, 2018 10:12 To: Communication channel for MonetDB users
Subject: Re: Error: Failed to import - Leftover Data Hi Adam,
the error suggest that (at least) one line in your CSV file contains more columns than the table you try to load the data into.
The "..." after "Leftover data" should tell you what that "excess content" of that line is. Hence, searching for that "..." in your CVS file should help you to locate the problem in the CSV file.
Best, Stefan
----- On May 22, 2018, at 5:06 PM, Doherty, Adam adam.doherty@esso.ca wrote:
Hi all:
I am receiving an error "Failed to import table ... Leftover data ..."
I am using a simple BASH script and the mclient tool to import my data from hundreds of CSV files. The fields are pipe | separated, text values are surrounded with double-quotes, and rows end with LF (\n).
This is the command I am using to process the files -
for csvfile in `ls | grep -i 2018.*\.csv$`; do mclient -d mydatabase -s "COPY INTO table FROM '/path/to/csv/files/$csvfile' USING DELIMITERS '|', '\n', '\"'"; done
I have a .monetdb file setup in my home directory so as not to need the username and password.
Thanks kindly,
Adam Doherty _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Hello Stefan and Panos:
Thanks very kindly for your detailed responses.
I was able to find the issue, a module name had a typo with a backslash as the last character.
As the module name field is double quoted, the backslash in the name was escaping the double quote.
Passing the errant CSV through tr to strip the backslashes fixed the import; all records imported without issue.
Thanks again,
Adam
-----Original Message-----
From: users-list [mailto:users-list-bounces+adam.doherty=esso.ca@monetdb.org] On Behalf Of Stefan Manegold
Sent: Wednesday, May 23, 2018 05:06
To: Communication channel for MonetDB users
Hi Adam,
In my experience, the cause of this kind of problem is usually incorrect quoting in the CSV. One thing that could help you figure out what happens is the sys.rejects table, which records information about lines the CSV parser rejected
sql>COPY INTO example FROM '/tmp/lala.csv' USING DELIMITERS '|','\n','"'; Failed to import table 'example', Leftover data 'b'
sql>SELECT * FROM sys.rejects; +-------+-------+-------------------+-------+ | rowid | fldid | message | input | +=======+=======+===================+=======+ | 0 | 2 | Leftover data 'b' | 1|a | +-------+-------+-------------------+-------+
The CSV was: 1|a|b 2|c
and the table:
CREATE TABLE "sys"."example" ( "i" INTEGER, "s" CHARACTER LARGE OBJECT );
Please note that the contents of the sys.rejects table do not carry over to new client sessions.
Best regards, Panos.
On 22/05/18 19:43, Doherty, Adam wrote:
Hi Stefan:
Thanks for the quick reply. I ran the following on the first CSV file.
awk -F"|" 'NF != 18 {print NR,$0}' mycsvfile.csv
However none of the lines come back, and thus all lines are 18 fields.
When I run COPY INTO with BEST EFFORT, I am able import all but 5,428 records from that file. Unfortunately, this is event data for an analytics dashboard, I can't have any records missing.
Is there anything else I can dig into to find out why Monet is rejecting the records?
Thanks, Adam
-----Original Message----- From: users-list [mailto:users-list-bounces+adam.doherty=esso.ca@monetdb.org] On Behalf Of Stefan Manegold Sent: Tuesday, May 22, 2018 10:12 To: Communication channel for MonetDB users
Subject: Re: Error: Failed to import - Leftover Data Hi Adam,
the error suggest that (at least) one line in your CSV file contains more columns than the table you try to load the data into.
The "..." after "Leftover data" should tell you what that "excess content" of that line is. Hence, searching for that "..." in your CVS file should help you to locate the problem in the CSV file.
Best, Stefan
----- On May 22, 2018, at 5:06 PM, Doherty, Adam adam.doherty@esso.ca wrote:
Hi all:
I am receiving an error "Failed to import table ... Leftover data ..."
I am using a simple BASH script and the mclient tool to import my data from hundreds of CSV files. The fields are pipe | separated, text values are surrounded with double-quotes, and rows end with LF (\n).
This is the command I am using to process the files -
for csvfile in `ls | grep -i 2018.*\.csv$`; do mclient -d mydatabase -s "COPY INTO table FROM '/path/to/csv/files/$csvfile' USING DELIMITERS '|', '\n', '\"'"; done
I have a .monetdb file setup in my home directory so as not to need the username and password.
Thanks kindly,
Adam Doherty _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
-- | Stefan.Manegold@CWI.nl | DB Architectures (DA) | | www.CWI.nl/~manegold/ | Science Park 123 (L321) | | +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) | _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
participants (3)
-
Doherty, Adam
-
Panagiotis Koutsourakis
-
Stefan Manegold