[MonetDB-users] COPY INTO: wrong length with backslashed chars?
Hi all, I'm using MonetDB for the analysis for ~90GB of data which come from a MySQL-based web-application. The data is dumped into CSV-files on a regular basis and then imported into MonetDB with the COPY command. I have experienced some strange behaviour: obviously, MonetDB counts the number of chars for a VARCHAR-field before removing backslashes which are used e.g. for escaping newlines or other backslashes and rejects strings that should actually fit. A simple reproduction is this: 1. In mclient: CREATE TABLE t (x VARCHAR(1)); 2. Create a textfile foo.txt containing two backslashes: \\ 3. In mclient: sql>copy into t from '/tmp/foo.txt'; SQLException:importTable:value '\\' from line 1 field 1 not inserted, expecting type varchar(1) failed to import table Performing exactly the same steps with a VARCHAR(2) column works without errors and results in t.x containg the de-escaped single backslash. I'm using the Oct-2010 release. My current workaround is to enlarge VARCHARs which caused problems in the past, but this is of course no guarantee that it won't happen again. Although I found a few monetdb-bugs entries concerning the COPY command (e.g. about the misleading error message seen above), none seems to deal with my problem. Is this a bug? Best regards, Daniel
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Op 05-12-10 19:08, Daniel Boesswetter schreef:
Hi all,
I'm using MonetDB for the analysis for ~90GB of data which come from a MySQL-based web-application. The data is dumped into CSV-files on a regular basis and then imported into MonetDB with the COPY command. I have experienced some strange behaviour: obviously, MonetDB counts the number of chars for a VARCHAR-field before removing backslashes which are used e.g. for escaping newlines or other backslashes and rejects strings that should actually fit.
A simple reproduction is this:
1. In mclient: CREATE TABLE t (x VARCHAR(1)); 2. Create a textfile foo.txt containing two backslashes: \\ 3. In mclient: sql>copy into t from '/tmp/foo.txt'; SQLException:importTable:value '\\' from line 1 field 1 not inserted, expecting type varchar(1) failed to import table
Performing exactly the same steps with a VARCHAR(2) column works without errors and results in t.x containg the de-escaped single backslash.
I'm using the Oct-2010 release. My current workaround is to enlarge VARCHARs which caused problems in the past, but this is of course no guarantee that it won't happen again.
Although I found a few monetdb-bugs entries concerning the COPY command (e.g. about the misleading error message seen above), none seems to deal with my problem.
Is this a bug?
What are you escaping? Paste the line that fails ;) Stefan -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEAREKAAYFAkz71sgACgkQYH1+F2Rqwn3AmgCfayhLg6zjXZgvqhV9OHJbj5Kh z64An2xZJy7H1wv3M6hr/SHQaK2C8IM3 =//V/ -----END PGP SIGNATURE-----
Hi Stefan,
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Op 05-12-10 19:08, Daniel Boesswetter schreef:
Hi all,
I'm using MonetDB for the analysis for ~90GB of data which come from a MySQL-based web-application. The data is dumped into CSV-files on a regular basis and then imported into MonetDB with the COPY command. I have experienced some strange behaviour: obviously, MonetDB counts the number of chars for a VARCHAR-field before removing backslashes which are used e.g. for escaping newlines or other backslashes and rejects strings that should actually fit.
A simple reproduction is this:
1. In mclient: CREATE TABLE t (x VARCHAR(1)); 2. Create a textfile foo.txt containing two backslashes: \\ 3. In mclient: sql>copy into t from '/tmp/foo.txt'; SQLException:importTable:value '\\' from line 1 field 1 not inserted, expecting type varchar(1) failed to import table
Performing exactly the same steps with a VARCHAR(2) column works without errors and results in t.x containg the de-escaped single backslash.
I'm using the Oct-2010 release. My current workaround is to enlarge VARCHARs which caused problems in the past, but this is of course no guarantee that it won't happen again.
Although I found a few monetdb-bugs entries concerning the COPY command (e.g. about the misleading error message seen above), none seems to deal with my problem.
Is this a bug?
What are you escaping? Paste the line that fails ;)
As I said: try to import a single "\" into a VARCHAR(1) with COPY. Writing "\" does not work because it escapes the newline. Writing "\\" does not work because it is longer than 1 char. Regards, Daniel
Stefan -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEAREKAAYFAkz71sgACgkQYH1+F2Rqwn3AmgCfayhLg6zjXZgvqhV9OHJbj5Kh z64An2xZJy7H1wv3M6hr/SHQaK2C8IM3 =//V/ -----END PGP SIGNATURE-----
Hi Daniel, thanks for reporting and sorry for the inconveniences! Yes, this appears to be a bug; we/MonetDB cannot count correctly. Please file a bug report via bugs.monetdb.org Thanks, Stefan (M. ;-)) On Fri, Dec 10, 2010 at 11:11:34AM +0100, Daniel Boesswetter wrote:
Hi Stefan,
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Op 05-12-10 19:08, Daniel Boesswetter schreef:
Hi all,
I'm using MonetDB for the analysis for ~90GB of data which come from a MySQL-based web-application. The data is dumped into CSV-files on a regular basis and then imported into MonetDB with the COPY command. I have experienced some strange behaviour: obviously, MonetDB counts the number of chars for a VARCHAR-field before removing backslashes which are used e.g. for escaping newlines or other backslashes and rejects strings that should actually fit.
A simple reproduction is this:
1. In mclient: CREATE TABLE t (x VARCHAR(1)); 2. Create a textfile foo.txt containing two backslashes: \\ 3. In mclient: sql>copy into t from '/tmp/foo.txt'; SQLException:importTable:value '\\' from line 1 field 1 not inserted, expecting type varchar(1) failed to import table
Performing exactly the same steps with a VARCHAR(2) column works without errors and results in t.x containg the de-escaped single backslash.
I'm using the Oct-2010 release. My current workaround is to enlarge VARCHARs which caused problems in the past, but this is of course no guarantee that it won't happen again.
Although I found a few monetdb-bugs entries concerning the COPY command (e.g. about the misleading error message seen above), none seems to deal with my problem.
Is this a bug?
What are you escaping? Paste the line that fails ;)
As I said: try to import a single "\" into a VARCHAR(1) with COPY. Writing "\" does not work because it escapes the newline. Writing "\\" does not work because it is longer than 1 char.
Regards, Daniel
Stefan -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEAREKAAYFAkz71sgACgkQYH1+F2Rqwn3AmgCfayhLg6zjXZgvqhV9OHJbj5Kh z64An2xZJy7H1wv3M6hr/SHQaK2C8IM3 =//V/ -----END PGP SIGNATURE-----
------------------------------------------------------------------------------ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- | Stefan.Manegold @ CWI.nl | DB Architectures (INS1) | | http://CWI.nl/~manegold/ | Science Park 123 (L321) | | Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL) |
Thank you, Stefan! The bug is here: http://bugs.monetdb.org/show_bug.cgi?id=2748 Regards, Daniel Stefan Manegold wrote:
Hi Daniel,
thanks for reporting and sorry for the inconveniences!
Yes, this appears to be a bug; we/MonetDB cannot count correctly. Please file a bug report via bugs.monetdb.org
Thanks, Stefan (M. ;-))
On Fri, Dec 10, 2010 at 11:11:34AM +0100, Daniel Boesswetter wrote:
Hi Stefan,
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512
Op 05-12-10 19:08, Daniel Boesswetter schreef:
Hi all,
I'm using MonetDB for the analysis for ~90GB of data which come from a MySQL-based web-application. The data is dumped into CSV-files on a regular basis and then imported into MonetDB with the COPY command. I have experienced some strange behaviour: obviously, MonetDB counts the number of chars for a VARCHAR-field before removing backslashes which are used e.g. for escaping newlines or other backslashes and rejects strings that should actually fit.
A simple reproduction is this:
1. In mclient: CREATE TABLE t (x VARCHAR(1)); 2. Create a textfile foo.txt containing two backslashes: \\ 3. In mclient: sql>copy into t from '/tmp/foo.txt'; SQLException:importTable:value '\\' from line 1 field 1 not inserted, expecting type varchar(1) failed to import table
Performing exactly the same steps with a VARCHAR(2) column works without errors and results in t.x containg the de-escaped single backslash.
I'm using the Oct-2010 release. My current workaround is to enlarge VARCHARs which caused problems in the past, but this is of course no guarantee that it won't happen again.
Although I found a few monetdb-bugs entries concerning the COPY command (e.g. about the misleading error message seen above), none seems to deal with my problem.
Is this a bug?
What are you escaping? Paste the line that fails ;)
As I said: try to import a single "\" into a VARCHAR(1) with COPY. Writing "\" does not work because it escapes the newline. Writing "\\" does not work because it is longer than 1 char.
Regards, Daniel
Stefan -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.16 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEAREKAAYFAkz71sgACgkQYH1+F2Rqwn3AmgCfayhLg6zjXZgvqhV9OHJbj5Kh z64An2xZJy7H1wv3M6hr/SHQaK2C8IM3 =//V/ -----END PGP SIGNATURE-----
------------------------------------------------------------------------------ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
participants (3)
-
Daniel Boesswetter
-
Stefan de Konink
-
Stefan Manegold