[MonetDB-users] varchar type with copy into + misleading error message
Hi, using copy into, I run into a problem where a string does not fit into my varchar column. So the first point is that the error message is quite misleading, as it states 'field 4 not inserted, expecting type str'. Well, it is a string, but the length is too long... Using insert into, the error message will be 'SQLException:str_cast:value too long for type (var)char(100)' which is much more helpful. The second point is that my string is 99 characters long, and utf-8 with accented characters. So its length is less than 100 characters but more than 100 bytes. In postgresql, where my data come from, varchar(100) means 100 characters, whatever encoding the database happen to be in. Oracle gives the choise with varchar2(100 char) or varchar2(100 byte) and will default to NLS_LENGTH_SEMANTICS (which in turn is BYTE by default). I guess in monetdb, it means 100 bytes... Am I right stating that, and is it the expected behaviour or a bug ? Franck
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Hi Franck, Franck Routier wrote:
So the first point is that the error message is quite misleading, as it states 'field 4 not inserted, expecting type str'. Well, it is a string, but the length is too long... Using insert into, the error message will be 'SQLException:str_cast:value too long for type (var)char(100)' which is much more helpful.
I totally agree :)
The second point is that my string is 99 characters long, and utf-8 with accented characters. So its length is less than 100 characters but more than 100 bytes.
Since MonetDB internally doesn't care for the length, as in, it could be the length of your free space, your point is valid. It seems that the wrong strlen function is used, one that is not, or only utf8 compatible if the locale is set so, in my perspective varchar(100) should be 100 visible characters and not bytes. As developer reference the following contains a smart and fast implementation of an UTF8 strlen: http://www.daemonology.net/blog/2008-06-05-faster-utf8-strlen.html I hope it is picked up in the MonetDB codebase. Stefan -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.12 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEAREKAAYFAkpfBAkACgkQYH1+F2Rqwn37ygCgkcekozDxAs4RCQzx7C7XVyaB 0mwAn0MyOC9tHi1lLuqy/td63v8o0GaX =ite2 -----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA512 Stefan de Konink wrote:
Since MonetDB internally doesn't care for the length, as in, it could be the length of your free space, your point is valid. It seems that the wrong strlen function is used, one that is not, or only utf8 compatible if the locale is set so, in my perspective varchar(100) should be 100 visible characters and not bytes.
https://sourceforge.net/tracker/?func=detail&aid=2822855&group_id=56967&atid=482468 I have just added a bug + patch (which on my system seems to give very reasonable output) most likely the MonetDB development team can see if the patch makes enough sense to be included. (And if I didn't miss any corner cases). Stefan -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.12 (GNU/Linux) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org iEYEAREKAAYFAkpf5QkACgkQYH1+F2Rqwn0k+wCcDT+u8f0f35N+vxxiSVS/rQ1Y Um0An0n9kqF/R7k2nrM3eSkFymt4d2V2 =wy7n -----END PGP SIGNATURE-----
participants (2)
-
Franck Routier
-
Stefan de Konink