On 2010-11-17 20:20, paulr135 wrote:
I have a tab-delimited data file that is mostly ascii, but has a few hex characters in it. For example: ... 3263640152 ABC<99> ...
where <99> is a hex value for (tm) - trademark.
When I try and load this data file into monetdb, I get the following error: [root@bidbtest ~]# /usr/local/MonetDB/bin/mclient -lsql --database=myschema -t -s "COPY 20603 records into test_table from '/tmp/testdata.txt' using delimiters '\t','\n','\"' null as '';" user(root):monetdb password: MAPI = (monetdb) /usr/local/MonetDB/var/MonetDB5/dbfarm/mapi_socket QUERY = COPY 20603 records into test_table from '/tmp/testdata.txt' using delimiters '\t','\n','"' null as ''; ERROR = !SQLException:importTable:value 'failed to import table Timer 790.415 msec
The table def is: CREATE TABLE creative_dim ( id bigint NOT NULL, headline varchar(100) DEFAULT NULL )
How can I go about loading this file? Ideally, I would like to get these characters loaded as is.
I saw that mclient has a -E flag that accepts an encoding type. However, I'm not sure what to set it to for a file with ascii / hex data.
A complicating factor is that I don't know all the hex characters that are in my file. I luckily spotted the <99>/tm one above as it was one of the first lines in the file. But after replacing all instances of that character with the empty string, I still get a similar error to what I pasted above. I manually paged through the file and found different hex characters. It isn't practical for me to page through an entire file and monetdb doesn't tell me which line number the problem is on.
By the way, I am using MonetDB v5.20.5.
Thanks for the help!
The MonetDB server only accepts text in the UTF-8 encoding. Mclient can translate from encodings to UTF-8 using the -E option. However, when you use the COPY INTO command with an actual file name, the file is read directly by the server and not by mclient. This means that such a file must be encoded using UTF-8. Your file seems to be encoded in one of the Windows formats such as cp1252. If you want to load this file using a COPY INTO command, you will need to convert the file to UTF-8 using e.g. the iconv program: iconv -f cp1252 -t utf-8 < /tmp/testdata.txt > /tmp/testdata-utf8.txt and then use that instead. Another thing you can try is to do this (all on one line): mclient -lsql --database=myschema -t -i -s "COPY 20603 records into test_table from stdin using delimiters '\t','\n','\"' null as ''" < /tmp/testdata.txt I added a -i option (that's important) to your command, and I removed the semicolon (I'm not sure that's needed, but it certainly won't harm--mclient adds one) and then mclient can read the file and convert it for you. I hope this helps. -- Sjoerd Mullender