Sjoerd Mullender-2 wrote:
mclient -lsql --database=myschema -t -i -s "COPY 20603 records into test_table from stdin using delimiters '\t','\n','\"' null as ''" < /tmp/testdata.txt
Did you mean to add the -E flag to the command above? I get errors if I don't put it in (mclient spits out a few rows of data and tells me it "failed to import table". I tried it with that flag and it partially worked: [root@bidbtest ~]# mclient -E cp1252 -lsql --database=myschema -t -i -s "COPY 20603 records into test_table from stdin using delimiters '\t','\n','\"' null as ''" < /tmp/testdata.txt user:monetdb password: [ 20547 ] Timer 5824.428 msec The obvious issue is that I am missing 56 records from my file - each of those records I'm missing have hex data in them. This didn't work for all of my files, however. For some larger files, I get [root@bidbtest import]# mclient -E cp1252 -lsql --database=myschema -t -i -s "COPY 2642786 records into a_click from stdin using delimiters '\t','\n','\"' null as '\N'" < /tmp/a_click.tbl user:monetdb password: Timer 7.856 msec MAPI = (monetdb) /usr/local/MonetDB/var/MonetDB5/dbfarm/mapi_socket ...(some data rows are spit out)... ERROR = !SQLException:importTable:value 'http://neuilly\\frintranetguest:inxo-03del@frintranet.fr.deloitte.co...' from line 13494 field 6 not inserted, expecting type str !failed to import table !current transaction is aborted (please ROLLBACK) Timer 4636.574 msec MAPI = (monetdb) /usr/local/MonetDB/var/MonetDB5/dbfarm/mapi_socket ...(hundreds of data rows are spit out)... I checked the file on line 13494 and field 6 is of type string (3rd line, below). ... 2 127228269 0D4B69FA512A11DFAE5196C7A77DC2E3 0DC2EC5AA6CB 2062958 http://www.islamweb.net/services/currency/default.htm \N 2 127228667 2276781A473E11DF9CCA79E3CD83EF10 0DBF68E35593 2062957 http://www.sayaraat.com/change.htm \N 2 127228546 C4B18426512F11DF85C472B3CD83EF10 0DC2F5E9CBB3 2062956 http://neuilly\\frintranetguest:inxo-03del@frintranet.fr.deloitte.co... \N ... I originally thought that there was a problem with the "\\" in the file, so I used sed to delete all lines with "\\". However, I get a similar error message as above. Sjoerd Mullender-2 wrote:
Your file seems to be encoded in one of the Windows formats such as cp1252. If you want to load this file using a COPY INTO command, you will need to convert the file to UTF-8 using e.g. the iconv program: iconv -f cp1252 -t utf-8 < /tmp/testdata.txt > /tmp/testdata-utf8.txt and then use that instead.
You are correct - I dug around and found out that the source for this data was a mysql table which has its charset set to latin1 - which is the same as cp1252 (http://dev.mysql.com/doc/refman/5.0/en/charset-we-sets.html) I also tried the iconv method with little success. [root@bidbtest import]# iconv -f cp1252 -t utf-8//IGNORE -c < a_click.tbl > a_click.tbl_cp1252 [root@bidbtest import]# /usr/local/MonetDB/bin/mclient -lsql --database=myschema -t -s "COPY 2642786 records into a_click from '/tmp/a_click.tbl_cp1252' using delimiters '\t','\n','\"' null as '\N';" user(root):monetdb password: MAPI = (monetdb) /usr/local/MonetDB/var/MonetDB5/dbfarm/mapi_socket QUERY = COPY 2642786 records into a_click from '/tmp/a_click.tbl_cp1252' using delimiters '\t','\n','"' null as '\N'; ERROR = !SQLException:importTable:missing separator ' ' line 159725 field 5 !failed to import table Timer 368.344 msec Again, line 159725 didn't seem like it had any mistakes in the data (3rd line, below) ... 2 127150530 EC33DA849AB02E4F4F7D5A72EAA680BD 0DB70FA4FF8A 1916744 http://translate.googleusercontent.com/translate_c?hl=pt-BR&sl=en&u... \N 2 127148183 7A4C09B627E021CCCFA0ADD15E7FAA43 0DB6B3FF70C3 1916743 http://www.myfootprint.org/en/visitor_information/ \N 2 127150355 C596B7337BD4A8BF1690269D1584CA87 0DB708CF4917 1916742 http://www.portugal-linha.pt/cambios/ \N ... -- View this message in context: http://old.nabble.com/Copy-into-issues-with-mixed-ascii---hex-data-tp3023894... Sent from the monetdb-users mailing list archive at Nabble.com.