On Fri, Dec 6, 2013 at 5:50 AM, Sjoerd Mullender
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 2013-12-06 10:38, Lee Hachadoorian wrote:
I need to bulk load some files with characters like ñ and é. I would do this using the LATIN1 encoding. I read that bulk loading from file assumes UTF8, but bulk loading from STDIN will use encoding of mclient. I would like to know:
(a) How do I determine and change the client encoding? (b) What is an example for piping STDIN to mclient?
Regarding the latter, I tried:
cat /my/file.csv | mclient -d mydb -u monetdb -s "COPY INTO mytable FROM (STDIN) USING DELIMITERS ',', '\n' , '"' NULL AS '';"
I wasn't sure how the embedded " would be handled, but the terminal basically became unresponsive until I ctl-C'ed out of it. Note the the file was similar in size to several others (without non-UTF8 characters) that loaded in seconds within mclient.
Best, --Lee
See [1] for recipes for bulk loading, including the relevant invocation of mclient. The default character set that mclient uses depends on the environment. You can specify a specific character set using the -E option. See the manual at [2].
Quoting commands is an art. Look at the command in [1] how you can do it correctly. Hint: when quoting using double quotes ("), you need to use a backslash (\) to quote embedded backslashes and double quotes.
[1] http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData [2] http://www.monetdb.org/Documentation/mclient-man-page
- -- Sjoerd Mullender
Thanks Sjoerd. OK, now the file is getting to mclient, and I am at least at the point where I am getting the same error message as when I COPY ... FROM 'file' within mclient. $ mclient -E LATIN9 -d universe -u monetdb -s "COPY INTO acs2010_5yr.geoheader FROM STDIN USING DELIMITERS ',', '\\n', '\"'" - < /data/lee/postgres_upload/acs2010_5yr/All_Geographies_Not_Tracts_Block_Groups/g20105az.csv password: value from line 10560 field 50 not inserted, expecting type varchar(200) failed to import table current transaction is aborted (please ROLLBACK) syntax error, unexpected sqlINT in: "04" I tried it both with the terminal's character encoding set to UTF8 and set to LATIN9. Does that make a difference? When I sed the problematic line with the terminal in LATIN9, it is human-readable, while in UTF8 it is not. At this point I could just use iconv on the file, but I'm would like to know how to get this working. Best, --Lee