On Fri, Dec 6, 2013 at 5:50 AM, Sjoerd Mullender <sjoerd@acm.org> wrote:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 2013-12-06 10:38, Lee Hachadoorian wrote:
> I need to bulk load some files with characters like ñ and é. I
> would do this using the LATIN1 encoding. I read that bulk loading
> from file assumes UTF8, but bulk loading from STDIN will use
> encoding of mclient. I would like to know:
>
> (a) How do I determine and change the client encoding? (b) What is
> an example for piping STDIN to mclient?
>
> Regarding the latter, I tried:
>
> cat /my/file.csv | mclient -d mydb -u monetdb -s "COPY INTO mytable
>  FROM (STDIN) USING DELIMITERS ',', '\n' , '"' NULL AS '';"
>
> I wasn't sure how the embedded " would be handled, but the
> terminal basically became unresponsive until I ctl-C'ed out of it.
> Note the the file was similar in size to several others (without
> non-UTF8 characters) that loaded in seconds within mclient.
>
> Best, --Lee
>

See [1] for recipes for bulk loading, including the relevant
invocation of mclient.
The default character set that mclient uses depends on the
environment.  You can specify a specific character set using the -E
option.  See the manual at [2].

Quoting commands is an art.  Look at the command in [1] how you can do
it correctly.  Hint: when quoting using double quotes ("), you need to
use a backslash (\) to quote embedded backslashes and double quotes.

[1]
http://www.monetdb.org/Documentation/Cookbooks/SQLrecipes/LoadingBulkData
[2] http://www.monetdb.org/Documentation/mclient-man-page

- --
Sjoerd Mullender


Thanks Sjoerd. OK, now the file is getting to mclient, and I am at least at the point where I am getting the same error message as when I COPY ... FROM 'file' within mclient.

$ mclient -E LATIN9 -d universe -u monetdb -s "COPY INTO acs2010_5yr.geoheader FROM STDIN USING DELIMITERS ',', '\\n', '\"'" - < /data/lee/postgres_upload/acs2010_5yr/All_Geographies_Not_Tracts_Block_Groups/g20105az.csv
password:
value from line 10560 field 50 not inserted, expecting type varchar(200)
failed to import table
current transaction is aborted (please ROLLBACK)
syntax error, unexpected sqlINT in: "04"

I tried it both with the terminal's character encoding set to UTF8 and set to LATIN9. Does that make a difference? When I sed the problematic line with the terminal in LATIN9, it is human-readable, while in UTF8 it is not.

At this point I could just use iconv on the file, but I'm would like to know how to get this working.

Best,
--Lee