Hello Karanbir,
This sounds like a BOM (Byte Order Mark,
http://unicode.org/faq/utf_bom.html#BOM) that is not being handled correctly.
If you try:
xxd /home/kbsingh/data/data/1000.utf8 | head
does it start with 'EF BB BF'?
A little experiment (on the head) reveals a bug in mclient (it does
not correctly handle the optional BOM at the beginning of the input):
$ cat selectWithBOM.py
print "\xEF\xBB\xBFSELECT 1;"
$ python selectWithBOM.py > queryWithBOM.sql
$ xxd queryWithBOM.sql
0000000: efbb bf53 454c 4543 5420 313b 0a ...SELECT 1;.
$ cat queryWithBOM.sql
SELECT 1;
$ echo "SELECT 1;" | mclient -lsql
% . # table_name
% single_value # name
% tinyint # type
% 1 # length
[ 1 ]
$ cat queryWithBOM.sql | mclient -lsql
(Hangs)
I guess a bug should be filed.
If your data starts with the BOM, a workaround would be to strip the
first three bytes of your data (as the BOM is not very meaningful when
using UTF-8).
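A minimal sketch of that workaround, in Python 3 rather than the Python 2 used above. The function name strip_utf8_bom is made up for illustration. It drops the first three bytes only if they really are EF BB BF, so it should be harmless on input that has no BOM. I have not tested it against mclient.

```python
UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(src, dst, chunk_size=1 << 20):
    """Copy the binary stream src to dst, dropping a leading UTF-8 BOM if present."""
    head = src.read(len(UTF8_BOM))
    if head != UTF8_BOM:
        # No BOM: pass through the bytes we already consumed.
        dst.write(head)
    # Copy the rest in chunks so large inputs are never held in memory.
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)


# To use it as a filter in a pipe, import sys and call:
#   strip_utf8_bom(sys.stdin.buffer, sys.stdout.buffer)
```

With that final call added and the file saved under some name such as stripbom.py (hypothetical), it could sit in the pipe just before mclient. If you already know the BOM is there, `tail -c +4` drops the first three bytes unconditionally.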
Greetings,
Wouter
2009/4/30 Karanbir Singh
Hi,
This is on CentOS-5/x86_64 running :
[kbsingh@koala data]$ rpm -qa | grep -i Monet
MonetDB-1.22.0-1.el5.kb.oid64
MonetDB5-server-5.4.0-1.el5.kb.oid64
MonetDB-SQL-2.22.0-1.el5.kb.oid64
MonetDB-client-1.22.0-1.el5.kb.oid64
MonetDB-SQL-server5-2.22.0-1.el5.kb.oid64
server started with :
[root@koala ~]# mserver5 --dbinit="include sql;"
on the client :
[kbsingh@koala data]$ echo "\d sample" | mclient -lsql -ukb -Pkb
CREATE TABLE "kb"."sample" (
    "id" char(33),
    "sdate" char(20),
    "key" varchar(255),
    "country" char(2),
    "region" char(2),
    "city" varchar(50),
    "pid" int
);
[kbsingh@koala data]$ echo "select 'hello';" | mclient -lsql -ukb -Pkb
% . # table_name
% single_value # name
% char # type
% 5 # length
[ "hello" ]
[kbsingh@koala data]$ echo "copy 1000 records into sample from '/home/kbsingh/data/data/1000.utf8' using delimiters '\t','\n','';" | mclient -lsql -ukb -Pkb
[ 1000 ]
also:
[kbsingh@koala data]$ mclient -lsql -ukb -Pkb -s "copy 1000 records into sample from '/home/kbsingh/data/data/1000.utf8' using delimiters '\t','\n','';"
[ 1000 ]
so, works so far. However:
[kbsingh@koala data]$ cat 1000.utf8 | mclient -lsql -ukb -Pkb -s "copy 1000 records into sample from STDIN using delimiters '\t','\n','';"
MAPI = kb@localhost:50000
QUERY = copy 1000 records into sample from STDIN using delimiters '\t','\n','';
ERROR = !SQLException:sql:missing sep line 0 field 0
!SQLException:importTable:failed to import table
given that mclient is on the right side of the pipe, the data is surely being made available on the stdin, but why does mclient fail like this ?
also, adding an OFFSET makes mclient just die quietly like this:
[kbsingh@koala data]$ cat 1000.utf8 | mclient -lsql -ukb -Pkb -s "copy 5 offset 1000 records into sample from STDIN using delimiters '\t','\n','';"
[kbsingh@koala data]$
so, what am I getting so wrong here? The load chain I need to set up will look like this:
zcat andy1.gz | iconv -f latin1 -t utf8 | mclient ......
That compressed file is 62 GB; the chances of me being able to expand it and run the load from a specific filename are zero.
-- Karanbir Singh : http://www.karan.org/ : 2522219@icq