Re: [MonetDB-users] copy from stdin oddness

6 May 2009

      Hello Karanbir,

Well, that can't be the cause then. :) I will file the bug-report
regarding the BOM.

When I try to write the command exactly as you did (with both '-s' and
a pipe, i get)

$ echo 'CREATE TABLE aap (a int);'  | mclient -lsql -dtest
$ cat data.dat
1
2
3
4
5
$ N=4; head -n $N data.dat | mclient -lsql -dtest -s "copy $N records
into aap from STDIN;"
MAPI  = monetdb@localhost:50000
QUERY = copy 4 records into aap from STDIN;
ERROR = !SQLException:sql:value ';' while parsing ';' from line 0
field 0 not inserted, expecting type int
        !SQLException:importTable:failed to import table

It seems that mclient is confused about what to read first (statement
or stdin) and perhaps it is a bug? I think an mclient guru might be
able to answer this?

I also tried (and this seems to work fine):

$ (N=4; echo "copy $N records into aap from STDIN;"; head -n $N
data.dat) | mclient -lsql -dtest
[ 4     ]
$ mclient -lsql -dtest -s "select * from aap;"
% sys.aap # table_name
% a # name
% int # type
% 1 # length
[ 1     ]
[ 2     ]
[ 3     ]
[ 4     ]

So this could be another workaround? Have you tried this already?

Wouter

p.s. Otherwise, I guess another workaround would be to create a
(temporary) pipe on your filesystem (but i'm not sure whether that
works):
$ mkfifo /tmp/workaroundpipe
$ (cat mydata > /tmp/workaroundpipe) &
$ mclient -lsql -s "copy x records into x from '/tmp/workaroundpipe';"

2009/5/5 Karanbir Singh :
...
Hi Wouter,
Wouter Alink wrote:
...
Hello Karanbir,
This sounds like a BOM (Byte Order Mark,
http://unicode.org/faq/utf_bom.html#BOM) is not dealt with correctly.
Thats interesting, and not something I'd considered at all. However :
...
If you try:
xxd /home/kbsingh/data/data/1000.utf8 | head
does it start with 'EF BB BF'?
[kbsingh@koala ~]$ xxd /home/kbsingh/data/data/1000.utf8 | head
0000000: 3664 6266 6339 6431 6635 3464 3137 3366  6dbfc9d1f54d173f
0000010: 6130 3962 6664 6131 3965 3566 6335 3062  a09bfda19e5fc50b
So that does not seem to be the issue in this case.
...
A little experiment (on the head) reveals a bug in mclient (it does
not handle correctly the optional BOM at the beginning of the input):
$ cat selectWithBOM.py
print "\xEF\xBB\xBFSELECT 1;"
$ python selectWithBOM.py > queryWithBOM.sql
$ xxd queryWithBOM.sql
0000000: efbb bf53 454c 4543 5420 313b 0a         ...SELECT 1;.
$ cat queryWithBOM.sql
SELECT 1;
$ echo "SELECT 1;" | mclient -lsql
% . # table_name
% single_value # name
% tinyint # type
% 1 # length
[ 1     ]
$ cat queryWithBOM.sql | mclient -lsql
(Hangs)
I guess a bug should be filed.
Good call, should I go ahead and do that using your test case here ? or
would you like to file the bugreport yourself ? The only reason I am
hesitant to do this is that while there seems to be this issue, its not
an issue that my data suffers from here.
...
If your data starts with the BOM, a workaround would be to strip the
first three bytes of your data (as the BOM is not very meaningful when
using UTF-8).
I dont think that its the case here, so what are the workaround options
available ? Essentially : I need to load about 600 to 700 G worth of
data thats going to be delivered to me in a .gz file, expanding that to
raw text is not something I'd like to consider unless thats was the
_only_ way to get data loaded here.
--
Karanbir Singh : http://www.karan.org/  : 2522219@icq
------------------------------------------------------------------------------
The NEW KODAK i700 Series Scanners deliver under ANY circumstances! Your
production scanning environment may not be a perfect world - but thanks to
Kodak, there's a perfect scanner to get the job done! With the NEW KODAK i700
Series Scanner you'll get full speed at 300 dpi even with all image
processing features enabled. http://p.sf.net/sfu/kodak-com
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users