Re: [MonetDB-users] MonetDB Permission/COPY INTO Questions
Sean,
Maybe here is why you did not spot the duplicate:
cat /tmp/testdata2.csv | cut -d, -f1,5 | sort | uniq -d
Regards
Allen
On 11/12/11 12:39 PM, "Sean McNamara"
Looks like I had too much faith in the "uniq" command (or misunderstand its constraints.) There are definitely duplicates in my import set. :-(
Back to data scrubbing...
On Sat, Nov 12, 2011 at 12:08 PM, Allen Zhang
wrote: Sean,
Use sql select to find out duplicates.
Regards Allen
On 11/12/11 12:01 PM, "Sean McNamara"
wrote: I tried that (as both the normal dbuser and as admin). Both fail the same way but don't give an indicator of where it feels the constraint is violated:
$ mclient -u monetdb -s "alter table mduser.testdata add constraint testdata_pkey primary key (foo,epoch);" testdb password: SQLException:assert:UPDATE: PRIMARY KEY constraint 'testdata.testdata_pkey' violated
On Sat, Nov 12, 2011 at 11:42 AM, Allen Zhang
wrote: How about add constraint after you import? I think it might tell you which one had problem.
Regards Allen
On 11/12/11 11:19 AM, "Sean McNamara"
wrote: Brilliant suggestion, and I think it exposed the problem.
$ mclient -u monetdb -s "copy into mduser.testdata from '/tmp/testdata2.csv' using delimiters ',','\n','\'';" testdb password: SQLException:importTable:value 'ONE_SECOND' from line 12734 field 3 not inserted, expecting type varchar(8) failed to import table
It looks like the field I created is too small to handle some of the values in the file. After expanding all the varchar fields to 32 and re-running, I get a different error:
$ mclient -u monetdb -s "copy into mduser.testdata from '/tmp/testdata2.csv' using delimiters ',','\n','\'';" testdb password: SQLException:assert:INSERT INTO: PRIMARY KEY constraint 'testdata.testdata_pkey' violated
That makes me think I have duplicate data. The PKEY is comprised of fields #1 and #5, so I check for that but it looks fine:
$ cat /tmp/testdata2.csv | cut -d, -f1,5 |uniq -d $
However if I drop and re-add the table without the constraint, it works:
sql>create table testdata ( more> foo varchar(32), more> bar varchar(32), more> baz varchar(32), more> woz varchar(32), more> epoch bigint more>); operation successful (1.477ms) sql>\q $ mclient -u monetdb -s "copy into mduser.testdata from '/tmp/testdata2.csv' using delimiters ',','\n','\'';" testdb password: 2519931 affected rows
Thanks Allen! This gets me able to import. Now I'm wondering what might be wrong with the constraint. Any insight there?
On Sat, Nov 12, 2011 at 10:58 AM, Allen Zhang
wrote: What about just issue the following:
mclient -s "copy into testdata from '/tmp/testdata2.csv' using delimiters ',','\n','\'';"
Regards Allen
On 11/12/11 10:48 AM, "Sean McNamara"
wrote: >Thanks Allen. Red-herring there. My first file had "SPY" in the >first column, and the second had "testdata1", so the difference >there >is just because of the contents. If I convert the original SPY file >and re-run, I get the same behavior. > >$ mclient -s "copy into testdata from STDIN using delimiters >',','\n','\'';" testdb - < /tmp/testdata2.csv >syntax error, unexpected STRING in: "'SPY'" > > >On Sat, Nov 12, 2011 at 10:43 AM, Allen Zhang
>wrote: >> Hi Sean, >> >> You should inspect your data quality, it seems in the first 3 >>imports, >>it >> stopped at "spy", I suspect it might be some \t in that supposed >>to >>be >> string, and last import you put single-quote around string, then >>it >> stopped at "testvalue1", search your import file, look for that >> "testvalue1", the problem should around there, maybe an extra >>field >> because wrong replace by sed. >> >> Best Regards >> Allen >> >> On 11/12/11 10:26 AM, "Sean McNamara" wrote: >> >>>Thanks Viktor, but no dice. I also tried converting from CSV to >>>TAB-delim to match what you were doing, but it's still not >>>working... >>> >>>$ mclient -s "copy into testdata from STDIN using delimiters >>>',','\n','';" testdb - < /tmp/testdata.csv >>>syntax error, unexpected IDENT in: "spy" >>> >>>$ sed 's/,/\'$'\t''/g' /tmp/testdata.csv > /tmp/testdata.dat >>> >>>$ mclient -s "copy into testdata from STDIN;" testdb - < >>>/tmp/testdata.dat >>>syntax error, unexpected IDENT in: "spy" >>> >>>$ mclient -s "copy into testdata from STDIN using delimiters >>>'\t','\n','';" testdb - < /tmp/testdata.dat >>>syntax error, unexpected IDENT in: "spy" >>> >>>I'm understanding the format of the documentation quite a bit >>>better >>>now after spending a few days with it, but I'm still not clear on >>>why >>>this insert is failing. >>>I went one step further and converted the input file to >>>single-quote >>>the string fields, but this fails as well: >>> >>>$ mclient -s "copy into testdata from STDIN using delimiters >>>',','\n','\'';" testdb - < /tmp/testdata2.csv >>>syntax error, unexpected STRING in: "'testvalue1'" >>> >>> >>>The problem has to be something trivial, but I have no idea what >>>it >>>is... >>> >>>On Sat, Nov 12, 2011 at 5:30 AM, Viktor Rosenfeld >>> wrote: >>>> Hi Sean, >>>> >>>> Sean McNamara wrote: >>>> >>>>> $ mclient -s "copy 2519931 records into testdata from STDIN >>>>>using >>>>> delimiters ',';" testdb - < /tmp/testdata.csv >>>>> syntax error, unexpected IDENT in: "testvalue1" >>>>> >>>>> where testvalue1 is one of the actual values in the FOO column >>>>> (actually the only value in this particular file.) >>>>> >>>>> I'm wondering if this is telling me it's expecting the field to >>>>>be >>>>> quoted, but that seems unlikely as it would make working with >>>>>CSVs >>>>> quite a bit painful. >>>>> >>>>> Can you guys kindly help me take another step forward. I >>>>>really >>>>> appreciate the guidance. >>>> >>>> I'm using >>>> >>>> COPY ... USING DELIMITERS '\t','\n',''; >>>> >>>> to bulk-load tab-delimited unquoted data. The '\t' is the >>>>default >>>>field >>>> separator, the '\n' is the default record separator. The final >>>>'' >>>>is >>>>the >>>> quote string -- empty in my case. >>>> >>>> See: >>>>http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInt >>>>o >>>> >>>> Cheers, >>>> Viktor >>>> >>> >>>------------------------------------------------------------------ >>>-- >>>-- >>>-- >>>-- >>>---- >>>RSA(R) Conference 2012 >>>Save $700 by Nov 18 >>>Register now >>>http://p.sf.net/sfu/rsa-sfdev2dev1 >>>_______________________________________________ >>>MonetDB-users mailing list >>>MonetDB-users@lists.sourceforge.net >>>https://lists.sourceforge.net/lists/listinfo/monetdb-users >> >> >>
participants (1)
-
Allen Zhang