[MonetDB-users] COPY with empty fields
COPY 1 OFFSET 100 RECORDS INTO tlbc FROM '/mnt/data/myfile.txt' DELIMITERS '\t', '\r\n'

I'm trying to use COPY to import tab-delimited text files. The text file has a header row, and I've verified the delimiters and end-of-line characters. Each row ends with a number of empty fields, and the corresponding table columns are a mix of integers and decimals. I received the following error at the first of the empty columns:

!SQLException:sql:value '' while parsing from line 0 field 213 not inserted, expecting type lng
!SQLException:importTable:failed to import table
!ERROR: Decimal cannot be parsed from lng

I realize I could preprocess the data, but that is a second choice given its size: the file is 8 GB. How can I overcome this?

Paul
Hi Paul,

here are some hints and examples. Consider

$ cat /tmp/data.csv
-1,-2,-3
11,12,13
21,22,23
31,,33
41,42,43
51,52,53

First, if you want to skip the first row and then load the remaining 5, your SQL COPY syntax should not read "copy 1 offset 5 records" as in

$ cat /tmp/load.sql
create table t (a int, b int, c int);
copy 1 offset 5 records into t from '/tmp/data.csv' delimiters ',','\n';
select * from t;
drop table t;

$ mclient -lsql /tmp/load.sql
[ 1 ]
% sys.t, sys.t, sys.t # table_name
% a, b, c # name
% int, int, int # type
% 2, 2, 2 # length
[ 41, 42, 43 ]

but rather "copy 5 offset 2 records" as in

$ cat /tmp/load.sql
create table t (a int, b int, c int);
copy 5 offset 2 records into t from '/tmp/data.csv' delimiters ',','\n';
select * from t;
drop table t;

(This is not MonetDB-specific, but SQL standard; read it as "copy 5 starting from 2".)

This still results in your error:

$ mclient -lsql /tmp/load.sql
MAPI  = monetdb@localhost:50000
QUERY = create table t (a int, b int, c int);
        copy 5 offset 2 records into t from '/tmp/data.csv' delimiters ',','\n';
        select * from t;
        drop table t;
ERROR = !SQLException:sql:value '' while parsing ',33' from line 2 field 1 not inserted, expecting type int
!SQLException:importTable:failed to import table
% sys.t, sys.t, sys.t # table_name
% a, b, c # name
% int, int, int # type
% 1, 1, 1 # length

To overcome this, you need to specify that NULL values are not given as "NULL" or "null" in your data file (as they are, e.g., in

$ cat /tmp/data.csv
-1,-2,-3
11,12,13
21,22,23
31,null,33
41,42,43
51,52,53

$ mclient -lsql /tmp/load.sql
[ 5 ]
% sys.t, sys.t, sys.t # table_name
% a, b, c # name
% int, int, int # type
% 2, 2, 2 # length
[ 11, 12, 13 ]
[ 21, 22, 23 ]
[ 31, NULL, 33 ]
[ 41, 42, 43 ]
[ 51, 52, 53 ]

), but rather as empty fields (empty strings), by adding "null as ''" to your copy command:

$ cat /tmp/load.sql
create table t (a int, b int, c int);
copy 5 offset 2 records into t from '/tmp/data.csv' delimiters ',','\n' null as '';
select * from t;
drop table t;

$ mclient -lsql /tmp/load.sql
[ 5 ]
% sys.t, sys.t, sys.t # table_name
% a, b, c # name
% int, int, int # type
% 2, 2, 2 # length
[ 11, 12, 13 ]
[ 21, 22, 23 ]
[ 31, NULL, 33 ]
[ 41, 42, 43 ]
[ 51, 52, 53 ]

(Again, this is SQL standard.)

Hope this helps you further.

Stefan
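Applied to the tab-delimited file from the original post, the same clause should do the job. The following is only a sketch: the table name, file path, and delimiters are taken from the post above, while the record count of 1000 is purely illustrative.

copy 1000 offset 2 records into tlbc from '/mnt/data/myfile.txt' delimiters '\t', '\r\n' null as '';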
--
Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | http://www.cwi.nl/~manegold/
CWI, P.O.Box 94079, 1090 GB Amsterdam, The Netherlands
Tel.: +31 (20) 592-4212 | Fax: +31 (20) 592-4312
Stefan,

Thanks so much for the quick reply; that straightened me out right away. But I find myself at another hurdle. I updated the copy to read

COPY 8000000 OFFSET 2 RECORDS INTO tblc ...

and I'm faced with the following exception:

!SQLException:importTable:failed to import table
!ERROR: HEAPextend: failed to extend to 64000000 for 01/46/14603tail
!ERROR: TABLETcreate_bats: Failed to create bat of size 8000000

I tried to chunk up the copy 1M rows at a time using

COPY 1000000 OFFSET 2 RECORDS ...
COPY 1000000 OFFSET 1000002 RECORDS ...

The first succeeds but the second fails similarly:

!SQLException:importTable:failed to import table
!ERROR: HEAPextend: failed to extend to 8000000 for 01/63/16336tail
!ERROR: TABLETcreate_bats: Failed to create bat of size 1000000

I imagine some monetdb.conf setting is wrong, or the table definition is, though I don't find anything about size limits in the SQL reference. Thoughts? If I'm overlooking the details in the manuals or on the mailing list, please just point me in the right direction; I've run a number of searches without finding much direct reference.

Thank you,
Paul
Is the machine you've installed MonetDB on a 32-bit system? I got those HEAPextend messages a lot on my 32-bit dev machine. A 64-bit machine is definitely required if you're dealing with a large number of rows (or even a not-so-large number of very wide rows, which was my case).

The comments on this bug report (which turned out not to be an actual bug) should explain what I figure is happening to you too:
http://sourceforge.net/tracker/?func=detail&atid=482468&aid=2879117&group_id=56967
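For what it's worth, a quick way to check both the kernel and the server binary (mserver5 is the MonetDB5 server executable; adjust the path if your installation differs):

$ uname -m                  # x86_64 indicates a 64-bit kernel; i686/i386 a 32-bit one
$ file $(which mserver5)    # reports whether the MonetDB server binary itself is 32-bit or 64-bit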
Guillaume,

The machine is 32-bit with a PAE kernel and 6 GB of RAM. I had imagined that would give MonetDB the best chance at as much of 4 GB as possible, though I'm still not clear on what PAE actually buys me. The table is quite wide -- 240 columns, most of which are DECIMAL and INT.

Is there any gain in considering the "Fast Loading" method described in the docs? I don't yet understand the binary dump method; I suppose it involves some kind of C coding. Also, is there any guidance on the relevance/details of gdk_mem_bigsize, gdk_vmtrim, gdk_alloc_map in monetdb5.conf? Maybe I'm completely off base with that.

Thanks for the bug link. I think you're quite right; I will study it more when I'm less nappy -- my B, MB, GB maths falter at this time.

Thank you.
Paul
pphillips99@gmail.com wrote:
> The machine is 32bit with PAE kernel with 6G of RAM. I had imagined that would give MonetDB the best chance at as much of 4G as possible. I'm still not clear on PAE though.

32 bit remains 32-bit addressing and limits the amount of data that can be handled concurrently, no matter what the OS does with the remaining GBs of RAM (perhaps a RAM disk).

> The table is quite wide -- 240 columns most of which are also DECIMAL and INT. Is there any gain in considering the "Fast Loading" method described in the docs? Although I do not understand the binary dump method. I suppose that will involve some kind of C coding.

Yes, this is indeed relevant for those who are in control of the source and experienced in C.

> Is there any guidance on the relevance/details of gdk_mem_bigsize, gdk_vmtrim, gdk_alloc_map in monetdb5.conf? Maybe I'm completely off base with that.

Those configuration settings are not meant to be set unless you want to experiment with the kernel.

regards, Martin
On Thu, Nov 19, 2009 at 08:18:10AM +0100, Martin Kersten wrote:
>> The machine is 32bit with PAE kernel with 6G of RAM. I had imagined that would give MonetDB the best chance at as much of 4G as possible. I'm still not clear on PAE though.
> 32 bit remains 32-bit addressing and limits the amount of data that can be handled concurrently, no matter what the OS does with the remaining GBs of RAM (perhaps a RAM disk).

In detail: while PAE allows a 32-bit OS to access physical memory addresses beyond 32 bits (i.e., more than 4 GB), the virtual address space remains 32-bit, i.e., any single process can still address (and hence use) at most 4 GB of address space, regardless of whether that is physical memory (RAM) or virtual memory (swap). Hence, even with a PAE kernel, 32-bit MonetDB remains 32-bit MonetDB, limited to a 32-bit (4 GB) address space.

>> The table is quite wide -- 240 columns most of which are also DECIMAL and INT. Is there any gain in considering the "Fast Loading" method described in the docs? Although I do not understand the binary dump method. I suppose that will involve some kind of C coding.
> Yes, this is indeed relevant for those who are in control of the source and experienced in C.

It is relevant to the extent that loading might then work, provided each column (i.e., each BAT) is no larger than 4 GB (in practice the limit is obviously smaller, as the process needs some address space for itself, internal data, etc.). However, during query processing over this table you might still hit the address space limit, once your queries require more than 32 bits of address space; address space fragmentation lowers this limit further in practice.

Background: MonetDB needs to be able to address all data that it has to access at any given point in time. During a "normal" COPY INTO, this is the whole table. During query processing it can be less, since only the required columns are accessed, and not necessarily all at the same time.

Stefan
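A rough back-of-the-envelope calculation makes the limit concrete; the uniform 8 bytes per value below is an illustrative assumption, since the actual width depends on each column's declared type:

  8,000,000 rows x 240 columns x 8 bytes ~ 15 GB    (full load: far beyond a 32-bit address space)
  1,000,000 rows x 240 columns x 8 bytes ~ 1.9 GB   (one chunk: a second chunk plus the process itself no longer fits in 4 GB)

which is consistent with the first 1M-row COPY succeeding and the second one failing.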
--
Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl | http://www.cwi.nl/~manegold/
CWI, P.O.Box 94079, 1090 GB Amsterdam, The Netherlands
Tel.: +31 (20) 592-4212 | Fax: +31 (20) 592-4312
Stefan and Martin,

Thank you both for so much detail. I will have to consider a 64-bit machine for future convenience. I have managed to import all my data by splitting the raw file into parts by rows. Much improved.

I am now writing queries. Is there a direct way to compute the percent rank of a column value by date? I understand how to make the calculation myself, but I did not want to overlook a simple answer in MonetDB. My experience has been with Oracle functions like percent_rank and lag/lead, for example.

Again, thank you all for starting me off. My initial test queries are performing very fast and I'm excited about MonetDB's success.

Thanks.
Paul
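For reference, the split-and-load approach described above can be sketched roughly as follows. The file names, chunk size, and helper loop are illustrative assumptions, not taken from the thread; the final, smaller chunk may need its own record count.

$ tail -n +2 /mnt/data/myfile.txt | split -l 1000000 - /mnt/data/chunk_    # drop the header, 1M rows per piece
$ for f in /mnt/data/chunk_*; do
    echo "copy 1000000 records into tblc from '$f' delimiters '\t','\r\n' null as '';" > /tmp/load_chunk.sql
    mclient -lsql /tmp/load_chunk.sql
  done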
participants (4)
- Guillaume Theoret
- Martin Kersten
- pphillips99@gmail.com
- Stefan Manegold