[MonetDB-users] MonetDB database size restrictions
Hi, I hope you don't get asked this question too often, I've had a look for FAQs but couldn't find anything obvious. I'm trying to load some data into MonetDB (built on a i386 Debian "etch" box from MonetDB-Feb2008-SuperBall.tar.bz2) and I'm getting strange problems. I've got a table with 11 columns (2 int, 5 varchar, 4 date) and about 38 million rows (the text file I want to pull in is 3.3GB). If I try COPYing the data in, Monet alternately stops making progress (the computer is reading from the disk slowly, so maybe it's just going to take a *long* time to finish) or crashes with the following message: !SQLException:importTable:failed to import table !ERROR: GDKload: cannot mmap(): name=24/2470, ext=tail.priv !OS: Cannot allocate memory !ERROR: GDKload failed: name=24/2470, ext=tail.priv If I try splitting the file into smaller pieces (about a gig each), things work and I end up with the data in the database. It would be nice to know why I needed to split the file up first. Now I've got the data in I tried running a few queries to make sure everything went OK. I'm getting some very strange results when doing aggregations. The following work: SELECT count(*) FROM source_livestock; +----------+ | count_ | +==========+ | 37898310 | +----------+ SELECT sex_code, count(*) FROM source_livestock GROUP BY sex_code; +----------+---------------+ | sex_code |count_sex_code | +==========+===============+ | M | 17505304 | | F | 20392997 | | null | 9 | +----------+---------------+ This last query came back amazingly quickly, about ten times faster than Postgres. Monet looks like a fairly amazing database if I can routinely expect queries to be this much faster. SELECT count(*); +--------------+ | single_value | +==============+ | 1 | +--------------+ but the following don't do what I'd expect: SELECT count(1); MAPI = monetdb@endemic:50001 ACTION= read_line QUERY = select count(1); ERROR = Connection terminated I'd expect to get a single row with a single value containing the number 1 from this query. And for the following query I'd expect to get the same, just with the result being zero. SELECT count(null); !syntax error, unexpected sqlNULL in: "select count(null" This is how I found the bug: SELECT sex_code, count(1) FROM source_livestock GROUP BY sex_code; !TypeException:user.s3_1[26]:'aggr.count_no_nil' undefined in: _36:bat[:any,:int] := aggr.count_no_nil(_35:bte, _31:bat[:oid,:oid], _32:bat[:oid,:oid]) !SQLException:sql.rsColumn:Cannot access descriptor !WARNING: BATdescriptor: range error 1 I've come to expect COUNT(*) and COUNT(1) as having the same semantics. But I may be missing something. SELECT sex_code, count(1+1) FROM source_livestock GROUP BY sex_code; !TypeException:user.s4_1[28]:'aggr.count_no_nil' undefined in: _39:bat[:any,:int] := aggr.count_no_nil(_38:bte, _32:bat[:oid,:oid], _33:bat[:oid,:oid]) +----------+---------------------+ | sex_code |sql_add_single_value | +==========+=====================+ | M | monetdb | +----------+---------------------+ This second one is particularly bad because I get an incorrect answer back. Any idea what I'm doing wrong? Thanks, Sam
On Fri, Jun 13, 2008 at 04:17:19PM +0100, Sam Mason wrote:
Hi,
It seems you hit a bug in MonetDB/SQL. Could you file a bug report on the sourceforge bug tracker. This way we can keep track of this issue. The problem you have with loading should be fixed in the new (bound te be released) version (ie June-2008 release). Please check out that version, for example compile a version using the monetdb-install.sh script (see http://monetdb.cwi.nl/projects/monetdb//download.php?target=/projects/monetd...) Niels
I hope you don't get asked this question too often, I've had a look for FAQs but couldn't find anything obvious.
I'm trying to load some data into MonetDB (built on a i386 Debian "etch" box from MonetDB-Feb2008-SuperBall.tar.bz2) and I'm getting strange problems. I've got a table with 11 columns (2 int, 5 varchar, 4 date) and about 38 million rows (the text file I want to pull in is 3.3GB).
If I try COPYing the data in, Monet alternately stops making progress (the computer is reading from the disk slowly, so maybe it's just going to take a *long* time to finish) or crashes with the following message:
!SQLException:importTable:failed to import table !ERROR: GDKload: cannot mmap(): name=24/2470, ext=tail.priv !OS: Cannot allocate memory !ERROR: GDKload failed: name=24/2470, ext=tail.priv
If I try splitting the file into smaller pieces (about a gig each), things work and I end up with the data in the database. It would be nice to know why I needed to split the file up first.
Now I've got the data in I tried running a few queries to make sure everything went OK. I'm getting some very strange results when doing aggregations. The following work:
SELECT count(*) FROM source_livestock; +----------+ | count_ | +==========+ | 37898310 | +----------+
SELECT sex_code, count(*) FROM source_livestock GROUP BY sex_code; +----------+---------------+ | sex_code |count_sex_code | +==========+===============+ | M | 17505304 | | F | 20392997 | | null | 9 | +----------+---------------+
This last query came back amazingly quickly, about ten times faster than Postgres. Monet looks like a fairly amazing database if I can routinely expect queries to be this much faster.
SELECT count(*); +--------------+ | single_value | +==============+ | 1 | +--------------+
but the following don't do what I'd expect:
SELECT count(1); MAPI = monetdb@endemic:50001 ACTION= read_line QUERY = select count(1); ERROR = Connection terminated
I'd expect to get a single row with a single value containing the number 1 from this query. And for the following query I'd expect to get the same, just with the result being zero.
SELECT count(null); !syntax error, unexpected sqlNULL in: "select count(null"
This is how I found the bug:
SELECT sex_code, count(1) FROM source_livestock GROUP BY sex_code; !TypeException:user.s3_1[26]:'aggr.count_no_nil' undefined in: _36:bat[:any,:int] := aggr.count_no_nil(_35:bte, _31:bat[:oid,:oid], _32:bat[:oid,:oid]) !SQLException:sql.rsColumn:Cannot access descriptor !WARNING: BATdescriptor: range error 1
I've come to expect COUNT(*) and COUNT(1) as having the same semantics. But I may be missing something.
SELECT sex_code, count(1+1) FROM source_livestock GROUP BY sex_code; !TypeException:user.s4_1[28]:'aggr.count_no_nil' undefined in: _39:bat[:any,:int] := aggr.count_no_nil(_38:bte, _32:bat[:oid,:oid], _33:bat[:oid,:oid]) +----------+---------------------+ | sex_code |sql_add_single_value | +==========+=====================+ | M | monetdb | +----------+---------------------+
This second one is particularly bad because I get an incorrect answer back. Any idea what I'm doing wrong?
Thanks, Sam
------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Fri, Jun 13, 2008 at 06:06:59PM +0200, Niels Nes wrote:
On Fri, Jun 13, 2008 at 04:17:19PM +0100, Sam Mason wrote:
Hi,
It seems you hit a bug in MonetDB/SQL. Could you file a bug report on the sourceforge bug tracker. This way we can keep track of this issue. The problem you have with loading should be fixed in the new (bound te be released) version (ie June-2008 release). Please check out that version, for example compile a version using the monetdb-install.sh script (see http://monetdb.cwi.nl/projects/monetdb//download.php?target=/projects/monetd...)
I just checked in a fix for your count(single value) problems. Niels
Niels
I hope you don't get asked this question too often, I've had a look for FAQs but couldn't find anything obvious.
I'm trying to load some data into MonetDB (built on a i386 Debian "etch" box from MonetDB-Feb2008-SuperBall.tar.bz2) and I'm getting strange problems. I've got a table with 11 columns (2 int, 5 varchar, 4 date) and about 38 million rows (the text file I want to pull in is 3.3GB).
If I try COPYing the data in, Monet alternately stops making progress (the computer is reading from the disk slowly, so maybe it's just going to take a *long* time to finish) or crashes with the following message:
!SQLException:importTable:failed to import table !ERROR: GDKload: cannot mmap(): name=24/2470, ext=tail.priv !OS: Cannot allocate memory !ERROR: GDKload failed: name=24/2470, ext=tail.priv
If I try splitting the file into smaller pieces (about a gig each), things work and I end up with the data in the database. It would be nice to know why I needed to split the file up first.
Now I've got the data in I tried running a few queries to make sure everything went OK. I'm getting some very strange results when doing aggregations. The following work:
SELECT count(*) FROM source_livestock; +----------+ | count_ | +==========+ | 37898310 | +----------+
SELECT sex_code, count(*) FROM source_livestock GROUP BY sex_code; +----------+---------------+ | sex_code |count_sex_code | +==========+===============+ | M | 17505304 | | F | 20392997 | | null | 9 | +----------+---------------+
This last query came back amazingly quickly, about ten times faster than Postgres. Monet looks like a fairly amazing database if I can routinely expect queries to be this much faster.
SELECT count(*); +--------------+ | single_value | +==============+ | 1 | +--------------+
but the following don't do what I'd expect:
SELECT count(1); MAPI = monetdb@endemic:50001 ACTION= read_line QUERY = select count(1); ERROR = Connection terminated
I'd expect to get a single row with a single value containing the number 1 from this query. And for the following query I'd expect to get the same, just with the result being zero.
SELECT count(null); !syntax error, unexpected sqlNULL in: "select count(null"
This is how I found the bug:
SELECT sex_code, count(1) FROM source_livestock GROUP BY sex_code; !TypeException:user.s3_1[26]:'aggr.count_no_nil' undefined in: _36:bat[:any,:int] := aggr.count_no_nil(_35:bte, _31:bat[:oid,:oid], _32:bat[:oid,:oid]) !SQLException:sql.rsColumn:Cannot access descriptor !WARNING: BATdescriptor: range error 1
I've come to expect COUNT(*) and COUNT(1) as having the same semantics. But I may be missing something.
SELECT sex_code, count(1+1) FROM source_livestock GROUP BY sex_code; !TypeException:user.s4_1[28]:'aggr.count_no_nil' undefined in: _39:bat[:any,:int] := aggr.count_no_nil(_38:bte, _32:bat[:oid,:oid], _33:bat[:oid,:oid]) +----------+---------------------+ | sex_code |sql_add_single_value | +==========+=====================+ | M | monetdb | +----------+---------------------+
This second one is particularly bad because I get an incorrect answer back. Any idea what I'm doing wrong?
Thanks, Sam
------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Fri, Jun 13, 2008 at 06:06:59PM +0200, Niels Nes wrote:
It seems you hit a bug in MonetDB/SQL. Could you file a bug report on the sourceforge bug tracker. This way we can keep track of this issue.
Which bug did this apply to? it looks from the follow-up message that you've fixed the other bug already as well. I've found some other things and I'll report them to the sf bug tracker.
The problem you have with loading should be fixed in the new (bound te be released) version (ie June-2008 release). Please check out that version, for example compile a version using the monetdb-install.sh script (see http://monetdb.cwi.nl/projects/monetdb//download.php?target=/projects/monetd...)
After suffering a disk crash I've finally got back to this. I've built a "nightly" stable version, generated the data again in the correct format and tried to perform the COPY in again. It still fails when running: COPY INTO source_livestock FROM '/mnt/scratch/smason/source_livestock' DELIMITERS ','; !SQLException:importTable:failed to import table !ERROR: GDKload: cannot mmap(): name=16/1634, ext=theap !OS: Cannot allocate memory !ERROR: GDKload failed: name=16/1634, ext=theap Would you expect me to get this still? and hence should I provide more details or should I be building something else? Thanks, Sam
On Tue, Jun 17, 2008 at 07:06:24PM +0100, Sam Mason wrote:
On Fri, Jun 13, 2008 at 06:06:59PM +0200, Niels Nes wrote:
It seems you hit a bug in MonetDB/SQL. Could you file a bug report on the sourceforge bug tracker. This way we can keep track of this issue.
Which bug did this apply to? it looks from the follow-up message that you've fixed the other bug already as well. I've found some other things and I'll report them to the sf bug tracker.
The problem you have with loading should be fixed in the new (bound te be released) version (ie June-2008 release). Please check out that version, for example compile a version using the monetdb-install.sh script (see http://monetdb.cwi.nl/projects/monetdb//download.php?target=/projects/monetd...)
After suffering a disk crash I've finally got back to this. I've built a "nightly" stable version, generated the data again in the correct format and tried to perform the COPY in again. It still fails when running:
COPY INTO source_livestock FROM '/mnt/scratch/smason/source_livestock' DELIMITERS ','; !SQLException:importTable:failed to import table !ERROR: GDKload: cannot mmap(): name=16/1634, ext=theap !OS: Cannot allocate memory !ERROR: GDKload failed: name=16/1634, ext=theap
Would you expect me to get this still? and hence should I provide more details or should I be building something else? For this problem I indeed need more info. How much memory does your system have and how much swap. Also what are the sizes of the to be loaded table (nr of rows, columns, data types) ?
Niels
Thanks, Sam
------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Tue, Jun 17, 2008 at 08:20:58PM +0200, Niels Nes wrote:
On Tue, Jun 17, 2008 at 07:06:24PM +0100, Sam Mason wrote:
COPY INTO source_livestock FROM '/mnt/scratch/smason/source_livestock' DELIMITERS ','; !SQLException:importTable:failed to import table !ERROR: GDKload: cannot mmap(): name=16/1634, ext=theap !OS: Cannot allocate memory !ERROR: GDKload failed: name=16/1634, ext=theap
Would you expect me to get this still? and hence should I provide more details or should I be building something else?
For this problem I indeed need more info. How much memory does your system have and how much swap. Also what are the sizes of the to be loaded table (nr of rows, columns, data types) ?
Sorry, it's almost the same as my original post. The table now has 10 columns, 2 integer, 4 varchars and 4 dates. I'm trying to pull in about 38million rows (text file is about 3GB) into a 32bit build of monetdb. RAM and swap space are 2GB each, i.e. I've got 4GB in total. Why would ram/swap space matter for anything but performance? If they do, I'm assuming monet wouldn't work with files larger than the memory addressable by a single process? This file is going into one of the smaller tables that I want to work with. If this is going to be a problem how much of a change is it going to be to the code to support out-of-core style operations. Sam
On Wed, Jun 18, 2008 at 10:47:18AM +0100, Sam Mason wrote:
On Tue, Jun 17, 2008 at 08:20:58PM +0200, Niels Nes wrote:
On Tue, Jun 17, 2008 at 07:06:24PM +0100, Sam Mason wrote:
COPY INTO source_livestock FROM '/mnt/scratch/smason/source_livestock' DELIMITERS ','; !SQLException:importTable:failed to import table !ERROR: GDKload: cannot mmap(): name=16/1634, ext=theap !OS: Cannot allocate memory !ERROR: GDKload failed: name=16/1634, ext=theap
Would you expect me to get this still? and hence should I provide more details or should I be building something else?
For this problem I indeed need more info. How much memory does your system have and how much swap. Also what are the sizes of the to be loaded table (nr of rows, columns, data types) ?
Sorry, it's almost the same as my original post. The table now has 10 columns, 2 integer, 4 varchars and 4 dates. I'm trying to pull in about 38million rows (text file is about 3GB) into a 32bit build of monetdb. RAM and swap space are 2GB each, i.e. I've got 4GB in total. I must have mist your 32 bit build in your previous mail. The 32 bit adress space will cause problems as that limits you to 2G databases. 38 * 10*4 (int) (already about 1.5G) + what ever string data. This may just not fit, leading to the GDKload etc error.
The solution is to switch to a 64 bit machine + build. So indeed the memory/swap isn't the limiting factor (other then speed) but the adress space is. Niels
Why would ram/swap space matter for anything but performance? If they do, I'm assuming monet wouldn't work with files larger than the memory addressable by a single process? This file is going into one of the smaller tables that I want to work with. If this is going to be a problem how much of a change is it going to be to the code to support out-of-core style operations.
Sam
------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Wed, Jun 18, 2008 at 11:54:46AM +0200, Niels Nes wrote:
On Wed, Jun 18, 2008 at 10:47:18AM +0100, Sam Mason wrote:
The table now has 10 columns, 2 integer, 4 varchars and 4 dates. I'm trying to pull in about 38million rows (text file is about 3GB) into a 32bit build of monetdb. RAM and swap space are 2GB each, i.e. I've got 4GB in total. I must have mist your 32 bit build in your previous mail.
I'm not sure I thought of specifying it, every other database I've used hasn't been limited by address space size in this way :)
The 32 bit adress space will cause problems as that limits you to 2G databases. 38 * 10*4 (int) (already about 1.5G) + what ever string data. This may just not fit, leading to the GDKload etc error.
What does monet actually try to fit in memory at any one time? I guess it depends on the operation, but does it say try and mmap the entire source file when you're doing a COPY or does it stream that in? What about where it's being stored, is that mmapped in as well?
The solution is to switch to a 64 bit machine + build. So indeed the memory/swap isn't the limiting factor (other then speed) but the adress space is.
OK. It'll be a bit of a fiddle installing a 64bit OS on this machine so I'd prefer not to unless it's going to be a benefit. I've split the data up and it all imports OK now. I tried adding a primary key to the table and this causes the server to thrash, implying that it's not using particularly swap friendly algorithms. How much testing has this use case had? The dbfarm (what's the correct terminology?) takes up a couple of gig on disk if that helps. Sam
On Wed, Jun 18, 2008 at 12:34:13PM +0100, Sam Mason wrote:
On Wed, Jun 18, 2008 at 11:54:46AM +0200, Niels Nes wrote:
On Wed, Jun 18, 2008 at 10:47:18AM +0100, Sam Mason wrote:
The table now has 10 columns, 2 integer, 4 varchars and 4 dates. I'm trying to pull in about 38million rows (text file is about 3GB) into a 32bit build of monetdb. RAM and swap space are 2GB each, i.e. I've got 4GB in total. I must have mist your 32 bit build in your previous mail.
I'm not sure I thought of specifying it, every other database I've used hasn't been limited by address space size in this way :) We like to be special ;-).
The 32 bit adress space will cause problems as that limits you to 2G databases. 38 * 10*4 (int) (already about 1.5G) + what ever string data. This may just not fit, leading to the GDKload etc error.
What does monet actually try to fit in memory at any one time? I guess it depends on the operation, but does it say try and mmap the entire source file when you're doing a COPY or does it stream that in? What about where it's being stored, is that mmapped in as well? The input is streamed, but the result is stored using mmap files.
The solution is to switch to a 64 bit machine + build. So indeed the memory/swap isn't the limiting factor (other then speed) but the adress space is.
OK. It'll be a bit of a fiddle installing a 64bit OS on this machine so I'd prefer not to unless it's going to be a benefit. I've split the data up and it all imports OK now. I tried adding a primary key to the table and this causes the server to thrash, implying that it's not using particularly swap friendly algorithms. How much testing has this use case had? The dbfarm (what's the correct terminology?) takes up a couple of gig on disk if that helps. Indeed the current primary key check is (in out of memory situations) bad as it trashes swap (ie close to random access on a large column).
Niels
Sam
------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Wed, Jun 18, 2008 at 06:18:02PM +0200, Niels Nes wrote:
On Wed, Jun 18, 2008 at 12:34:13PM +0100, Sam Mason wrote:
I'm not sure I thought of specifying it, every other database I've used hasn't been limited by address space size in this way :) We like to be special ;-).
there's always one isn't there :)
Indeed the current primary key check is (in out of memory situations) bad as it trashes swap (ie close to random access on a large column).
OK, how much of a change to the code do you think it would be to remove this behavior? I'm reasonably proficient in C, the thing that tends to take time is learning the structure of a new codebase. Sam
On Wed, Jun 18, 2008 at 05:28:29PM +0100, Sam Mason wrote:
On Wed, Jun 18, 2008 at 06:18:02PM +0200, Niels Nes wrote:
On Wed, Jun 18, 2008 at 12:34:13PM +0100, Sam Mason wrote:
I'm not sure I thought of specifying it, every other database I've used hasn't been limited by address space size in this way :) We like to be special ;-).
there's always one isn't there :)
Indeed the current primary key check is (in out of memory situations) bad as it trashes swap (ie close to random access on a large column).
OK, how much of a change to the code do you think it would be to remove this behavior? I'm reasonably proficient in C, the thing that tends to take time is learning the structure of a new codebase.
Depends on how you want to solve it. Current code uses a hash solution a clustered hash or sorted solution would work better (less random io). The code for schema altering sql statements is layered as follows sql parser, semantic checks,generate a tree of the needed 'mal' instructions, optimizers, generate mal. So adaption of the code in sql_schema/sql_updates would be needed possibly with a new index structure to support the more advanced key checks (clusters/ordered index). This is not a task some one new at MonetDB should start at. There are better 'introductionary' tasks by which you could slowly learn the system. One such task would be the addition of new functions, ie c-functions which can be called from sql. From the feature requests for example I know we would like more advanced time/date output format functions. Niels
Sam
------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Wed, Jun 18, 2008 at 07:13:58PM +0200, Niels Nes wrote:
On Wed, Jun 18, 2008 at 05:28:29PM +0100, Sam Mason wrote:
OK, how much of a change to the code do you think it would be to remove this behavior? I'm reasonably proficient in C, the thing that tends to take time is learning the structure of a new codebase.
Depends on how you want to solve it. Current code uses a hash solution a clustered hash or sorted solution would work better (less random io).
As in some sort of hash structure is persistently maintained?
The code for schema altering sql statements is layered as follows sql parser, semantic checks,generate a tree of the needed 'mal' instructions, optimizers, generate mal.
sounds like a bit of learning!
So adaption of the code in sql_schema/sql_updates would be needed possibly with a new index structure to support the more advanced key checks (clusters/ordered index).
When you say "more advanced key checks" do you pretty much mean anything more than equality?
This is not a task some one new at MonetDB should start at. There are better 'introductionary' tasks by which you could slowly learn the system. One such task would be the addition of new functions, ie c-functions which can be called from sql. From the feature requests for example I know we would like more advanced time/date output format functions.
Oracle and PG both provide a to_char function[1] that I could try implementing a subset of. I don't really like the formatting string of to_char, I prefer C's strftime more, but it could be a good compatible start. Sam [1] http://www.postgresql.org/docs/current/static/functions-formatting.html
On Wed, Jun 18, 2008 at 06:28:17PM +0100, Sam Mason wrote:
On Wed, Jun 18, 2008 at 07:13:58PM +0200, Niels Nes wrote:
On Wed, Jun 18, 2008 at 05:28:29PM +0100, Sam Mason wrote:
OK, how much of a change to the code do you think it would be to remove this behavior? I'm reasonably proficient in C, the thing that tends to take time is learning the structure of a new codebase.
Depends on how you want to solve it. Current code uses a hash solution a clustered hash or sorted solution would work better (less random io).
As in some sort of hash structure is persistently maintained? Its basically a column which stores key values, which is accessed using an in memory hash table.
The code for schema altering sql statements is layered as follows sql parser, semantic checks,generate a tree of the needed 'mal' instructions, optimizers, generate mal.
sounds like a bit of learning! Indeed a sql compiler is usualy a bit of learning ;-).
So adaption of the code in sql_schema/sql_updates would be needed possibly with a new index structure to support the more advanced key checks (clusters/ordered index).
When you say "more advanced key checks" do you pretty much mean anything more than equality? Clusters could support more than equality. But for the primary key checks I just mean a more advanced (faster) way of checking unique ness.
This is not a task some one new at MonetDB should start at. There are better 'introductionary' tasks by which you could slowly learn the system. One such task would be the addition of new functions, ie c-functions which can be called from sql. From the feature requests for example I know we would like more advanced time/date output format functions.
Oracle and PG both provide a to_char function[1] that I could try implementing a subset of. I don't really like the formatting string of to_char, I prefer C's strftime more, but it could be a good compatible start.
We do support strftime for 'type date', but we would like to support it for time and timestamp as well. See the current implementation in MonetDB5/src/modules/adt/mtime.mx as a starting point. The functionality can be added to sql using sql/src/sql/date.sql. Niels -- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
On Tue, Jun 17, 2008 at 07:06:24PM +0100, Sam Mason wrote:
On Fri, Jun 13, 2008 at 06:06:59PM +0200, Niels Nes wrote:
It seems you hit a bug in MonetDB/SQL. Could you file a bug report on the sourceforge bug tracker. This way we can keep track of this issue.
Which bug did this apply to? it looks from the follow-up message that you've fixed the other bug already as well. I've found some other things and I'll report them to the sf bug tracker.
The problem you have with loading should be fixed in the new (bound te be released) version (ie June-2008 release). Please check out that version, for example compile a version using the monetdb-install.sh script (see http://monetdb.cwi.nl/projects/monetdb//download.php?target=/projects/monetd...)
After suffering a disk crash I've finally got back to this. I've built a "nightly" stable version, generated the data again in the correct format and tried to perform the COPY in again. It still fails when running:
COPY INTO source_livestock FROM '/mnt/scratch/smason/source_livestock' DELIMITERS ','; !SQLException:importTable:failed to import table !ERROR: GDKload: cannot mmap(): name=16/1634, ext=theap !OS: Cannot allocate memory !ERROR: GDKload failed: name=16/1634, ext=theap
Would you expect me to get this still? and hence should I provide more details or should I be building something else?
What could help is to specify your number of records, ie COPY xx RECORDS INTO ... Where xx is the correct number of records ( or some more). This way monet knows directly howmuch memory is needed (ie less allocation and less fragmentation). Niels
Thanks, Sam
------------------------------------------------------------------------- Check out the new SourceForge.net Marketplace. It's the best place to buy or sell services for just about anything Open Source. http://sourceforge.net/services/buy/index.php _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
-- Niels Nes, Centre for Mathematics and Computer Science (CWI) Kruislaan 413, 1098 SJ Amsterdam, The Netherlands room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312 url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
participants (2)
-
Niels Nes
-
Sam Mason