[MonetDB-users] M5 SQL load data with "copy into" behaviour and 32 bits limitation

Hi all,

I'm testing MonetDB5 to see whether it's suitable for an OLAP project. I have to load a bit more than 11.5 million rows into one table, and I get the following results (I'm working on a Windows 2000 Professional 32-bit box with 1 GB RAM and MonetDB v5.0.0_beta1_2):

- When I try to load all the data in a single "copy into", mserver crashes at a certain point.
- When I try to load the data in 100K chunks with several "copy into" statements in the same SQL session, the server crashes at 7.8 million rows; when I then load the rest of the data it gets loaded, but the first 7.8 million rows end up corrupt while the rest are OK.
- When I load the data in 100K chunks over 5 separate runs (2.6 million rows each), everything is OK. After every 2.6 million rows I disconnect the SQL client and shut down the server, so that when I restart the server and reconnect, all the logs become persistent (thanks to Niels Nes for explaining that logs become persistent after a server restart, when a SQL session starts).

So initially I thought MonetDB couldn't load that amount of data on a 32-bit box, but apparently it can, because in the end the data was loaded successfully. Is there a limitation in "copy into"? Is there a way to do this without restarting the server and the SQL sessions?

I've read some threads about the 32-bit limitation, but I don't know for sure how to calculate the maximum number of rows I can have in a single table. In my case, the table row is 147 bytes long, using the following data type sizes:

smallint --> 2 bytes
int --> 4 bytes
char(2) --> 2 bytes

Would the limit then be 2 GB / 147 bytes --> 14.608.732 rows? I've assumed this from what I've read, but please let me know if my calculation is wrong.

Thanks in advance,
Franco

--
View this message in context: http://www.nabble.com/M5-SQL-load-data-with-%22copy-into%22-behaviour-and-32...
Sent from the monetdb-users mailing list archive at Nabble.com.
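The chunked workaround described above can be scripted; below is a minimal sketch (the chunk size and file naming are illustrative, not part of the original workflow) that splits a delimited dump into fixed-size pieces so each piece can then be loaded with its own COPY INTO statement:

```python
# Split a delimited dump into fixed-size chunks so that each one can be
# loaded with a separate COPY INTO statement (chunk size is illustrative).
def split_into_chunks(path, rows_per_chunk=100_000):
    """Write <path>.partN files of at most rows_per_chunk lines each;
    return the list of chunk file names."""
    chunks = []
    with open(path) as src:
        part, lines = 0, []
        for line in src:
            lines.append(line)
            if len(lines) == rows_per_chunk:
                name = f"{path}.part{part}"
                with open(name, "w") as dst:
                    dst.writelines(lines)
                chunks.append(name)
                part, lines = part + 1, []
        if lines:  # trailing partial chunk
            name = f"{path}.part{part}"
            with open(name, "w") as dst:
                dst.writelines(lines)
            chunks.append(name)
    return chunks
```

Each resulting file would then be loaded separately, e.g. COPY INTO mytable FROM '/full/path/data.csv.part0'; (table and path names here are hypothetical).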

On Sat, May 19, 2007 at 03:06:15PM -0700, Franco Guidoli wrote:
Hi all: I'm testing MonetDB5 to see whether it's suitable for an OLAP project. I have to load a bit more than 11.5 million rows into one table, and I get the following results (I'm working on a Windows 2000 Professional 32-bit box with 1 GB RAM and MonetDB v5.0.0_beta1_2):
- when I try to load all the data in a single "copy into", mserver crashes at a certain point.
- when I try to load the data in 100K chunks with several "copy into" statements in the same SQL session, the server crashes at 7.8 million rows; when I then load the rest of the data it gets loaded, but the first 7.8 million rows end up corrupt while the rest are OK.
Crashes and corrupt data are bugs we should look into. So if you could spare some time, please send us the schema (create table statements), maybe even with a data-generator process. I would like to fix these bugs.
- when I load the data in 100K chunks in 5 separate runs (2.6 million rows each), everything is OK; after every 2.6 million rows I disconnect the SQL client and shut down the server, so that when I restart the server and reconnect, all the logs become persistent (thanks to Niels Nes for explaining that logs become persistent after a server restart, when a SQL session starts).
So initially I thought MonetDB couldn't load that amount of data on a 32-bit box, but apparently it can, because in the end the data was loaded successfully. Is there a limitation in "copy into"? Is there a way to do this without restarting the server and the SQL sessions?
I've read some threads about the 32-bit limitation, but I don't know for sure how to calculate the maximum number of rows I can have in a single table.
In my case, the table row is 147 bytes long, using the following data type sizes:
smallint --> 2 bytes
int --> 4 bytes
char(2) --> 2 bytes
Would the limit then be 2 GB / 147 bytes --> 14.608.732 rows? I've assumed this from what I've read, but please let me know if my calculation is wrong.
Yes, every column should be addressable. On many 32-bit systems the usable address space is 32 bits (i.e., 4 GB); because of signedness we lose one bit, leaving us with 2 GB.

Niels
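The 2 GB figure and the row estimate above can be checked with a quick calculation (a sketch; whether the limit applies to the whole row width or to each column separately depends on the storage layout, so both readings are shown):

```python
# Rough estimate of how many rows fit under a 2 GB addressable-space limit,
# using the 147-byte row width from the thread.
ADDR_LIMIT = 2 * 1024**3   # 2 GiB usable address space (31 bits)
ROW_WIDTH = 147            # bytes per row, summed over all columns

max_rows_by_row_width = ADDR_LIMIT // ROW_WIDTH
print(max_rows_by_row_width)   # 14608732 -- matches the 14.608.732 estimate

# MonetDB stores each column contiguously, so the looser per-column reading
# would allow far more rows, e.g. for a 4-byte int column alone:
print(ADDR_LIMIT // 4)         # 536870912
```

Either way, all columns must still fit in the address space together during a bulk load, so the per-row-width estimate is the more conservative guide.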
Thanks in advance,
Franco
--
Niels Nes, Centre for Mathematics and Computer Science (CWI)
Kruislaan 413, 1098 SJ Amsterdam, The Netherlands
room C0.02, phone ++31 20 592-4098, fax ++31 20 592-4312
url: http://www.cwi.nl/~niels   e-mail: Niels.Nes@cwi.nl

Niels Nes wrote:
On Sat, May 19, 2007 at 03:06:15PM -0700, Franco Guidoli wrote:
[...]
the limitation would be 2 GB / 147 bytes --> 14.608.732 rows? I've assumed this from what I've read, but please let me know if my calculation is wrong.

Yes, every column should be addressable. On many 32-bit systems the usable address space is 32 bits (i.e., 4 GB); because of signedness we lose one bit, leaving us with 2 GB.
11.5M tuples at 147 bytes gives 1.69 GB, so yes, it barely fits. There are a few more cost factors that play a role. For example, the system may decide to build a hash structure to speed up key-constraint enforcement, which costs another 11.5M * 8 bytes (90 MB). Another potential source of problems is that when a table grows, the system may allocate a larger storage space, leaving the old portion unused (MonetDB requires a contiguous memory area for each table). This can lead to memory fragmentation and effectively reduces your playground.

Your remarks all hint in the direction that you are indeed at the edge of what a single bulk load can do on a 32-bit machine. Just to illustrate how much is happening behind the scenes, consider the following debug session of a simple COPY INTO command. Niels can explain it better, but I guess the system needs at least room for another complete copy, unless sql.columnBind has the side effect of immediately removing the BAT into which it dumped the table. BUT a system should not crash if we can prevent it easily.
debug copy into tables from 'xx' ;
# mdb.start()
mdb>n
# user.s1_0()
mdb>
# _4 := streams.openRead(_3="xx")
mdb>o
user.s1_0[ 0] actions=0 time=10 usec optimizer.inline
user.s1_0[ 1] actions=0 time=11 usec optimizer.remap
user.s1_0[ 2] actions=1 time=134 usec optimizer.evaluate
user.s1_0[ 3] actions=1 time=26 usec optimizer.costModel
user.s1_0[ 4] actions=0 time=5 usec optimizer.coercion
user.s1_0[ 5] actions=0 time=35 usec optimizer.emptySet
user.s1_0[ 6] actions=0 time=5 usec optimizer.accessmode
user.s1_0[ 7] actions=0 time=23 usec optimizer.aliases
user.s1_0[ 8] actions=0 time=31 usec optimizer.commonTerms
user.s1_0[ 9] actions=0 time=26 usec optimizer.accumulators
user.s1_0[10] actions=0 time=6 usec optimizer.joinPath
user.s1_0[11] actions=29 time=31 usec optimizer.deadcode
user.s1_0[12] actions=40 time=293 usec optimizer.reduce
user.s1_0[13] actions=9 time=45 usec optimizer.garbageCollector
user.s1_0[14] actions=0 time=9 usec optimizer.multiplex
user.s1_0[15] actions=0 time=9 usec optimizer.multiplex
mdb>l
[15] function user.s1_0():void;
    _4 := streams.openRead("xx");
    _5 := bstream.create(_4,1048576);
    _7 := sql.importTable(_5,"tables","|","\n",-1);
    bstream.destroy(_5);
    streams.close(_4);
    _13:bat[:oid,:int] := sql.columnBind(_7,"id");
    sql.append("sys","tables","id",3,_13);
    _18:bat[:oid,:str] := sql.columnBind(_7,"name");
    sql.append("sys","tables","name",3,_18);
    _18:bat[:oid,:str] := nil;
    _21:bat[:oid,:int] := sql.columnBind(_7,"schema_id");
    sql.append("sys","tables","schema_id",3,_21);
    _21:bat[:oid,:int] := nil;
    _24:bat[:oid,:str] := sql.columnBind(_7,"query");
    sql.append("sys","tables","query",3,_24);
    _24:bat[:oid,:str] := nil;
    _27:bat[:oid,:sht] := sql.columnBind(_7,"type");
    sql.append("sys","tables","type",3,_27);
    _27:bat[:oid,:sht] := nil;
    _30:bat[:oid,:bit] := sql.columnBind(_7,"system");
    sql.append("sys","tables","system",3,_30);
    _30:bat[:oid,:bit] := nil;
    _33:bat[:oid,:sht] := sql.columnBind(_7,"commit_action");
    sql.append("sys","tables","commit_action",3,_33);
    _33:bat[:oid,:sht] := nil;
    _36:bat[:oid,:sht] := sql.columnBind(_7,"temporary");
    _7 := nil;
    sql.append("sys","tables","temporary",3,_36);
    _36:bat[:oid,:sht] := nil;
    _38 := aggr.count(_13);
    _13:bat[:oid,:int] := nil;
    sql.affectedRows(_38,"");
end s1_0;
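The footprint figures quoted earlier in this message can be reproduced with a quick back-of-the-envelope calculation (a sketch using only the numbers from the thread):

```python
# Reproduce the memory-footprint estimate for the 11.5M-row bulk load.
rows = 11_500_000   # tuples to load
row_width = 147     # bytes per row across all columns

base = rows * row_width
print(base)                       # 1690500000 bytes, ~1.69 GB (decimal)
print(round(base / 1024**3, 2))   # ~1.57 GiB

# Optional hash structure for key-constraint enforcement: ~8 bytes/row.
hash_bytes = rows * 8
print(hash_bytes)                 # 92000000 bytes, ~90 MB
```

With only 2 GB of usable address space, the base columns plus such auxiliary structures leave very little headroom, consistent with the "barely fits" conclusion above.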
participants (3)
-
Franco Guidoli
-
Martin Kersten
-
Niels Nes