MonetDB COPY BINARY time varies very much!

Hi dear all,

I did a test inserting 2,000,000,000 rows into MonetDB with "COPY BINARY INTO tmatch FROM binary_files". In this test:

1. I generate 21 files; each file represents one table column and holds 200,000 rows. I fill them with rand() values and write them with fwrite() in binary mode. The table creation SQL command is:

create table tmatch(id bigint, a float, b float, c float, d float, e float, f float, g float, h float, i float, j float, k float, l float, m float, n float, o float, p float, q float, r float, s float, t float);

The table has 21 columns and every value is 8 bytes, so each column file is 200,000 * 8 bytes = 1.6 MB, and one batch of 21 files is 200,000 * 21 * 8 bytes = 33,600,000 bytes, i.e. about 33.6 MB.

2. I use "COPY BINARY INTO tmatch FROM above_binary_files" to load the binary files into tmatch. The test was run 10,000 times repeatedly.

The average time over the 10,000 runs is only about 1.06 seconds, but the 9043rd run took 227m36.136, and some later runs also took very long. Is that because data is flushed from the cache into the database once the cache is full? The problem is that we have to keep the total processing time within 15 seconds, so I am wondering whether you can help me bring the maximum time down.

Thanks very much!
Meng
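For illustration, here is a minimal sketch of the kind of generator described above. The file names and the sequential id values are assumptions, not details from the post; the relevant point is that, for fixed-width types, COPY BINARY expects each file to be a plain array of the column's values in MonetDB's internal, native-endian format (8-byte integers for the BIGINT column, 8-byte doubles for the FLOAT columns).

    /* Sketch (not the original poster's code): generate 21 binary column
     * files for the tmatch table, one raw native-endian array per column.
     * File names and the sequential id values are illustrative assumptions. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>

    #define ROWS 200000
    #define FLOAT_COLS 20          /* columns a .. t */

    int main(void)
    {
        /* column 0: the BIGINT id column, 8 bytes per value */
        FILE *f = fopen("c0.bin", "wb");
        if (f == NULL) { perror("c0.bin"); return 1; }
        for (int64_t i = 0; i < ROWS; i++)
            fwrite(&i, sizeof i, 1, f);
        fclose(f);

        /* columns 1 .. 20: the FLOAT (double precision) columns */
        for (int c = 1; c <= FLOAT_COLS; c++) {
            char name[32];
            snprintf(name, sizeof name, "c%d.bin", c);
            f = fopen(name, "wb");
            if (f == NULL) { perror(name); return 1; }
            for (int i = 0; i < ROWS; i++) {
                double v = (double) rand() / RAND_MAX;
                fwrite(&v, sizeof v, 1, f);
            }
            fclose(f);
        }
        return 0;
    }

The files would then be loaded with a statement along the lines of copy binary into tmatch from ('/abs/path/c0.bin', '/abs/path/c1.bin', ..., '/abs/path/c20.bin'); that is, one absolute, server-readable path per column, in the column order of the table.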

Hi,

You should plot the times of each of your runs to see the trend. This single run may be an outlier, which could come from anything in your system environment. Even a seemingly harmless concurrent program using significant memory could compete with MonetDB. And indeed, at some point you will see disk I/O.

regards, Martin
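One way to get such a plot is to have the driver program record the wall-clock time of every run in a file and graph that file afterwards. Below is a minimal sketch of such a harness; the database name (meng), the file locations under /data, and running the statement through mclient -s are illustrative assumptions, not details from the original post.

    /* Sketch: run the COPY BINARY INTO repeatedly and log the elapsed
     * wall-clock time of every run to times.csv, so the per-run trend
     * (and any spikes) can be plotted. Database name, file locations and
     * the run count are illustrative assumptions. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    #define RUNS 10000
    #define COLS 21

    int main(void)
    {
        /* build the 21-file list once: '/data/c0.bin','/data/c1.bin',... */
        char files[1024] = "";
        for (int c = 0; c < COLS; c++) {
            char one[64];
            snprintf(one, sizeof one, "%s'/data/c%d.bin'", c ? "," : "", c);
            strcat(files, one);
        }
        char cmd[2048];
        snprintf(cmd, sizeof cmd,
                 "mclient -d meng -s \"COPY BINARY INTO tmatch FROM (%s);\"",
                 files);

        FILE *log = fopen("times.csv", "w");
        if (log == NULL) { perror("times.csv"); return 1; }

        for (int run = 0; run < RUNS; run++) {
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            if (system(cmd) != 0)
                fprintf(stderr, "run %d: mclient reported an error\n", run);
            clock_gettime(CLOCK_MONOTONIC, &t1);
            double secs = (t1.tv_sec - t0.tv_sec)
                        + (t1.tv_nsec - t0.tv_nsec) / 1e9;
            fprintf(log, "%d,%.3f\n", run, secs);
            fflush(log);
        }
        fclose(log);
        return 0;
    }

The resulting times.csv can then be plotted with any tool to see whether the slow runs start to cluster once the table no longer fits in memory.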

Hi,

I'm not sure whether I understand correctly what you are doing. If you repeat the test 10,000 times, does that mean that (1) 10,000 times you re-create (or empty) the table and thus always copy into an empty table, or (2) 10,000 times you copy into the same (growing) table, resulting in a table of 10,000 times 200,000 rows, i.e., 2,000,000,000 rows, i.e., ~16 GB per column, i.e., ~336 GB in total?

Only in case (1) can the binary files to be imported simply be moved at zero cost. In case (2), only the first COPY INTO (into the empty table) can simply move the files at zero cost; all subsequent COPY INTOs (into a no longer empty table) must copy the files (and delete them afterwards to mimic the same behaviour as the initial COPY INTO), which is of course not "for free".

Also, as Martin explained, unless your machine has (significantly) more RAM than the ~336 GB of data you copy, the data needs to be written to disk in between, making some COPY INTOs "slower" than others. There's not much to do about that other than (a) getting more RAM, or (b) improving I/O bandwidth by using either a high-performance RAID or SSDs.

Stefan
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)    |
| www.CWI.nl/~manegold   | Science Park 123 (L321)  |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)   |
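If the intention is Stefan's case (1), one way to keep the benchmark there is to re-create an empty tmatch at the top of each iteration of the timing loop sketched earlier, before the timed COPY BINARY INTO. This is only a sketch of the idea (the database name is still a placeholder), not something taken from the original posts:

    /* fragment for the loop body of the timing sketch above: re-create an
     * empty tmatch before each load so that every COPY BINARY INTO goes
     * into an empty table (case (1)); the database name is a placeholder */
    system("mclient -d meng -s \"DROP TABLE tmatch;\"");
    system("mclient -d meng -s \"CREATE TABLE tmatch("
           "id BIGINT, a FLOAT, b FLOAT, c FLOAT, d FLOAT, e FLOAT, "
           "f FLOAT, g FLOAT, h FLOAT, i FLOAT, j FLOAT, k FLOAT, "
           "l FLOAT, m FLOAT, n FLOAT, o FLOAT, p FLOAT, q FLOAT, "
           "r FLOAT, s FLOAT, t FLOAT);\"");

Note that in case (1) the column files are moved into the database, so they have to be re-generated (or copied back into place) before the next iteration; as Jennie asks below, that file-creation time should then be measured separately from the COPY BINARY INTO itself.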

Hai Meng,

Since you repeat the COPY INTO statement 10,000 times, I wonder how exactly you did that and measured the time. For instance, did you first create _all_ necessary files (i.e., 10,000 x 21 files) and then execute the 10,000 COPY INTOs consecutively? Or did you alternate between creating 21 files and executing a COPY INTO?

In the second case, are the times in your plots only the execution time of COPY INTO, or do they also include the time to create the files? And, again in the second case, have you also measured the time needed to create the files?

You mentioned that the total available execution time in your application is 15 seconds. How much data do you need to process within that time?

Regards,
Jennie
participants (4)
- integrity
- Martin Kersten
- Stefan Manegold
- Ying Zhang