Re: MonetDB copy binary time varies very much!
Hi Stefan,
Just as you said, we issue the next copy into as soon as the previous one has ended.
What would be different if I mimicked the real-world scenario and respected these gaps between each two consecutive copy into's?
Thanks!
Meng
------------------ Original ------------------
From: "Stefan Manegold"
Hi Stefan,
Since all the data in RAM has to be written to disk once RAM is full: if the RAM grows larger and I don't change the hard disk, so the buffered data volume becomes larger, will that make COPY BINARY INTO much slower compared to a smaller RAM?
Conversely, if the RAM were smaller, would COPY BINARY INTO be quicker?
I just want to lower the peak COPY time, the 10+ seconds that occur roughly once every hundred runs.
Most of the time COPY BINARY INTO takes less than 1 second; is that because the data is written to RAM, not disk, while RAM is not yet full? If so, can I write to disk more frequently, before RAM is full, so that I can lower the peak COPY time?
Regards, Meng
------------------ Original ------------------
From: "Stefan Manegold"
Date: Fri, Jul 26, 2013 11:10 PM
To: "Communication channel for MonetDB users"
Subject: Re: MonetDB copy binary time varies very much!
Hi,
I'm not sure whether I understand correctly what you are doing.
If you repeat the test 1000 times, does that mean that (1) 10000 times you re-create (or empty) the table and thus always copy into an empty table, or (2) 10000 times you copy into the same (growing) table, i.e., resulting in a table of 10,000 times 200,000 rows, i.e., 2,000,000,000 rows, i.e., ~16 GB per column, i.e., ~336 GB in total?
Only in case (1) can the binary files to be imported simply be moved, at zero cost. In case (2), only the first copy into (into the empty table) can simply move the files at zero cost; all subsequent copy into's (into a no longer empty table) must copy the files (and delete them afterwards to mimic the same behavior as the initial copy into), which is of course not "for free".
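The zero-cost move versus full copy distinction described above is ordinary filesystem behaviour: a rename within one filesystem only relinks the file, while a copy rewrites every byte and still leaves the source to be deleted. A minimal, MonetDB-independent Python sketch of the difference (file names and sizes are made up for illustration):

```python
import os
import shutil
import tempfile

# Create a scratch directory with one dummy "binary column" file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "column.bin")
with open(src, "wb") as f:
    f.write(b"\x00" * 1_000_000)  # 1 MB of dummy column data

# Case (1): the file can simply be moved (renamed) -- no data is rewritten.
moved = os.path.join(workdir, "moved.bin")
os.replace(src, moved)          # O(1) within a single filesystem
move_left_source = os.path.exists(src)  # False: the source is gone

# Case (2): the file must be copied -- every byte is rewritten,
# and the source still has to be deleted afterwards.
copied = os.path.join(workdir, "copied.bin")
shutil.copy2(moved, copied)     # O(size): rewrites all bytes
copied_size = os.path.getsize(copied)
os.remove(moved)                # mimic the delete-after-copy step

shutil.rmtree(workdir)
```

On the same filesystem the move is effectively instantaneous regardless of file size, which is why only case (2) pays a cost proportional to the data volume.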
Also, as Martin explained, unless your machine has (significantly) more RAM than the ~336 GB of data you copy, the data needs to be written to disk in between, making some copy into's "slower" than others. There's not much to do about that other than (a) getting more RAM, or (b) improving I/O bandwidth by using either a high performance RAID or SSDs.
Stefan
_______________________________________________
users-list mailing list
users-list@monetdb.org
http://mail.monetdb.org/mailman/listinfo/users-list

--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)   |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)  |
The difference would be that you would then measure the time required for each copy into in a scenario that realistically mimics your real-world scenario. Only doing this experiment can tell whether, and if so how, these times differ from the ones you measured in the less realistic scenario.

Stefan

------------------ Original ------------------
From: "Stefan Manegold"
Date: Mon, Jul 29, 2013 02:45 PM
To: "Communication channel for MonetDB users"
Subject: Re: MonetDB copy binary time varies very much!
Hi Meng,
My analysis was mainly an (educated) guess of what (most probably) happens. To be sure, you need to profile your system in detail, e.g., monitor CPU and I/O activities.
Having said that, with less RAM, you might force the system to write the loaded data to disk immediately with each copy into, making each copy into slower than merely loading the data into RAM, but the worst case might become better, since each time less data has to be written than with more RAM. Again, this is my guess at what is happening; the behaviour you observe might be caused by something else; only a detailed profiling analysis can tell.
Also, if you eventually want to query your 300+ GB (or even more?) efficiently, you might want to have a suitable system, in particular sufficient RAM. (Would you mind sharing the hardware characteristics of your machine?).
Moreover, what was the time gap between two consecutive copy into's in your experiment, i.e., did you issue the next copy into as soon as the previous one ended? Does this mimic your "real-world" scenario realistically, or would there be a time gap between two copy into's in reality? I recall you mentioned some 15 seconds. If so, you should rerun your experiment respecting these gaps between each two consecutive copy into's.
Best, Stefan
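The suggested rerun, issuing each load with the real-world pause in between and recording each load's latency, can be sketched as a small timing harness. Here `load_batch` is a hypothetical placeholder for whatever issues the actual COPY BINARY INTO (e.g. through a MonetDB client connection); the 15-second gap is the figure recalled in the thread:

```python
import time

def measure_latencies(load_batch, runs, gap_seconds):
    """Run `load_batch` `runs` times, sleeping `gap_seconds` between
    consecutive runs, and return the wall-clock latency of each run."""
    latencies = []
    for i in range(runs):
        start = time.perf_counter()
        load_batch(i)                # e.g. issue one COPY BINARY INTO here
        latencies.append(time.perf_counter() - start)
        if i + 1 < runs:
            time.sleep(gap_seconds)  # respect the real-world gap between loads
    return latencies

# Dummy stand-in load; in the real experiment, replace it with the client
# call that performs the COPY BINARY INTO and use gap_seconds=15.
times = measure_latencies(lambda i: sum(range(10_000)), runs=3, gap_seconds=0)
```

Comparing the distribution of latencies (in particular the occasional peaks) with and without the gaps would show whether the pauses give the system time to flush data to disk in the background.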
participants (2)
- integrity
- Stefan Manegold