Hi Lynn Carol,

COPY BINARY INTO now indeed copies the data, and that can add considerable cost, in particular when your data is large and your data and dbfarm are on the same I/O system, or even the same (single) disk.

To better understand your situation, could you possibly share how big the (binary) data is that you load (i.e., how much disk space the 135 files with 2 billion values each occupy), and whether your I/O system is a single hard disk, a RAID system, or an SSD? Also, how much RAM does your machine have?

Given that you used to exploit the old COPY BINARY INTO's "move feature", I assume your data and your dbfarm are on the same filesystem. (NB. in the old version, we could do the "move trick" only when bulk-loading into an empty table; when loading more data into a non-empty table, we also had to copy the data ...)

In case your machine has more than one filesystem, each on a different hard disk / SSD / RAID, you could try to have your to-be-loaded data on one and your dbfarm on the other, spreading the I/O load over both (one mostly reading, the other mostly writing). You can also inspect your system's I/O activity during the load, e.g., using iostat.

Best,
Stefan

----- On Mar 8, 2017, at 10:12 PM, Lynn Carol Johnson lcj34@cornell.edu wrote:
BTW, my 135 binary files together are 726 GB. I note the Dec2016 release notes say:
BATattach now copies the input file instead of "stealing" it.
Could this be why it’s gone from 3 minutes to over 3 hours to load this data? My files and monetdb are on the same machine – no network access. And “top” shows nothing of significance running on the machine except mserver5.
I loved the speed with which I could add new columns to my table by dropping it, re-creating the table, and running COPY BINARY INTO again. Hoping you have ideas on how to get this speed back, or on what could be wrong.
Thanks - Lynn
From: users-list on behalf of Lynn Carol Johnson
Reply-To: Communication channel for MonetDB users
Date: Wednesday, March 8, 2017 at 3:19 PM
To: Communication channel for MonetDB users
Subject: DEC2016-SP2 and BINARY bulk load

Hi all –
I have always used the COPY BINARY INTO … commands to load my 2.0 billion row genetic data into a MonetDB table. With 135 columns, it has been blindingly fast.
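For concreteness, the workflow looks roughly like this; the table, column, and file names below are only illustrative (the real table has 135 columns, with one pre-generated binary file per column, given as absolute paths readable by mserver5):

    -- (re-)create the target table; illustrative schema, not the real 135-column one
    CREATE TABLE genotypes (taxon INT, site INT, allele TINYINT);

    -- bulk-load one binary file per column, listed in column order
    COPY BINARY INTO genotypes
    FROM ('/data/load/taxon.bin', '/data/load/site.bin', '/data/load/allele.bin');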
Last week I moved from the Jun2016-SP2 release to Dec2016-SP2. My binary loads are taking WAY longer. I killed one after 3 hours (via “call sys.stop(pid)” so it could clean up properly). I then started the load again, thinking perhaps the problem was related to the new columns I was adding.
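For reference, I found the statement to stop by listing the running queries first, along these lines (assuming the Dec2016-era sys.queue(); the exact column names may differ between versions):

    -- list the currently running statements and their tags
    SELECT * FROM sys.queue();
    -- stop the long-running load by its tag, so it can clean up properly
    CALL sys.stop(<tag>);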
I have since dropped the table and remade it using the same data and scripts that ran in just over 3 minutes in February on the Jun2016-SP2 release. It is really chugging along – I’m up to 30 minutes and counting. I don’t have access to the SQL log files, but the merovingian.log shows nothing.
I do notice that previously the binary files, once loaded, were removed from the loading directory. This does not happen now. Were these files previously “moved”, and are they now copied?
Has anyone seen this performance issue with the Dec2016-SP2 COPY BINARY INTO … commands?
Thanks - Lynn
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)    |
| www.CWI.nl/~manegold/  | Science Park 123 (L321)  |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)   |