Getting stuck when trying to copy large amounts of data
Hi,

I'm trying to copy large amounts of data into MonetDB, but the processes get stuck in the middle.

The following is what I did. I have 6 tables, all merge tables partitioned by range of time. There are hundreds of data files, each with almost 10 million rows and a size of about 3.5 GB. For each table, I use the following command to copy data:

`cat file.txt | xargs -n1 -P1 -I {} sh -c "mclient -p 50000 -s \"copy 10000000 records into tbx from stdin best effort\" - < '{}'"`

where file.txt contains the paths of the files to be loaded. So 6 mclient connections are created to copy data in parallel. But the mclient processes get stuck after loading more than 200 million rows per table, and then I have to kill the processes and restart the database.

The environment is CentOS 7.3 with 512 GB RAM. MonetDB is compiled from the latest master branch. Does anyone know why this happens? Does MonetDB have a limit on the amount of data one instance can hold, or is there something wrong with my approach?

Thanks in advance!
Yinjie Lin
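For readability, the xargs pipeline above can be written as a plain shell loop over the file list. This sketch is only a restatement of the command in the post (table `tbx`, port 50000, and `file.txt` come from there), not a fix for the hang:

```sh
#!/bin/sh
# Equivalent of the xargs pipeline: stream each listed file into tbx
# via mclient, one file at a time. tbx, port 50000, and file.txt are
# taken from the original message; nothing here is new behaviour.
while IFS= read -r datafile; do
    mclient -p 50000 \
        -s "COPY 10000000 RECORDS INTO tbx FROM STDIN BEST EFFORT" \
        - < "$datafile"
done < file.txt
```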
Hi,

did you try to load only one table/file at a time rather than 6 files/tables concurrently?

In case your files reside on the same machine as the MonetDB server, please also try a bulk load (COPY INTO) directly from the files rather than via the STDIN of mclient (a sketch of this follows after this message); cf. https://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto

Best,
Stefan

----- On Apr 2, 2019, at 8:53 AM, Yinjie Lin <exialin37@gmail.com> wrote:
Hi,
I'm trying to copy large amounts of data into MonetDB, but the processes get stuck in the middle.
The following is what I did. I have 6 tables, all merge tables partitioned by range of time. There are hundreds of data files, each with almost 10 million rows and a size of about 3.5 GB. For each table, I use the following command to copy data:
`cat file.txt | xargs -n1 -P1 -I {} sh -c "mclient -p 50000 -s \"copy 10000000 records into tbx from stdin best effort\" - < '{}'"`
where file.txt contains the paths of the files to be loaded.
So 6 mclient connections are created to copy data in parallel. But the mclient processes get stuck after loading more than 200 million rows per table, and then I have to kill the processes and restart the database.
The environment is CentOS 7.3 with 512 GB RAM. MonetDB is compiled from the latest master branch. Does anyone know why this happens? Does MonetDB have a limit on the amount of data one instance can hold, or is there something wrong with my approach?
Thanks in advance! Yinjie Lin
--
| Stefan.Manegold@CWI.nl | DB Architectures (DA)    |
| www.CWI.nl/~manegold/  | Science Park 123 (L321)  |
| +31 (0)20 592-4212     | 1098 XG Amsterdam (NL)   |
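Following the CopyInto documentation linked above, the server-side bulk load Stefan suggests would look roughly like the sketch below. The table name `tbx` and port 50000 come from the thread; the file path, delimiters, and NULL marker are placeholders that must match the actual data files, and the path has to be absolute and readable by the server process.

```sh
# Sketch of a server-side bulk load: the server reads the file directly,
# so the file must reside on the machine running MonetDB.
# '/data/part-0001.txt' and the tab/newline delimiters are placeholders.
mclient -p 50000 -s \
    "COPY 10000000 RECORDS INTO tbx FROM '/data/part-0001.txt'
     USING DELIMITERS '\t', '\n' NULL AS '' BEST EFFORT"
```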