A good strategy for loading bulk data into monetdb
Hi all,

I'm trying to load large amounts of data into MonetDB (migrating from MySQL). Since I'm using the Jul2015-SP4 release, I can't create local files and I can't use external tools, so my approach is as follows. For each table, I partition the data by day, and then:

10 create one table in MonetDB for each day, and a merge table that takes all these tables
20 fetch data from MySQL (mysql -B -N -C ...)
30 _pipe_ the data to mclient (invoked as mclient ... -s "COPY 2000000 RECORDS INTO ... FROM STDIN" -)
40 load the data into the corresponding table (the right day)
50 sleep some seconds
60 goto 20

Each daily table is around 6'000'000 rows, but I have some that have ~150'000'000 rows.

However, I noticed that when I do this, I quickly run out of disk space, as MonetDB creates really big logs (in /data/monetdb/some_schema/sql_logs). That folder is usually around 600KB, but once the migration starts, it grows to several GBs. Not only that, but /data/monetdb/some_schema/bat also grows a lot.

The surprising part, and the motivation for this email, is how well MonetDB compresses the data. If I migrate only one day's worth of data, then restart monetdbd (once or twice) and run 'analyze some_schema', the sql_logs size drops back to normal levels, and I reclaim a lot of space (even in the bat directory). I tried putting a 'sleep(120)' after each partition and running 'analyze some_schema' after each iteration, but couldn't get the same behaviour (and it is also quite slow). Is there a way to trigger the compression and reduce the disk space overhead without restarting mserver, given the above constraints?

Thanks,
Nikola
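[For concreteness, a minimal sketch of the loop described above as a shell script. The column definitions, merge-table name, and day list are assumptions for illustration; the mysql/mclient invocations follow the commands quoted later in this thread, and the COPY statement puts INTO before FROM STDIN as MonetDB's grammar expects.]

    #!/bin/sh
    # Sketch of steps 10-60 above. Names like some_schema, messages, mhost
    # and reader come from commands elsewhere in this thread; the column
    # list (mDate, body) and merge-table name are illustrative assumptions.

    # Step 10: one physical table per day, plus a merge table over them.
    mclient -d some_schema -s 'CREATE MERGE TABLE "messages_all" (mDate DATE, body CLOB);'

    for day in 2016-03-29 2016-03-30; do
        mclient -d some_schema -s "CREATE TABLE \"messages__$day\" (mDate DATE, body CLOB);"
        mclient -d some_schema -s "ALTER TABLE \"messages_all\" ADD TABLE \"messages__$day\";"

        # Steps 20-40: stream rows from MySQL straight into the day table.
        mysql -C -N -B -u reader -pxxx some_schema -h mhost \
              -e "SELECT * FROM messages WHERE mDate = '$day';" |
        mclient -d some_schema \
                -s "COPY 2000000 RECORDS INTO \"messages__$day\" FROM STDIN USING DELIMITERS '\t','\n'" -

        sleep 10   # step 50
    done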
Hi,

To avoid the logs you should use the LOCKED version; see https://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto
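[A minimal example of the LOCKED variant, reusing the day-table naming from this thread; per the linked page, and per the error message quoted below, LOCKED bypasses the transaction log and requires database administrator rights.]

    -- Assumption: table name/delimiters follow the commands in this thread.
    -- LOCKED skips the write-ahead log; needs administrator rights.
    COPY 6000000 RECORDS INTO "messages__2016-03-30"
    FROM STDIN USING DELIMITERS '\t','\n' LOCKED;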
Hi Martin,

If I try using "LOCKED", I'm greeted with:

    $ mysql -C -N -B -u reader -pxxx some_schema -h mhost -e "SELECT * FROM messages WHERE mDate = '2016-03-30' LIMIT 10;" | mclient -d some_schema -s "COPY INTO \"messages__2016-03-30\" FROM STDIN USING DELIMITERS '\t','\n' LOCKED" -
    COPY INTO: insufficient privileges: COPY INTO from .. LOCKED requires database administrator rights

OK, so my user doesn't have db admin rights. Since I couldn't find what the admin role is in the documentation, StackOverflow had the following:

http://stackoverflow.com/questions/32165339/grant-privileges-to-users-on-mon...

So, I tried:

    [root@wks038 ~]# mclient -d some_schema -u monetdb
    password:
    Welcome to mclient, the MonetDB/SQL interactive terminal (Jul2015-SP4)
    Database: MonetDB v11.21.19 (Jul2015-SP4), 'mapi:monetdb://wks038.intra:50000/some_schema'
    Type \q to quit, \? for a list of available commands
    auto commit mode: on
    sql>grant monetdb to writer;
    operation successful (2.569ms)
    sql>\q

    [root@wks038 ~]# mclient -d some_schema -u writer
    password:
    Welcome to mclient, the MonetDB/SQL interactive terminal (Jul2015-SP4)
    Database: MonetDB v11.21.19 (Jul2015-SP4), 'mapi:monetdb://wks038.intra:50000/some_schema'
    Type \q to quit, \? for a list of available commands
    auto commit mode: on
    sql>set role monetdb;
    Role (monetdb) missing

Then I decided to use the db admin to populate the data, but that didn't go well either:

    $ mysql -C -N -B -u reader -pxxx some_schema -h mhost -e "SELECT * FROM messages WHERE mDate = '2016-03-30' LIMIT 10;" | mclient -d some_schema -u monetdb -s "COPY INTO \"messages__2016-03-30\" FROM STDIN USING DELIMITERS '\t','\n' LOCKED" -
    COPY INTO: no such table 'messages__2016-03-30'
    syntax error, unexpected sqlINT in: "1971276831"

BTW, I thought that LOCKED is only applicable for COPY FROM 'file', not for STDIN.

Any ideas how to resolve these issues?

Thanks,
Nikola
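[One hedged guess at the 'no such table' error, an assumption not confirmed in this thread: a session opened as the monetdb administrator starts in the sys schema, so a table created in some_schema would need an explicit schema qualifier, along the lines of:]

    -- Assumption: the day table lives in some_schema, while the admin
    -- session's default schema is sys, hence the explicit qualifier.
    COPY INTO some_schema."messages__2016-03-30"
    FROM STDIN USING DELIMITERS '\t','\n' LOCKED;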
Hi,

Take a look at https://www.monetdb.org/Documentation/SQLreference/Permissions

You'll need to create a user and have the ADMIN OPTION.

Regards,
Brian Hood
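[A sketch of that grant sequence, under two stated assumptions: that the administrator role is named sysadmin (the attempt above used the role name monetdb), and that the standard WITH ADMIN OPTION clause described on the linked Permissions page applies. Whether this clears the 'Role (monetdb) missing' error above is not confirmed in this thread.]

    -- As the monetdb administrator; user name and password are illustrative.
    CREATE USER "writer" WITH PASSWORD 'xxx' NAME 'Bulk Loader' SCHEMA "some_schema";
    GRANT sysadmin TO "writer" WITH ADMIN OPTION;

    -- Then reconnect as writer and assume the role in the new session:
    SET ROLE sysadmin;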
participants (3): Brian Hood, Knezevic Nikola, Martin Kersten