MonetDB is slow to back up

Hello,

I usually back up MonetDB using msqldump:

msqldump --database=base --table=table > backup.sql

This command takes 30 minutes to back up 6 GB, but only 4 minutes to restore. How can I increase backup speed?

Thanks,
Pierre

--
1G6
52 route de bischwiller
67300 Schiltigheim
Société de Services et de Formations en Logiciels Libres
http://1g6.biz
Tel: 06 64 63 70 35

Hello,

Perhaps this is an option: https://www.monetdb.org/Documentation/UserGuide/FastDumpRestore

Regards,
Martin
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list

Hello,
Thank you very much for your reply.
I can't use this method because it is a cold, full-database backup, and it doesn't allow restoring a single table.
Pierre

Hello,

Have you deleted a significant number of rows before the backup? A delete only marks rows as deleted, and the backup scans the whole table, including the deleted rows. I would check the table sizes using the sys.storage view.

What is the on-disk size of the database you are backing up? It should be roughly the same as the size of the backup.

Radovan

Hi,
Here are the queries I use to load and back up the database:
sql>COPY INTO data_100 FROM '/root/data_maria100_utf.txt' DELIMITERS '\t' ;
103704336 affected rows (3m 10s)
The imported file:
ls -lh /root/data_maria100_utf.txt
-rw-r--r-- 1 root root 6,4G juin 10 18:22 /root/data_maria100_utf.txt
The backup command:
date ; msqldump --database=test --table=data_100 > data_100.sql ; date
Thu 24 Jul 2014, 17:40:51 (UTC+0200)
Thu 24 Jul 2014, 18:09:29 (UTC+0200)
ls -lh data_100.sql
-rw-r--r-- 1 root users 7,0G juil. 24 18:09 data_100.sql
If I use a COPY ... INTO file query to back up instead of msqldump:
sql>COPY SELECT * FROM data_100 INTO '/home/pcoustillas/dump/data_100_coyp.csv' USING DELIMITERS '\t','\n','"';
103704336 affected rows (17m 53s)
That noticeably reduces the time needed (from about 29 minutes to about 18), but I don't know whether it is good practice.
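Put differently, the two methods' throughput can be estimated from the numbers above. A quick back-of-the-envelope sketch (the 6.4 GB file size and both timings are taken from this thread; nothing here touches MonetDB itself):

```python
# Back-of-the-envelope throughput of the two dump methods,
# using the sizes and timings reported in this thread.

SIZE_MB = 6.4 * 1024  # ~6.4 GB source file, in MB

timings_s = {
    "msqldump": 28 * 60 + 38,        # 17:40:51 -> 18:09:29 (28m 38s)
    "COPY INTO file": 17 * 60 + 53,  # reported as 17m 53s
}

throughput = {name: SIZE_MB / s for name, s in timings_s.items()}
for name, mbps in throughput.items():
    print(f"{name}: {mbps:.1f} MB/s")
# msqldump: 3.8 MB/s
# COPY INTO file: 6.1 MB/s
```

Both figures are far below what a local disk can sustain, which suggests the bottleneck is in formatting/serialization rather than I/O.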
For reference, here is the output of sys.storage for this table:
sql>SELECT * FROM sys.storage WHERE "table"='data_100';
+--------+----------+----------------------+---------+----------+-----------+-----------+------------+------------+---------+--------+
| schema | table | column | type | location | count | typewidth | columnsize | heapsize | indices | sorted |
+========+==========+======================+=========+==========+===========+===========+============+============+=========+========+
| sys | data_100 | id_visite | int | 62/6262 | 103704336 | 4 | 414817344 | 0 | 0 | false |
| sys | data_100 | id_visiteur_unique | varchar | 60/6071 | 103704336 | 19 | 414817344 | 1286078464 | 0 | false |
| sys | data_100 | visiteur_identifie | int | 63/6316 | 103704336 | 4 | 414817344 | 0 | 0 | false |
| sys | data_100 | site_niveau1 | int | 62/6270 | 103704336 | 4 | 414817344 | 0 | 0 | false |
| sys | data_100 | site_niveau2 | varchar | 07/764 | 103704336 | 12 | 207408672 | 10240 | 0 | false |
| sys | data_100 | chapitre_niv1 | int | 60/6016 | 103704336 | 4 | 414817344 | 0 | 0 | false |
| sys | data_100 | chapitre_niv2 | int | 56/5605 | 103704336 | 4 | 414817344 | 0 | 0 | false |
| sys | data_100 | indicateur_global | varchar | 53/5335 | 103704336 | 1 | 207408672 | 20480 | 0 | false |
| sys | data_100 | id_cmp_autopromotion | int | 17/1700 | 103704336 | 4 | 414817344 | 0 | 0 | false |
| sys | data_100 | nb_pages_vues | int | 56/5652 | 103704336 | 4 | 414817344 | 0 | 0 | false |
| sys | data_100 | temps_passe | int | 62/6264 | 103704336 | 4 | 414817344 | 0 | 0 | false |
+--------+----------+----------------------+---------+----------+-----------+-----------+------------+------------+---------+--------+
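As a sanity check on Radovan's point that the on-disk size should roughly match the backup size, the columnsize and heapsize figures above can simply be added up (a sketch hard-coding the values from the table; against a live database one would query sys.storage directly):

```python
# Sum columnsize + heapsize from the sys.storage output above
# to estimate the table's footprint on disk.

# (column, columnsize, heapsize) as reported in this thread
columns = [
    ("id_visite",            414817344, 0),
    ("id_visiteur_unique",   414817344, 1286078464),
    ("visiteur_identifie",   414817344, 0),
    ("site_niveau1",         414817344, 0),
    ("site_niveau2",         207408672, 10240),
    ("chapitre_niv1",        414817344, 0),
    ("chapitre_niv2",        414817344, 0),
    ("indicateur_global",    207408672, 20480),
    ("id_cmp_autopromotion", 414817344, 0),
    ("nb_pages_vues",        414817344, 0),
    ("temps_passe",          414817344, 0),
]

total_bytes = sum(cs + hs for _, cs, hs in columns)
print(f"~{total_bytes / 2**30:.1f} GiB on disk")
# ~5.1 GiB
```

That is in the same ballpark as the 6.4 GB input file and the 7.0 GB dump, so a large backlog of deleted-but-not-reclaimed rows does not appear to be the problem here.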
Pierre
--
1G6
52 route de bischwiller
67300 Schiltigheim
Société de Services et de Formations en Logiciels Libres
http://1g6.biz
Tél : 06 64 63 70 35
----- Mail original -----
De: "Radovan Bičiště"

Hi,

The only thing that comes to my mind is the cache. Do you run the tests in the exact order shown below? It could be that msqldump first has to load the table into memory, and that is what takes the time. Right after that, when you run the COPY command, the table is already in memory and dumping is much faster.

You can try the following:

CASE 1
1. Load the table using COPY.
2. Dump the table using COPY.
3. Check the time.

CASE 2
1. Load the table using COPY.
2. Dump the table using msqldump.
3. Check the time.

CASE 3
1. Load the table using COPY.
2. Run SELECT * FROM table LIMIT 1000 (or something else that forces a full scan of the table).
3. Dump the table using msqldump.
4. Check the time.

Regards,
Radovan
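The three cases above all boil down to timing the same dump step after different warm-up steps. A minimal, generic timing helper for running them could look like this (a sketch; the commented-out commands are hypothetical adaptations of the msqldump and mclient invocations from this thread and would need adjusting to the actual setup):

```python
import subprocess
import time

def timed(label, fn):
    """Run fn, print how long it took, and return the elapsed seconds."""
    start = time.perf_counter()
    fn()
    elapsed = time.perf_counter() - start
    print(f"{label}: {elapsed:.1f}s")
    return elapsed

def run(cmd):
    """Run a shell command, raising if it fails."""
    subprocess.run(cmd, shell=True, check=True)

# Sketch of CASE 2 vs. CASE 3 (hypothetical commands, adjust to your setup):
# timed("msqldump, cold", lambda: run("msqldump --database=test --table=data_100 > /dev/null"))
# timed("warm-up scan",   lambda: run("mclient -d test -s 'SELECT * FROM data_100 LIMIT 1000'"))
# timed("msqldump, warm", lambda: run("msqldump --database=test --table=data_100 > /dev/null"))
```

If the "warm" dump is markedly faster than the "cold" one, the cache theory holds; if the times are similar, the bottleneck is elsewhere (e.g. in serializing rows to SQL text).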
participants (3)
- Martin Kersten
- Pierre-Adrien Coustillas
- Radovan Bičiště