Slow DB performance after data insertion
Hi folks,

in the following scenario I see very slow DB performance after data insertion. By slow performance I mean that "mclient -u XXX -d XXX" blocks for roughly 10 minutes before it lets me access the DB.

Scenario: I parse a pcap file (40 MB, 695 connections) in Python with dpkt. For every new connection found in the pcap file I create a new entry in my connection table and two new empty packet tables (one table per direction). The actual packet data is not inserted directly into the packet tables, but written into files so that it can be pushed later with COPY BINARY INTO into the packet tables.

Autocommit is off. I commit once after I have read the pcap file completely and before I start the COPY BINARY INTO. After the COPY BINARY INTO I have another commit. That's it. The number of transactions is 3*695 (2*695 table creations for the packet tables and 695 entries into the connection table).

The script itself terminates in roughly 30 seconds, but the DB is blocked for +- 10 minutes. Why? What is going on internally in the DB? Can I speed up the performance?

Cheers,
Alex
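[Editor's note: a minimal sketch of the workflow described above, not the original script. Everything named here is an assumption: the pymonetdb DB-API connector, the table names (connections, pkt_<id>_fwd / pkt_<id>_rev), the server-readable /tmp paths, and the assumption that COPY BINARY INTO expects, for a BIGINT column, a raw little-endian 64-bit integer array; the connection tracking is deliberately oversimplified.]

import struct
import dpkt
import pymonetdb

db = pymonetdb.connect(username="XXX", password="XXX",
                       hostname="localhost", database="XXX")
db.set_autocommit(False)                 # autocommit off, as described above
cur = db.cursor()
cur.execute("CREATE TABLE connections (id INT)")

flows = {}                               # crude flow key -> connection id (illustration only)
packets = {}                             # connection id -> list of packet timestamps
with open("trace.pcap", "rb") as f:
    for ts, buf in dpkt.pcap.Reader(f):
        eth = dpkt.ethernet.Ethernet(buf)
        if not isinstance(eth.data, dpkt.ip.IP):
            continue
        key = (eth.data.src, eth.data.dst)
        if key not in flows:
            cid = len(flows)
            flows[key] = cid
            packets[cid] = []
            # one row in the connection table and two empty packet tables per connection
            cur.execute("INSERT INTO connections VALUES (%d)" % cid)
            cur.execute("CREATE TABLE pkt_%d_fwd (ts BIGINT)" % cid)
            cur.execute("CREATE TABLE pkt_%d_rev (ts BIGINT)" % cid)
        packets[flows[key]].append(int(ts * 1e6))

db.commit()                              # first commit: DDL + connection rows

# Bulk-load the binary column files written during parsing. COPY BINARY INTO
# takes one file per column; paths must be readable by the server process.
for cid, tss in packets.items():
    path = "/tmp/ts_%d_fwd.bin" % cid
    with open(path, "wb") as out:
        out.write(struct.pack("<%dq" % len(tss), *tss))
    cur.execute("COPY BINARY INTO pkt_%d_fwd FROM ('%s')" % (cid, path))
    # (reverse-direction tables would be loaded the same way; omitted here)

db.commit()                              # second commit, after COPY BINARY INTO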
On 07/08/15 15:02, Zimmermann, Alexander wrote:
Hi folks,
in the following scenario I see very slow DB performance after data insertion. By slow performance I mean that "mclient -u XXX -d XXX" blocks for roughly 10 minutes before it lets me access the DB.
Scenario: I parse a pcap file (40 MB, 695 connections) in Python with dpkt. For every new connection found in the pcap file I create a new entry in my connection table and two new empty packet tables (one table per direction).
The actual packet data is not inserted directly into the packet tables, but written into files so that it can be pushed later with COPY BINARY INTO into the packet tables.
Autocommit is off. I commit once after I have read the pcap file completely and before I start the COPY BINARY INTO. After the COPY BINARY INTO I have another commit. That's it.
The number of transactions is 3*695 (2*695 table creations for the packet tables and 695 entries into the connection table).
The script itself terminates in roughly 30 seconds, but the DB is blocked for +- 10 minutes. Why? What is going on internally in the DB? Can I speed up the performance?
My guess is that the server is writing data to disk. In particular, the internal write-ahead log (WAL) will be incorporated into the base tables. This means that those base tables need to be written to disk before the WAL can be deleted. And we must be sure the data is actually on disk, and not just in the OS block cache, so we use the fdatasync and msync system calls to tell the system to please write to disk now. And those calls are notorious for being slow. Unfortunately, OSes don't provide an interface to sync a whole bunch of files at once (except the sync system call, which is itself not synchronous, so we still don't know when it's done). It has to be done one file at a time.

--
Sjoerd Mullender
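[Editor's note: a rough standalone illustration of the per-file sync cost Sjoerd describes; this is a hypothetical benchmark, not MonetDB code. With two packet tables per connection plus the connection table and the WAL, a few thousand files have to be flushed individually, and each synchronous flush on a rotating disk can cost milliseconds to tens of milliseconds, which adds up to minutes.]

import os
import time

N = 2000                                  # on the order of the number of files touched here
start = time.time()
for i in range(N):
    path = "/tmp/sync_demo_%d" % i
    with open(path, "wb") as f:
        f.write(b"x" * 4096)
        f.flush()
        os.fdatasync(f.fileno())          # one synchronous disk flush per file (Linux; use os.fsync elsewhere)
    os.unlink(path)
print("%d per-file syncs took %.1f s" % (N, time.time() - start))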
Hello Alexander,

What Sjoerd said is true: WAL is the usual culprit in such cases. However, 10 min. doesn't feel right, because the data size is not that much, right?

Which version of MonetDB are you using? If it is not Jul2015 (NB: it's not a release yet, it's an RC), would you please try Jul2015? There are some transaction/WAL/BAT leak fixes in Jul2015 which might help your situation. If you're already using Jul2015, can you please provide us a script or something with which we can reproduce your problem?

With kind regards,
Jennie
On 09.08.2015 at 16:10, Ying Zhang wrote:
Hello Alexander,
What Sjoerd said is true: WAL is the usual culprit in such cases. However, 10 min. doesn't feel right, because the data size is not that much, right?
Right. A 40 MB pcap file with ~700 connections.
Which version of MonetDB are you using?
alexandz@rx300s8-hdc-tmp:~$ /usr/bin/mserver5 --version
MonetDB 5 server v11.19.15 "Oct2014-SP4" (64-bit, 64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2015 MonetDB B.V., all rights reserved
Visit http://www.monetdb.org/ for further information
Found 63.0GiB available memory, 40 available cpu cores
Libraries:
  libpcre: 8.35 2014-04-04 (compiled with 8.35)
  openssl: OpenSSL 1.0.1f 6 Jan 2014 (compiled with OpenSSL 1.0.1f 6 Jan 2014)
  libxml2: 2.9.2 (compiled with 2.9.2)
Compiled by: root@dev.monetdb.org (x86_64-pc-linux-gnu)
Compilation: gcc -O3 -fomit-frame-pointer -pipe -g -D_FORTIFY_SOURCE=2
Linking    : /usr/bin/ld -m elf_x86_64

Debian Jessie.
If not Jul2015 (NB, it’s not a release yet, it’s an RC), would you please try Jul2015? There are some transaction/WAL/BAT leak fixes in Jul2015, which might help your situation.
If we stick with the strategy of pushing the pcap data into the DB, I will try this and come back to you. At the moment we are discussing internally whether to switch to a completely different approach. Thanks for your time. Alex
If you’re already using Jul2015, can you please provide us a script or something with which we can reproduce your problem?
With kind regards,
Jennie
participants (3)
- Sjoerd Mullender
- Ying Zhang
- Zimmermann, Alexander