[MonetDB-users] mserver crash under load - 16G data, 10 queries each from 2 remote clients
I have a database with a single table: 15 million rows, 58 columns, 16G of raw data; MonetDB stores it consuming 32G. mserver machine specs: 4G main memory, Intel(R) Xeon(R) CPU E5645 @ 2.40GHz, single core. I am performing a test to see how much time it takes if I fire this query multiple times from different machines.

Case 1: I fire an aggregation query 10 times from a remote machine with a 5-second interval between firings. The result has 17000 rows with around 4 MB of data. Memory usage of the mserver process reaches a peak of 73%; the mclients finish within 43 to 48 seconds.

Case 2: Same as above, but now from 2 remote machines simultaneously. Memory usage of the mserver process again reaches a peak of 73% and keeps fluctuating between 70 and 73%. But after some time mserver restarts, and the clients terminate with all 10 clients on each of the two machines saying:

ERROR = !Connection terminated
MAPI = tapomay@54.251.11.181:3306
ACTION = read_line
QUERY = select id1, id2, id3, id4, id5, count(*), sum(metric1), avg(metric2), sum(metric3), sum(metric4), sum(metric5), sum(metric6), sum(metric7), sum(metric8), sum(metric9), sum(metric9), sum(metric10), sum(metric11), sum(metric12), sum(metric13) from tablename group by id1, id2, id3, id4, id5;

Each query terminates with the above error within 5 to 50 seconds. In both cases the initial state of mserver consumes around 0.5% memory.

I also tested the same under other competing columnar DBs. They complete case 2 in around 350 or more seconds on average, and case 1 in 200 seconds on average.
Following are the last few lines of merovingian.log:

2012-07-05 06:30:42 MSG merovingian[31153]: proxying client 1.compute.amazonaws.com:45476 for database 'tablename' to mapi:monetdb:/$
2012-07-05 06:30:42 MSG merovingian[31153]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2012-07-05 06:30:45 MSG merovingian[31153]: proxying client 2.amazonaws.com:46225 for database 'tablename' to mapi:monetdb:///mnt/tapomay$
2012-07-05 06:30:45 MSG merovingian[31153]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2012-07-05 06:30:47 MSG merovingian[31153]: proxying client 1.compute.amazonaws.com:45477 for database 'tablename' to mapi:monetdb:/$
2012-07-05 06:30:47 MSG merovingian[31153]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2012-07-05 06:30:50 MSG merovingian[31153]: proxying client 2.amazonaws.com:46226 for database 'tablename' to mapi:monetdb:///mnt/tapomay$
2012-07-05 06:30:50 MSG merovingian[31153]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2012-07-05 06:30:52 MSG merovingian[31153]: proxying client 1.compute.amazonaws.com:45478 for database 'tablename' to mapi:monetdb:/$
2012-07-05 06:30:52 MSG merovingian[31153]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2012-07-05 06:30:55 MSG merovingian[31153]: proxying client 2.amazonaws.com:46241 for database 'tablename' to mapi:monetdb:///mnt/tapomay$
2012-07-05 06:30:55 MSG merovingian[31153]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2012-07-05 06:30:59 MSG merovingian[31153]: database 'tablename' (1048) was killed by signal SIGKILL

Is there a config option that would ask mserver not to kill itself and to keep processing, at the cost of query execution time? A few more tens of seconds won't harm much.
Thanks and Regards, Tapomay.
On 05-07-2012 00:28:15 -0700, Tapomay Dey wrote:
2012-07-05 06:30:59 MSG merovingian[31153]: database 'tablename' (1048) was killed by signal SIGKILL
mserver5 gets KILLed here. (SIGKILL is usually 9; deadly, and cannot be handled by the process.) Most likely your operating system doesn't like mserver5 here and just kills it. Check your /var/log/messages (or similar) for OOM kills.
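A quick way to do that check, as a sketch (log file locations vary by distribution, and the process id/name shown in the sample line are taken from this thread):

```shell
# Search the kernel ring buffer for OOM-killer activity (may need root):
dmesg | grep -i -E 'out of memory|oom-killer|killed process'

# On syslog-based systems the same lines usually also land in a log file:
grep -i -E 'oom-killer|killed process' /var/log/messages /var/log/syslog 2>/dev/null

# A hit for mserver5 looks roughly like:
#   Out of memory: Killed process 1048 (mserver5)
```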
Is there a config that would ask mserver to not kill itself and keep processing at the cost of query execution time.
It doesn't kill itself; you or your operating system are killing it. In the upcoming Jul2012 release we did a fair bit of work on memory management, with better behaviour achieved by letting the operating system choose for itself what it deems best to do. It would be interesting if you could try out the (still under development, no release candidates yet -- should come really soon) Jul2012 branch, e.g. by compiling http://monetdb.cwi.nl/testweb/web/44451:5aebbb6f96b0/MonetDB-11.11.0.tar.bz2
Looking forward to hearing if that makes any difference for you,
Fabian
An update:
Tested firing 50 of the same failing query from a single machine.
mserver didn't crash. However, some of the clients ended without dumping their results.
The rest of the clients completed successfully, taking from 8 to 200 seconds.
Why does it crash with 2 machines for just 20 queries?
Regards,
Tapomay.
________________________________
From: Tapomay Dey
Sorry, bad observation on my part.
The server does crash at the points where the clients dumped empty results.
Regards,
Tapomay.
________________________________
From: Tapomay Dey
Sorry my bad again :(
As Fabian pointed out
http://sourceforge.net/mailarchive/forum.php?thread_name=20120705085653.GQ24104%40cwi.nl&forum_name=monetdb-users
it's unfair to say the server crashed. It got killed by the OS.
Regards,
Tapomay.
________________________________
From: Tapomay Dey
I met this problem too. My server machine specs: Xeon 2*4 2.12GHz, 32G memory, 4*1T RAID0, Windows 2008R2 64-bit. The table is simple, with fewer than 10 columns and about 0.7 billion rows. The query is simple too: just select some rows from the table with a "where" condition. It takes an average of 200s to return results, but when I ran the query in parallel with 2 clients, sometimes the server crashed. How many parallel queries can MonetDB serve?
--------------------------------------------------------------------------------
------------------ Original Message ------------------
From: Tapomay Dey
Sent: 2012-07-05 17:35:52
To: monetdb-users@lists.sourceforge.net
Cc: (none)
Subject: Re: [MonetDB-users] mserver crash under load - 16G data, 10 queries each from 2 remote clients
--------------------------------------------------------------------------------
From: Tapomay Dey
Hehe, goof-up on my side. Spoiler alert: needed swap. But I think MonetDB could do something smart about it.

Further observations: I followed Fabian's idea and installed the nightly build at http://monetdb.cwi.nl/testweb/web/44451:5aebbb6f96b0/MonetDB-11.11.0.tar.bz2 on a different machine, same config though (4G RAM, 2.27GHz single-core Intel(R) Xeon(R)). This time it topped 65% of the RAM. However, the client still failed in the same way.

Fabian pointed out the possibility of the OOM assassin, so I checked my box using the "free" command. It had no swap. I added a 4GB swap file using the instructions at https://help.ubuntu.com/community/SwapFaq:

sudo dd if=/dev/zero of=/mnt/4096MiB.swap bs=1024 count=4194304
sudo chmod 600 /mnt/4096MiB.swap
sudo mkswap /mnt/4096MiB.swap
sudo swapon /mnt/4096MiB.swap

and added "/mnt/4096MiB.swap none swap sw 0 0" as the last line of /etc/fstab. This solved the issue.

Ran 20 queries from a single remote machine: RAM usage topped 81% and kept fluctuating between roughly 35 and 70%. Average exec time: 195 sec.

Ran case 2: 20 queries, 10 from one machine and 10 from another concurrently, fired at 5-second intervals (case 2, if you saw my last post). Average exec time: 348 sec.

I guess this is the overhead due to lack of main memory. At least it's not being killed anymore. It takes up around 650 MB of swap.

To compare the nightly build with the earlier box running the stable build, I ran the 10 queries from a single remote box (with the given data this succeeds even without swap):
New box (nightly build) with swap: 65 sec avg.
Earlier box (stable build) without swap: 47 sec avg.

Two components changed here: 1. the environment (machine, network, etc.), the best candidate for causing the delay; 2. algorithm changes in the nightly build, which I am guessing must be for the better.

Now, any pointers as to whether and how I can optimise when the data bulges into swap?
######################
Adding swap to the earlier machine (stable build) and testing case 2 again - 20 queries, two remote machines firing 10 clients each: 179 sec avg.
Thanks and Regards,
Tapomay.
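For anyone reproducing the swap setup above: after the mkswap/swapon steps it is worth confirming the kernel actually picked up the new area before re-running the load test (a sketch; output formats differ slightly between util-linux versions):

```shell
# List active swap areas; the new 4096MiB file should appear here:
swapon -s

# The 'Swap:' row should show the extra 4096 MiB of total swap:
free -m

# To watch swap consumption while the queries run (refreshes every 5 s):
#   watch -n 5 free -m
```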
On 06-07-2012 06:12:34 -0700, Tapomay Dey wrote:
Hehe, goofup on my side. Spoiler alert:Needed swap.
But I think monet could do something smart about it.
What would you suggest?
any pointers as to if and how do I optimise when the data bulges into swap.
buy more memory ;) Seriously, not much you can do when you want to process more data than your memory can hold. You could try to put your swap on fast disks (e.g. raid-0 of SSDs), but also your dbfarm data could benefit from that, as when your data doesn't fit in memory, you'll be reading (and possibly writing) from disk frequently. Fabian
That's true.
But isn't there a way we could figure out that the OS is not going to give us more RAM before being killed? That way it could deny further queries instead of dying and failing all running queries. Just thinking out loud.
I stretched it further with 4GB swap along with the 4G RAM, firing 10 queries from 5 different machines. It ate up all the swap, and RAM usage sat at 95% for a long time before being killed. Going ahead to set up an 8G swap.
Thanks and Regards,
Tapomay.
On 10-07-2012 06:21:19 -0700, Tapomay Dey wrote:
Thats true.
But isn't there a way we could figure out that OS is not gonna give us more RAM before being killed. That way it could deny further queries instead of dying out resulting in failure of all running queries. Just thinking out loud.
Typically, the OS tells the application it's not going to let it have more memory by making malloc calls fail. However, the OS in this case just evicts the application and kills it. There is no prior notice to the application, or anything. So nothing to be done here.
I stretched it further with 4GB Swap along with the 4G RAM. Firing 10 queries from 5 different machines. It ate up all the Swap and the RAM usage was at 95 % for a long time before being killed. Going ahead to set an 8G swap.
It feels to me like you're just overloading your relatively small machine tremendously. The deeper you get into swap, the further your performance goes down. You might be better off reducing the number of concurrent queries instead. You could also try and see if your application benefits from, or works with, the funnel included in monetdbd. Fabian
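Fabian's malloc point earlier in the thread can be seen in practice: capping the process address space with ulimit makes large allocations fail with ENOMEM (so mserver5 gets an error it can report) instead of inviting the OOM killer. A sketch only; the 3 GiB value is an assumption to tune for your box, and the commented mserver5 invocation is hypothetical:

```shell
# Cap the virtual address space of this shell and its children
# (value in KiB; 3145728 KiB = 3 GiB, picked below RAM + swap headroom):
ulimit -v 3145728

# Show the limit now in force:
ulimit -v

# Start the server under that cap (hypothetical invocation; your dbpath
# and flags will differ):
#   mserver5 --dbpath=/path/to/dbfarm/tablename
```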
On 10-07-12 15:48, Fabian Groffen wrote:
Typically, the OS tells the application it's not going to let it have more memory by making malloc calls fail. However, the OS in this case just evicts the application and kills it. There is no prior notice to the application, or anything. So nothing to be done here.
Is there currently any 'native' MonetDB option, or an elaborated example of how limits.conf could be used to limit the RAM available to MonetDB? That way the operating system, or MonetDB's wrapper for malloc, could detect that it is approaching the boundary and act prior to a global Out Of Memory? Stefan
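In the absence of a native option, an external cap via pam_limits or ulimit approximates what Stefan asks for: the kernel starts refusing allocations before the global OOM point. A sketch under stated assumptions: the 'monetdb' user name, the 6 GiB value, and the dbfarm path are all placeholders, not MonetDB defaults:

```shell
# Hypothetical /etc/security/limits.conf entry (applied by pam_limits at
# login of the account that runs the daemon); 'as' caps address space:
#   monetdb  hard  as  6291456      # 6 GiB, units are KiB

# Equivalent one-off cap from the shell that launches the daemon:
ulimit -v 6291456
#   monetdbd start /path/to/dbfarm  # hypothetical dbfarm path
```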
participants (4)
- Fabian Groffen
- Stefan de Konink
- Tapomay Dey
- 梁猛