[MonetDB-users] Maximum Parallelism and Recommended Hardware for MonetDB
I set up a MonetDB server for a data mining team on a single node. The node's specs are 32 cores (4x8 AMD 2.3GHz), 128GiB RAM, with the dbfarm on a RAID0 of 2 SSDs (I think they are Intel SSDs). MonetDB on this hardware works well for aggregates and joins on integers, but chugs considerably on inline views with strings and joins with strings. The team usually has 2-3 concurrent jobs at any given time, some materializing new tables, with lots of bulk loading ("copy into ... from file ..."). A typical table size is 2GiB, and queries may join 5-10 tables.

When I look at process activity on the node, I usually see each DB's mserver5 process pegged on CPU, using just a single core. Since I rarely see more than one core used per mserver5, I'm starting to think either I haven't configured MonetDB to favor parallel plans, or I chose the wrong hardware for MonetDB.

So, as questions to everyone on the list, I'm curious:
1) Are there knobs in MonetDB to favor parallel execution plans?
2) What's the ideal hardware for a single-node MonetDB server to support a workload of 2-5 concurrent queries?
3) What's the ideal disk configuration for the above?

Brien
Hi Brien,

On 13-07-2011 18:40:17 -0400, brien colwell wrote:
> I set up a MonetDB server for a data mining team on a single node. The node's specs are 32 cores (4x8 AMD 2.3GHz), 128GiB RAM, with the dbfarm on a RAID0 2xSSD (I think they are Intel SSDs). MonetDB on this hardware works well for aggregates and joins on integers, but chugs considerably on inline views with strings and joins with strings. The team usually has 2-3 concurrent jobs at any given time, some materializing new tables, with lots of bulk loading ("copy into ... from file ..."). A typical table size is 2GiB, and queries may join 5-10 tables.
Machine specs look ok to me.
> When I look at process activity on the node, I usually see each DB's mserver5 process pegged on CPU, using just a single core.
Running TRACE on such queries should give you insight into which instruction(s) take a lot of time. If you have questions about a trace, we can have a look at it to see if we can point out something obvious.
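A trace of a 5-10 table join can be long, so it may help to rank the instructions before reading it line by line. The sketch below assumes each TRACE row has already been reduced to an (elapsed microseconds, MAL statement) pair; the exact columns and their names vary between MonetDB versions, so that parsing step is left out. The function name `top_instructions` and the sample rows are hypothetical, not real measurements.

```python
import re
from collections import defaultdict

# Matches a MAL call such as "algebra.join(" and captures "algebra.join".
MAL_CALL = re.compile(r"([A-Za-z_]\w*\.[A-Za-z_]\w*)\(")


def top_instructions(rows, n=5):
    """Return the n most expensive operations as (operation, total_usec) pairs.

    rows: iterable of (usec, statement) tuples taken from TRACE output
    (assumed layout; adapt to your MonetDB version).
    """
    totals = defaultdict(int)
    for usec, stmt in rows:
        match = MAL_CALL.search(stmt)
        # Fall back to the whole statement if no module.function call is found.
        op = match.group(1) if match else stmt.strip()
        totals[op] += usec
    # Largest total time first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    sample = [
        (1200, 'X_5 := sql.bind(X_1, "sys", "t", "name", 0);'),
        (950000, "X_9 := algebra.join(X_5, X_7);"),
        (430000, "X_12 := algebra.join(X_9, X_11);"),
        (80000, "X_14 := group.group(X_12);"),
    ]
    for op, usec in top_instructions(sample):
        print(f"{op:20s} {usec / 1000:10.1f} ms")
```

Aggregating by operation name rather than by individual statement makes it easier to see whether, for example, string joins dominate the total.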
> Since I rarely see more than one core used per mserver5, I'm starting to think either I haven't configured MonetDB to favor parallel plans, or I chose the wrong hardware for MonetDB.
>
> So, as questions to everyone on the list, I'm curious:
> 1) Are there knobs in MonetDB to favor parallel execution plans?
You can force parallel execution to take place at a certain level, but we advise against doing so, as it usually degrades performance (MonetDB tries to choose the best level of parallelism based on the available cores, memory and work).
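The trade-off in that parenthetical can be illustrated with a toy heuristic. This is not MonetDB's actual optimizer logic: the function name `toy_partition_count`, the 64 MiB minimum piece size and the memory rule are all assumptions chosen for illustration. It caps parallelism by the number of cores, avoids splitting small inputs (too little work per piece), and splits further when a piece would not fit in its share of memory.

```python
import math


def toy_partition_count(cores, free_memory_bytes, input_bytes,
                        min_piece_bytes=64 * 2**20):
    """Toy estimate of how many pieces to split a column scan into.

    Illustrative assumption only; not MonetDB's real algorithm.
    """
    if input_bytes <= 0:
        return 1
    # Work: pieces smaller than min_piece_bytes cost more in overhead than they save.
    by_work = max(1, input_bytes // min_piece_bytes)
    # Cores: more concurrently running pieces than cores only adds contention.
    pieces = min(cores, by_work)
    # Memory: if a piece would not fit in its per-core share of memory,
    # split further (the extra pieces are then processed in rounds).
    per_piece_budget = free_memory_bytes // cores
    if per_piece_budget > 0:
        pieces = max(pieces, math.ceil(input_bytes / per_piece_budget))
    return pieces


if __name__ == "__main__":
    # A 2 GiB table on the 32-core, 128 GiB machine from the thread.
    print(toy_partition_count(32, 128 * 2**30, 2 * 2**30))  # → 32
```

Even in this toy, forcing a fixed piece count would be wrong for some inputs: it would over-split small tables or leave large ones too big for memory, which is one plausible reason automatic choice tends to beat a forced level.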
> 2) What's the ideal hardware for a single-node MonetDB server to support a workload of 2-5 concurrent queries?
Concurrency is always a problem for MonetDB, since in the ideal case each query already uses all of the machine's resources, so concurrent queries end up fighting with each other for them.
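A back-of-the-envelope model makes the point concrete. Assuming CPU-bound queries and fair CPU sharing (a simplification that ignores memory and I/O contention), k queries that would each use t threads on a C-core machine slow each other down by roughly max(1, k·t / C). The function name `concurrent_slowdown` is hypothetical.

```python
def concurrent_slowdown(cores, threads_per_query, concurrent_queries):
    """Rough per-query slowdown when CPU-bound queries share the cores fairly.

    Simplified model (an assumption): ignores memory, I/O and scheduling overhead.
    """
    demand = threads_per_query * concurrent_queries
    # No slowdown until total thread demand exceeds the available cores.
    return max(1.0, demand / cores)


if __name__ == "__main__":
    # The 32-core machine from the thread, with 3 concurrent jobs.
    print(concurrent_slowdown(32, 32, 3))  # each query wants every core → 3.0
    print(concurrent_slowdown(32, 1, 3))   # single-threaded queries → 1.0
```

Under this model, once the machine is saturated, more parallelism per query does not raise total throughput; it mainly shortens the latency of a query that runs alone.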
> 3) What's the ideal disk configuration for the above?
Make sure you get the fastest disk I/O you can, in terms of both latency and throughput.

Regards,
Fabian Groffen
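Both properties can be checked crudely on the dbfarm volume. The sketch below writes a scratch file, then measures sequential read throughput and average latency of random 4 KiB reads. Assumptions: a POSIX system (`os.pread` is not available on Windows), and the function names are hypothetical. A freshly written file sits in the page cache, so the numbers will be optimistic unless the file exceeds RAM or the caches are dropped first; for serious measurements a dedicated tool such as fio is a better choice.

```python
import os
import random
import sys
import tempfile
import time


def sequential_read_mib_per_s(path, block_size=1 << 20):
    """Read `path` front to back and return the observed throughput in MiB/s."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    return (total / 2**20) / elapsed if elapsed > 0 else float("inf")


def random_read_latency_ms(path, samples=200, block_size=4096):
    """Average latency of random block-sized reads from `path`, in milliseconds."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(samples):
            offset = random.randrange(0, max(1, size - block_size + 1))
            os.pread(fd, block_size, offset)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed * 1000 / samples


if __name__ == "__main__":
    # Directory on the disk under test, e.g. the dbfarm volume (defaults to the temp dir).
    target_dir = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    with tempfile.NamedTemporaryFile(dir=target_dir, delete=False) as tmp:
        for _ in range(256):  # 256 MiB of incompressible data
            tmp.write(os.urandom(2**20))
        tmp.flush()
        os.fsync(tmp.fileno())
        path = tmp.name
    try:
        print(f"sequential read: {sequential_read_mib_per_s(path):.0f} MiB/s")
        print(f"random 4 KiB read: {random_read_latency_ms(path):.3f} ms")
    finally:
        os.remove(path)
```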
participants (2)
- brien colwell
- Fabian Groffen