
On 14 Jun 2019, at 15:05, Roberto Cornacchia wrote:

Hi all,
I'm struggling with optimizing resource sharing of MonetDB in production environments (see also: https://www.monetdb.org/pipermail/users-list/2018-June/010276.html).
Hai Roberto,

We don’t have a good solution yet for sharing resources among MonetDB instances, but recently we have gathered some information on this topic. Let me share it here.
We run MonetDB instances for several projects / customers on each of our servers. Each MonetDB instance is a docker container (used mainly because of ease of deployment and environment isolation). It is not unusual to have 5-10 MonetDB containers on the same server. In principle, Docker does not even have much to do with this, but it puts everything in a realistic context.
** Memory

mserver5 checks the system memory and calibrates itself on that. When 10 instances are running, they all assume they have the whole memory to themselves.
Docker allows setting limits on container memory. It does that by using cgroups (so Docker just makes things easier, but it’s really about cgroups). However, memory limits set by cgroups are not namespaced (https://ops.tips/blog/why-top-inside-container-wrong-memory/#memory-limits-s...).
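The limit itself is still readable from inside the container through the cgroup filesystem, even though tools like top and /proc/meminfo keep showing host-wide numbers. A minimal sketch, assuming cgroup v1 mounted at its default path:

    /* Minimal sketch: read the cgroup-v1 memory limit that Docker
     * enforces on this container. /proc/meminfo is not namespaced,
     * but this file reflects the real cap. Assumes the default
     * cgroup mount point. */
    #include <stdio.h>

    int main(void) {
        unsigned long long limit;
        FILE *f = fopen("/sys/fs/cgroup/memory/memory.limit_in_bytes", "r");
        if (f == NULL) {
            fprintf(stderr, "no cgroup-v1 memory controller found\n");
            return 1;
        }
        if (fscanf(f, "%llu", &limit) != 1) {
            fclose(f);
            return 1;
        }
        fclose(f);
        printf("container memory limit: %llu bytes\n", limit);
        return 0;
    }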
We’ve often used tools such as numactl and cgroups to limit the hardware resources MonetDB can use. They indeed limit the resources available to mdb, but we only recently realised that mdb is not aware of those limits, which can cause various problems. This is an open issue reported here: https://www.monetdb.org/bugzilla/show_bug.cgi?id=6710

FYI, depending on the system, mdb uses sysctl or GlobalMemoryStatusEx to detect memory, the former with system-dependent arguments. For the number of cores, mdb uses sysconf, sysctl, or GetSystemInfo. See gdk_utils.c (MT_init()) and gdk_system.c (MT_check_nr_cores()).
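On Linux, the detection boils down to calls like these (a rough sketch of the idea behind MT_init() and MT_check_nr_cores(), not the actual MonetDB code). Note that both values describe the whole machine, not the cgroup the process happens to run in:

    /* Rough sketch of host-wide detection on Linux; not the actual
     * MonetDB code. Both numbers ignore any cgroup/container limits. */
    #include <stdio.h>
    #include <unistd.h>

    int main(void) {
        long pagesize = sysconf(_SC_PAGESIZE);
        long npages   = sysconf(_SC_PHYS_PAGES);       /* physical memory */
        long ncores   = sysconf(_SC_NPROCESSORS_ONLN); /* online cores    */

        printf("memory: %lld bytes, cores: %ld\n",
               (long long)pagesize * npages, ncores);
        return 0;
    }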
This means that each container still sees the whole memory, and the process simply gets killed when the container limit is reached (definitely not a solution).
It doesn’t solve the problem, but in this blog post (especially at the end) we gave some ideas on how to avoid the OOM-killer: https://www.monetdb.org/blog/limit_memory_usage_on_linux_with_cgroups However, please be aware that lowering the OOM-killer priority just makes the OOM-killer choose a different victim, which can be a disaster on a production server. Docker even has an option to disable the OOM-killer for a container, but the consequences may be even worse: without a victim, processes can just freeze forever.

For Windows, we have actually added an option *inside* mdb to limit its memory usage. I think with that one, mdb is actually aware of the limits… The code is not released yet.
So far, the only mechanism I know of to obtain the correct behavior is to run actual VMs for MonetDB. But this is very cumbersome and I want to avoid it as much as possible. Should we let 10 instances believe they each have the whole memory, and let them fight for it? (Well, that’s what’s happening now, and I know for sure it’s bad.) Perhaps the solution can be as easy as allowing an explicit max memory setting, together with some documentation on the consequences of using low / high values.
I’m also thinking about an explicit max-memory setting, similar to --set gdk_nr_threads = N, so that one can set it to the same amount of memory as the limit in the external tools. It’s a bit hacky, but it is probably the easiest to implement. Let me check with the others if this is something we can do in the short term.

An ideal solution would be to let MonetDB also check for the resource limits set by cgroups, numactl, Docker, etc. Perhaps what we need to do is look at the resource limits (the getrlimit function call) to get the (soft) limit. If they are lower than what we found by using sysctl/sysconf, we should use the lower value. Actually, the Linux cgroups manual refers to getrlimit, so they may have to do with each other. For cgroups on Linux, one can do, amongst others, cat /proc/<PID>/cgroup to get the cgroup of the process with a specific pid. Once one knows the cgroup, one can look up the memory limits in the cgroup directory, assuming sufficient permissions.
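A minimal sketch of that “use the lower value” idea, combining sysconf, getrlimit and the cgroup-v1 limit file (hypothetical code, not from the MonetDB tree):

    /* Sketch: clamp the detected memory size by the getrlimit soft
     * limit and by the cgroup-v1 limit, whichever is lowest.
     * Hypothetical code, not from the MonetDB tree. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/resource.h>

    static unsigned long long min3(unsigned long long a,
                                   unsigned long long b,
                                   unsigned long long c) {
        unsigned long long m = a < b ? a : b;
        return m < c ? m : c;
    }

    int main(void) {
        unsigned long long mem =
            (unsigned long long)sysconf(_SC_PHYS_PAGES) * sysconf(_SC_PAGESIZE);

        /* soft address-space limit, if any */
        unsigned long long rl = mem;
        struct rlimit r;
        if (getrlimit(RLIMIT_AS, &r) == 0 && r.rlim_cur != RLIM_INFINITY)
            rl = r.rlim_cur;

        /* cgroup-v1 limit, if readable */
        unsigned long long cg = mem;
        FILE *f = fopen("/sys/fs/cgroup/memory/memory.limit_in_bytes", "r");
        if (f) {
            if (fscanf(f, "%llu", &cg) != 1)
                cg = mem;
            fclose(f);
        }

        printf("effective memory budget: %llu bytes\n", min3(mem, rl, cg));
        return 0;
    }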
** CPU

Again, Docker allows setting quotas per container. I think cgroups CPU limits are namespaced, so perhaps this would just work well; I haven’t really tried it yet.
I wonder if --set gdk_nr_threads = N can be of any help here.
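It could, if one derives N from the container’s CFS quota. A sketch, assuming cgroup-v1 paths, of how such a value could be computed before passing it to --set gdk_nr_threads = N:

    /* Sketch: derive a thread count from the cgroup-v1 CFS quota,
     * e.g. to feed into --set gdk_nr_threads = N. Hypothetical helper,
     * assuming cgroup-v1 paths. */
    #include <stdio.h>
    #include <unistd.h>

    static long read_long(const char *path) {
        long v = -1;
        FILE *f = fopen(path, "r");
        if (f) {
            if (fscanf(f, "%ld", &v) != 1)
                v = -1;
            fclose(f);
        }
        return v;
    }

    int main(void) {
        long ncores = sysconf(_SC_NPROCESSORS_ONLN);
        long quota  = read_long("/sys/fs/cgroup/cpu/cpu.cfs_quota_us");
        long period = read_long("/sys/fs/cgroup/cpu/cpu.cfs_period_us");

        long n = ncores;
        if (quota > 0 && period > 0) {      /* quota is -1 when unlimited */
            n = (quota + period - 1) / period;   /* round up */
            if (n > ncores)
                n = ncores;
        }
        printf("suggested gdk_nr_threads: %ld\n", n);
        return 0;
    }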
** I/O

Same issue. It would be ideal to be able to set priorities, so that mserver5 instances that do background work get a lower I/O priority than instances serving online queries.
This is probably even more difficult than the memory and CPU limitations, since MonetDB heavily relies on mmapped files and lets the OS decide what’s best. And so far, we have barely received any user requests on this particular topic... I know about some research work on improving mmapped files, which allows applications to assign a priority to each page. Maybe madvise can help a bit here.
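madvise cannot express a priority, but its hints do let a background scan be friendlier to co-resident instances. A minimal sketch of reading an mmapped file with such hints (illustrative only, not MonetDB code):

    /* Sketch: advisory paging hints on an mmapped file. Illustrative
     * only; MADV_SEQUENTIAL announces a one-pass read, MADV_DONTNEED
     * lets the pages go first under memory pressure. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) {
            perror("open/fstat");
            return 1;
        }
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* tell the kernel we will read once, front to back */
        madvise(p, st.st_size, MADV_SEQUENTIAL);

        long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            sum += p[i];

        /* done with these pages: release them before others' pages */
        madvise(p, st.st_size, MADV_DONTNEED);
        printf("checksum: %ld\n", sum);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }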
Also, recommendations on swap settings would be interesting. How much swap? How to tune the swappiness kernel setting?
I am very aware that there is no simple answer to most of these questions. Many variables are in the picture. Still, some general thoughts from the developers would be appreciated.
I think I have read pretty much everything that has ever been written about MonetDB, but when it comes to resource utilization, I have always bumped into the very unrealistic assumption that each MonetDB instance has a whole server to itself. As I mentioned above, things could already get much better with simple improvements, like a setting for the maximum memory each instance may use.
But more generally, I feel there is much need for guidelines for production environments. Or at least, to start the discussion.
Let’s try to keep this discussion more active.

Just my ¥0.02,
Jennie
Best regards,
Roberto