Guidelines for MonetDB in production environments
Hi all,

I'm struggling with optimizing resource sharing of MonetDB in production environments (see also: https://www.monetdb.org/pipermail/users-list/2018-June/010276.html). We run MonetDB instances for several projects/customers on each of our servers. Each MonetDB instance is a Docker container (used mainly for ease of deployment and environment isolation). It is not unusual to have 5-10 MonetDB containers on the same server. In principle, Docker does not even have much to do with this, but it puts everything in a realistic context.

** Memory
mserver5 checks the system memory and calibrates on that. When 10 instances are running, they all assume they have the whole memory to themselves. Docker allows setting limits on the container memory. It does that by using cgroups (so Docker just makes things easier, but it's really about cgroups). However, memory limits set by cgroups are not namespaced (https://ops.tips/blog/why-top-inside-container-wrong-memory/#memory-limits-s...). This means that each container will still see the whole memory and will simply get killed when the container limit has been reached (definitely not a solution). So far, the only mechanism I know of to obtain the correct behaviour is to run actual VMs for MonetDB. But this is very cumbersome and I want to avoid it as much as possible. Should we let 10 instances believe they each have the whole memory, and let them fight for it? (Well, that's what's happening now, and I know for sure it's bad.) Perhaps the solution can be as easy as allowing an explicit max-memory setting, together with some documentation on the consequences of using low/high values.

** CPU
Again, Docker allows setting quotas per container. I think cgroups CPU limits are namespaced, so perhaps this would just work well; I haven't really tried yet.

** I/O
Same issue. It would be ideal to be able to set priorities, so that mserver5 instances that do background work get a lower I/O priority than instances serving online queries. Also, recommendations on swap settings would be interesting. How much swap? How to tune the swappiness kernel setting?

I am very aware that there is no simple answer to most of these questions. Many variables are in the picture. Still, some general thoughts from the developers would be appreciated. I think I have read pretty much everything that has ever been written about MonetDB, but when it comes to resource utilization I have always bumped into the very unrealistic assumption that each MonetDB instance has a whole server to itself. As I mentioned above, things could already get much better with simple improvements, like allowing the maximum memory usable by each instance to be set. But more generally, I feel there is much need for some guidelines for production environments. Or at least, to start the discussion.

Best regards,
Roberto
On 14 Jun 2019, at 15:05, Roberto Cornacchia wrote:

> Hi all,
> I'm struggling with optimizing resource sharing of MonetDB in production environments (see also: https://www.monetdb.org/pipermail/users-list/2018-June/010276.html).
Hai Roberto,

We don't have a good solution yet for sharing resources among MonetDB instances, but we have recently gathered some information on this topic. Let me share it here.
> We run MonetDB instances for several projects/customers on each of our servers. Each MonetDB instance is a Docker container (used mainly for ease of deployment and environment isolation). It is not unusual to have 5-10 MonetDB containers on the same server. In principle, Docker does not even have much to do with this, but it puts everything in a realistic context.
>
> ** Memory
> mserver5 checks the system memory and calibrates on that. When 10 instances are running, they all assume they have the whole memory to themselves. Docker allows setting limits on the container memory. It does that by using cgroups (so Docker just makes things easier, but it's really about cgroups). However, memory limits set by cgroups are not namespaced (https://ops.tips/blog/why-top-inside-container-wrong-memory/#memory-limits-s...).
We've often used tools such as numactl and cgroups to limit the hardware resources MonetDB can use. They indeed limit the resources available to mdb, but we only recently realised that mdb is not aware of those limits, so this can cause various problems. This is an open issue reported here: https://www.monetdb.org/bugzilla/show_bug.cgi?id=6710

FYI, depending on the system, mdb uses sysctl or GlobalMemoryStatusEx for memory, the former with system-dependent arguments. For the number of cores, mdb uses sysconf, sysctl, or GetSystemInfo. See gdk_utils.c (MT_init()) and gdk_system.c (MT_check_nr_cores()).
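For illustration, the Linux side of such detection boils down to sysconf() calls like the following (a minimal sketch, not the actual MonetDB code). Run inside a container, it reports the host's resources, which is exactly the problem being discussed:

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Physical memory and online cores as seen by the process.
         * cgroup limits do NOT change what these report. */
        long long mem = (long long)sysconf(_SC_PHYS_PAGES)
                      * (long long)sysconf(_SC_PAGE_SIZE);
        long cores = sysconf(_SC_NPROCESSORS_ONLN);

        printf("physical memory: %lld MiB\n", mem >> 20);
        printf("online cores:    %ld\n", cores);
        return 0;
    }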
> This means that each container will still see the whole memory and will simply get killed when the container limit has been reached (definitely not a solution).
It doesn't solve the problem, but in this blog (especially at the end), we give some ideas on how to avoid the OOM-killer: https://www.monetdb.org/blog/limit_memory_usage_on_linux_with_cgroups

However, please be aware that lowering the OOM-killer priority would just make the OOM-killer choose a different victim, which can be a disaster on a production server. Docker even has an option to disable the OOM-killer for a container, but the consequences may be even worse: without a victim, processes can just freeze forever.

For Windows, we have actually added an option *inside* mdb to limit its memory usage. I think with that one, mdb is actually aware of the limits… The code is not released yet.
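To illustrate the priority mechanism mentioned there: on Linux a process can adjust its own attractiveness to the OOM-killer through /proc/self/oom_score_adj (raising the value is unprivileged; lowering it requires CAP_SYS_RESOURCE). A minimal sketch with a hypothetical helper name:

    #include <stdio.h>

    /* Hypothetical helper: write an adjustment in [-1000, 1000] to
     * /proc/self/oom_score_adj. Positive values make this process a
     * more likely OOM victim; -1000 exempts it entirely. As noted
     * above, this only shifts the problem to another process. */
    static int set_oom_score_adj(int adj)
    {
        FILE *f = fopen("/proc/self/oom_score_adj", "w");
        if (f == NULL)
            return -1;
        int rc = (fprintf(f, "%d\n", adj) < 0) ? -1 : 0;
        fclose(f);
        return rc;
    }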
> So far, the only mechanism I know of to obtain the correct behaviour is to run actual VMs for MonetDB. But this is very cumbersome and I want to avoid it as much as possible. Should we let 10 instances believe they each have the whole memory, and let them fight for it? (Well, that's what's happening now, and I know for sure it's bad.) Perhaps the solution can be as easy as allowing an explicit max-memory setting, together with some documentation on the consequences of using low/high values.
I'm also thinking about an explicit max-memory setting, one similar to --set gdk_nr_threads=N, so that one can set it to the same amount of memory as the limit in the external tools. It's a bit hacky, but is probably the easiest to implement. Let me check with the others if this is something we can do in the short term.

An ideal solution would be to let MonetDB also check for the resource limits set by cgroups, numactl, Docker, etc. Perhaps what we need to do is look at the resource limits (the getrlimit function call) to get the (soft) limit. If it is lower than what we found using sysctl/sysconf, we should use the lower value. Actually, the Linux cgroups manual refers to getrlimit, so they may have to do with each other.

For cgroups on Linux one can, amongst others, do: cat /proc/<PID>/cgroup to get the cgroup of the process with a specific PID. Once one knows the cgroup, one can look up the memory limits in the cgroup directory, assuming sufficient permissions.
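As a rough sketch of that combined detection (illustrative only, not the actual implementation; the cgroup path assumes the common cgroup v1 mount, while cgroup v2 exposes memory.max instead):

    #include <stdio.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/resource.h>

    /* Take the minimum of: physical RAM (sysconf), the soft address
     * space limit (getrlimit), and the cgroup v1 memory limit. A real
     * implementation would resolve the group from /proc/self/cgroup
     * instead of assuming the mount point below. */
    static uint64_t effective_mem_limit(void)
    {
        uint64_t limit = (uint64_t)sysconf(_SC_PHYS_PAGES)
                       * (uint64_t)sysconf(_SC_PAGE_SIZE);

        struct rlimit rl;
        if (getrlimit(RLIMIT_AS, &rl) == 0
            && rl.rlim_cur != RLIM_INFINITY
            && (uint64_t)rl.rlim_cur < limit)
            limit = (uint64_t)rl.rlim_cur;

        FILE *f = fopen("/sys/fs/cgroup/memory/memory.limit_in_bytes", "r");
        if (f != NULL) {
            unsigned long long cg;
            if (fscanf(f, "%llu", &cg) == 1 && (uint64_t)cg < limit)
                limit = (uint64_t)cg;
            fclose(f);
        }
        return limit;
    }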
> ** CPU
> Again, Docker allows setting quotas per container. I think cgroups CPU limits are namespaced, so perhaps this would just work well; I haven't really tried yet.
I wonder if --set gdk_nr_threads = N can be of any help here.
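For what it's worth, on Linux the scheduler affinity mask does reflect cpuset restrictions (e.g. docker run --cpuset-cpus), though not CFS quotas (docker run --cpus), so counting its bits could give a more container-aware default for gdk_nr_threads. A minimal sketch, not MonetDB code:

    #define _GNU_SOURCE
    #include <sched.h>

    /* Count the CPUs this process may actually run on. The affinity
     * mask reflects cpuset limits, unlike _SC_NPROCESSORS_ONLN. */
    static int usable_cores(void)
    {
        cpu_set_t set;
        if (sched_getaffinity(0, sizeof(set), &set) == 0)
            return CPU_COUNT(&set);
        return 1; /* conservative fallback */
    }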
> ** I/O
> Same issue. It would be ideal to be able to set priorities, so that mserver5 instances that do background work get a lower I/O priority than instances serving online queries.
This is probably even more difficult than the memory and CPU limitations, since MonetDB heavily relies on mmapped files and lets the OS decide what's best. And so far, we have barely received any user requests on this particular topic... I know about some research work on improving mmapped files, which allows applications to assign a priority to each page. Maybe madvise can help a bit here.
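For illustration, madvise() is indeed the closest standard mechanism: it lets an application hint per-range page importance to the kernel. A minimal sketch, assuming a Linux kernel new enough to have MADV_COLD (5.4+):

    #include <sys/mman.h>

    /* Hint that a mapped region is low priority: MADV_COLD makes its
     * pages preferred reclaim candidates without discarding their
     * contents. On kernels without it, do nothing. */
    static void mark_region_cold(void *addr, size_t len)
    {
    #ifdef MADV_COLD
        madvise(addr, len, MADV_COLD);
    #else
        (void)addr;
        (void)len;
    #endif
    }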
> Also, recommendations on swap settings would be interesting. How much swap? How to tune the swappiness kernel setting?
>
> I am very aware that there is no simple answer to most of these questions. Many variables are in the picture. Still, some general thoughts from the developers would be appreciated.
>
> I think I have read pretty much everything that has ever been written about MonetDB, but when it comes to resource utilization I have always bumped into the very unrealistic assumption that each MonetDB instance has a whole server to itself. As I mentioned above, things could already get much better with simple improvements, like allowing the maximum memory usable by each instance to be set.
>
> But more generally, I feel there is much need for some guidelines for production environments. Or at least, to start the discussion.
Let's try to keep this discussion more active.

Just my ¥0.02,
Jennie
Best regards,
Roberto
Hi all,

To add to this from a "sysadmin" perspective: one fundamental problem is that containers and virtual machines appear to be similar, but they are fundamentally very different, and that manifests itself in the problems you are describing.

This means that things like memory limits in containers look a lot like the amount of memory in a (virtual) machine, but they are really different. If a real machine runs out of memory, the kernel tries to schedule processes in such a way that the machine continues to work, for example by swapping to disk. Only as a last resort will it kill a process. With containers this is different: there, the container runtime will kill a container that exceeds the memory limit. Because the primary use case for containers is stateless applications, such as webservers, this behaviour is perfectly fine. But if you are running stateful applications, such as databases, in containers, it creates problems.

Containers "promise" to make maintenance easier compared to virtual machines, but this comes at a price. Building and maintaining a virtual machine is not as easy as building and maintaining a container, but the advantage of a virtual machine is that it is "exactly" the same as a real machine. So a database running inside a virtual machine behaves (more or less) the same as on physical hardware. Under normal circumstances running a database inside a container is fine, but in the more extreme cases you run into the problems you describe.

In his email Roberto writes: "... So far, the only mechanism I know of to obtain the correct behaviour is to run actual VMs for MonetDB. ..." and he is right. But that seems to imply that the container behaviour is wrong, which is not true: the container behaves exactly as it is supposed to. The problem is that it does not match Roberto's intended use case. He also says: "... this is very cumbersome and I want to avoid it as much as possible. ..." and he is not wrong, but there are tools available to build VM images relatively easily. Maintaining a set of VMs with a database on a machine is not as easy as running a set of containers, that is certainly true. But it is a trade-off. You can try to work around the limitations of containers, but that will make managing the containers more difficult and more error-prone, which is exactly what you wanted to avoid in the first place. You could also try to make the database aware of the container's memory limit, but then the database has to guarantee it will never exceed this limit under any circumstance, which is impossible (I guess).

Some of Jennie's remarks can help with running MonetDB in containers, and maybe some changes to MonetDB might help as well. But the fundamental problem remains: containers and (virtual) machines are different, and in certain cases you will notice the difference. In order to make the right choice, you need to know what your requirements are and how the different solutions are implemented. For many cases it is perfectly fine to run databases in containers, but unfortunately I don't think it will always work.

Arjen

PS: I am only talking about Linux here.
From: "Ying Zhang"
To: "Communication channel for MonetDB users" Sent: Wednesday, June 19, 2019 11:26:37 PM Subject: Re: Guidelines for MonetDB in production environments
On 14 Jun 2019, at 15:05, Roberto Cornacchia
wrote: Hi all,
I'm struggling with optimizing resource sharing of MonetDB in production environments (see also: https://www.monetdb.org/pipermail/users-list/2018-June/010276.html).
Hai Roberto,
We don’t have good solution yet to share resources among MonetDB instances, but recently we have gathered some information on this topic. Let me share it here.
We run MonetDB instances for several projects / customers on each of our servers. Each MonetDB instance is a docker container (used mainly because of ease of deployment and environment isolation). It is not unusual to have 5-10 MonetDB containers on the same server. In principle, Docker does not even have much to do with this, but it puts everything in a realistic context.
** Memory mserver5 checks the system memory and calibrates on that. When 10 instances are running, they all assume they have the whole memory for themselves.
Docker allows to set limits on the container memory. It does that by using cgroups (so Docker just makes things easier, but it's really about cgroups). However, memory limits set by cgroups are not namespaced (https://ops.tips/blog/why-top-inside-container-wrong-memory/#memory-limits-s...).
We’ve often used tools such as numactrl and cgroups to limit hardware resources MonetDB can use. They indeed limit the resources available to mdb, but we only realised it recently that mdb is not aware of those limits, so it can cause various problems. This is an open issue reported here: https://www.monetdb.org/bugzilla/show_bug.cgi?id=6710
FYI, depending on the system, uses sysctl or GlobalMemoryStatusEx for memory, the former with system-dependent arguments. For number of cores mdb uses sysconf, sycctl, or GetSystemInfo. See gdk_utils.c (MT_init()) and gdk_system.c (MT_check_nr_cores()).
This means that each container will still see the whole memory and will simply get killed when the container limit has been reached (definitely not a solution).
It doesn’t solve the problem, but in this blog (especially at the end), we gave some ideas how to avoid the OOM-killer: https://www.monetdb.org/blog/limit_memory_usage_on_linux_with_cgroups
However, please be aware that lowering the OOM-killer priority would just make OOM-killer choose a different victim, which can be a disaster on a production server. Docker even has an option to disable OOM-killer on a container. But the consequences may be even worse, as without a victim processes can just freeze forever.
For windows, we have actually added an option *inside* mdb to limit its memory usage. I think with that one, mdb is actually aware of the limits… The code is not released yet.
So far, the only mechanism I know to obtain the correect behavior is to run actual VMs for MonetDB. But this is very cumbersome and I want to avoid that as much as possible. Should we let 10 instances believe they each have the whole memory, and let them fight for it? (well, that's what's happening now, and I know for sure it's bad). Perhaps the solution can be as easy as allowing an explicit max memory setting, together with some documentation on the consequences of using low / high values.
I’m also thinking about an explicit max-memory setting. One that’s similar --set gdk_nr_threads = N so that one can set it to the same amount of MEM as the limit in the external tools. It’s a bit hacky, but is probably the easiest to implement. Let me check with the others if this is something we can in short term.
An idea solution would be to let MonetDB to also check for the resource limits set by CGroups, numactl, Docker, etc. Perhaps what we need to do is look at the resource limits (getrlimit function call) to get the (soft) limit. If they are lower than what we found by using sysctl/sysconf, we should use the lower value. Actually, the Linux cgroups manual refers to getrlimit, so they may have to do with each other.
For cgroups on linux one can do amongst others: cat /proc/<PID>/cgroup to get the cgroup of the process with a specific pid. Once one knows the cgroup, one can look up the memory limits in the cgroup directory assuming sufficient permissions.
** CPU Again, Docker allows to set quotas per container. I think cgroups CPU limits are namespaced, so perhaps this would just work well, I haven't really tried yet.
I wonder if --set gdk_nr_threads = N can be of any help here.
** I/O Same issue. It would be ideal to be able to set priorities, so that mserver5 instances that do background work get a lower I/O priority than instances serving online queries.
This is probably even more difficult than MEM and CPU limitations, since MonetDB heavily relies on mmapped files and let the OS decide what’s best. And so far, we have barely received any user requests on this particular topic...
I know about some research work on improving mmapped files, which allows application to assign a priority to each page.
Maybe madvise can help a bit here.
Also, recommendations on swap settings would be interesting. How much swap? How to tune swappiness kernel settings?
I am very aware that there is no simple answer to most of these questions. Many variables are in the picture. Still, some general thoughts from the developers would be appreciated.
I think I have read pretty much everything has ever been written about MonetDB, but when it comes to resource utilization I have always bumped into the very unrealistic assumption that each MonetDB instance has a whole server for itself. As I mentioned above, things could get already much better with simple improvements, like allowing to set the maximum memory usable by each instance.
But more in general, I feel there is much need for some guidelines for production environments. Or at least, to start the discussion.
Let’s try to keep this discussion more active.
Just my ¥0.02
Jennie
Best regards, Roberto _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
Arjen,

Thanks for your input. It's all 100% correct.

I should stress that we don't use containers as "lightweight VMs". We use them mainly because of the advantages they bring in terms of ease of deployment. So we are not expecting containers to simulate a VM.

The only point I disagree with is that the difference between containers and VMs is the fundamental problem here. Yes, they are different, and I'm aware of that. To me, the fundamental problem is that there are no means to prevent each MonetDB instance from claiming all available memory. This has nothing to do with containers; the same happens on bare metal or on VMs.

Roberto
Hi Roberto,

You are right. If MonetDB were able to strictly adhere to resource limits, the problem would not exist. So that is the fundamental problem, and running the database in a VM or container is a temporary solution. My remarks were about the (dis)advantages of these alternative solutions/workarounds. Whether or not it is possible to make MonetDB (or any other database, for that matter) obey strict resource limits is an entirely different question. So in the meantime we still need some alternative.

Arjen
From: "Roberto Cornacchia"
To: "Communication channel for MonetDB users" Sent: Thursday, June 20, 2019 11:46:14 AM Subject: Re: Guidelines for MonetDB in production environments
Arjen,
Thanks for your input. It's all 100% correct.
I should stress that we don't use container as "lightweight VMs". We use them mainly because of the advantages they bring in terms of ease of deployment. So we are not expecting that containers simulate a VM.
The only point I disagree with is that the difference between containers and VMs is the fundamental problem here. Yes, they are different and I'm aware of that.
To me, the fundamental problem is that there are no means to prevent each MonetDB instance to claim all available memory. This has nothing to do with containers, the same happens on bare metal or on VMs.
Roberto
On Thu, 20 Jun 2019 at 11:34, Arjen de Rijke < arjen.de.rijke@cwi.nl > wrote:
Hi All,
To add to this from a "sysadmin" perspective, one fundamental problem is that "containers" and "virtual machines" appear to be similar, but fundamentally they are very different. And that manifests itself in the problems that you are describing.
This means that things like memory limits in containers look a lot like the amount of memory in a (virtual) machine, but they are really different. If a real machine runs out of memory, the kernel tries to schedule different processes in such a way that the machine continues to work, for example by swapping to disc. Only in the last resort will it kill a process. With containers this is different. There the container runtime will kill a container that exceeds the memory limit. Because the primary use-case for containers is stateless applications, such as webservers, this behaviour is perfectly fine. But if you are running statefull applications, such as databases, in containers, this creates problems.
Containers "promise" to make maintainance easier, compared to virtual machines. But this comes at a price. Building and maintaining a virtual machine is not as easy as building and maintaining a container, but the advantage of a virtual machine is that it is "exactly" the same as a real machine. So a database running inside a virtual machine behaves (more or less) the same as on physical hardaware. Under normal circumstances running a database inside a container is fine, but in the more extreme cases, you run into the problems you describe.
In his email Roberto writes: "... So far, the only mechanism I know to obtain the correect behavior is to run actual VMs for MonetDB. ..." and he is right. But that implies that the container behaviour is wrong, but that is not true. The container behaves exactly as it is supposed to do. The problem is that it does is not match Roberto's intended use case. He also says: "... this is very cumbersome and I want to avoid that as much as possible. ..." and he is not wrong, but there are tools available to build vm images relatively easy. But maintaining a set of vm's with a database on a machine is not as easy as running a set of containers, that is certainly true. But it is a trade-off. You can try to work around the limitations of containers, but that will make managing the containers more difficult and more error prone, which is exactly what you wanted to avoid in the first place. You could also try to make the database aware of the containers memory limit, but then the database has to guarantee it will not exceed this limit under any circumstance, which is impossible (i guess).
Some of Jenny's remarks can help running MonetDB in containers and maybe some changes to MonetDB might help as well. But the fundamental problem remains, containers and (virtual) machines are different and in certain cases you will notice the difference. In order to make the right choice, you need to know what your requirements are and how the different solutions are implemented. For many cases it is perfectly fine to run databases in containers, but unfortunately i don't think it will always work.
Arjen
PS, i am only talking about linux here.
----- Original Message -----
From: "Ying Zhang" < Y.Zhang@cwi.nl > To: "Communication channel for MonetDB users" < users-list@monetdb.org > Sent: Wednesday, June 19, 2019 11:26:37 PM Subject: Re: Guidelines for MonetDB in production environments
On 14 Jun 2019, at 15:05, Roberto Cornacchia < roberto.cornacchia@gmail.com > wrote:
Hi all,
I'm struggling with optimizing resource sharing of MonetDB in production environments (see also: https://www.monetdb.org/pipermail/users-list/2018-June/010276.html ).
Hai Roberto,
We don’t have good solution yet to share resources among MonetDB instances, but recently we have gathered some information on this topic. Let me share it here.
We run MonetDB instances for several projects / customers on each of our servers. Each MonetDB instance is a docker container (used mainly because of ease of deployment and environment isolation). It is not unusual to have 5-10 MonetDB containers on the same server. In principle, Docker does not even have much to do with this, but it puts everything in a realistic context.
** Memory mserver5 checks the system memory and calibrates on that. When 10 instances are running, they all assume they have the whole memory for themselves.
Docker allows to set limits on the container memory. It does that by using cgroups (so Docker just makes things easier, but it's really about cgroups). However, memory limits set by cgroups are not namespaced ( https://ops.tips/blog/why-top-inside-container-wrong-memory/#memory-limits-s... ).
We’ve often used tools such as numactrl and cgroups to limit hardware resources MonetDB can use. They indeed limit the resources available to mdb, but we only realised it recently that mdb is not aware of those limits, so it can cause various problems. This is an open issue reported here: https://www.monetdb.org/bugzilla/show_bug.cgi?id=6710
FYI, depending on the system, uses sysctl or GlobalMemoryStatusEx for memory, the former with system-dependent arguments. For number of cores mdb uses sysconf, sycctl, or GetSystemInfo. See gdk_utils.c (MT_init()) and gdk_system.c (MT_check_nr_cores()).
This means that each container will still see the whole memory and will simply get killed when the container limit has been reached (definitely not a solution).
It doesn’t solve the problem, but in this blog (especially at the end), we gave some ideas how to avoid the OOM-killer: https://www.monetdb.org/blog/limit_memory_usage_on_linux_with_cgroups
However, please be aware that lowering the OOM-killer priority would just make OOM-killer choose a different victim, which can be a disaster on a production server. Docker even has an option to disable OOM-killer on a container. But the consequences may be even worse, as without a victim processes can just freeze forever.
For windows, we have actually added an option *inside* mdb to limit its memory usage. I think with that one, mdb is actually aware of the limits… The code is not released yet.
So far, the only mechanism I know to obtain the correect behavior is to run actual VMs for MonetDB. But this is very cumbersome and I want to avoid that as much as possible. Should we let 10 instances believe they each have the whole memory, and let them fight for it? (well, that's what's happening now, and I know for sure it's bad). Perhaps the solution can be as easy as allowing an explicit max memory setting, together with some documentation on the consequences of using low / high values.
I’m also thinking about an explicit max-memory setting. One that’s similar --set gdk_nr_threads = N so that one can set it to the same amount of MEM as the limit in the external tools. It’s a bit hacky, but is probably the easiest to implement. Let me check with the others if this is something we can in short term.
An idea solution would be to let MonetDB to also check for the resource limits set by CGroups, numactl, Docker, etc. Perhaps what we need to do is look at the resource limits (getrlimit function call) to get the (soft) limit. If they are lower than what we found by using sysctl/sysconf, we should use the lower value. Actually, the Linux cgroups manual refers to getrlimit, so they may have to do with each other.
For cgroups on linux one can do amongst others: cat /proc/<PID>/cgroup to get the cgroup of the process with a specific pid. Once one knows the cgroup, one can look up the memory limits in the cgroup directory assuming sufficient permissions.
** CPU Again, Docker allows to set quotas per container. I think cgroups CPU limits are namespaced, so perhaps this would just work well, I haven't really tried yet.
I wonder if --set gdk_nr_threads = N can be of any help here.
** I/O Same issue. It would be ideal to be able to set priorities, so that mserver5 instances that do background work get a lower I/O priority than instances serving online queries.
This is probably even more difficult than MEM and CPU limitations, since MonetDB heavily relies on mmapped files and let the OS decide what’s best. And so far, we have barely received any user requests on this particular topic...
I know about some research work on improving mmapped files, which allows application to assign a priority to each page.
Maybe madvise can help a bit here.
Also, recommendations on swap settings would be interesting. How much swap? How to tune swappiness kernel settings?
I am very aware that there is no simple answer to most of these questions. Many variables are in the picture. Still, some general thoughts from the developers would be appreciated.
I think I have read pretty much everything has ever been written about MonetDB, but when it comes to resource utilization I have always bumped into the very unrealistic assumption that each MonetDB instance has a whole server for itself. As I mentioned above, things could get already much better with simple improvements, like allowing to set the maximum memory usable by each instance.
But more in general, I feel there is much need for some guidelines for production environments. Or at least, to start the discussion.
Let’s try to keep this discussion more active.
Just my ¥0.02
Jennie
Best regards, Roberto _______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
> In his email Roberto writes: "... So far, the only mechanism I know of to obtain the correct behaviour is to run actual VMs for MonetDB. ..." and he is right. But that seems to imply that the container behaviour is wrong, which is not true.
That is not what I was implying. I was implying that the only way is to run each MonetDB instance in a VM that is correctly sized for it. This is misusing a VM as a workaround for the fact that MonetDB's resource usage cannot be capped.