Hello list,

we were wondering about the purpose of GDK_mmap_minsize when creating transient columns. The attached patch will always *try* to malloc/realloc a transient column but still fall back to memory-mapped files if malloc should fail. This dramatically improves performance. Any good reason why this should not be the default behaviour?

Thanks,
Mark and Hannes
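The strategy proposed above can be illustrated with a small POSIX C sketch. This is not MonetDB's GDK code; `heap_buf`, `heap_alloc`, `map_temp_file` and `heap_free` are hypothetical names. It assumes, as the thread describes, that requests at or above a threshold (playing the role of GDK_mmap_minsize) are memory-mapped, while smaller ones try malloc first and fall back to a mapped temporary file only when malloc returns NULL. Passing `SIZE_MAX` as the threshold models the patch's "always try malloc first" behaviour.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Hypothetical descriptor for the storage behind a transient column. */
typedef struct {
    void  *base;   /* start of the buffer, NULL if nothing is allocated */
    size_t size;   /* size in bytes */
    int    mapped; /* 1: shared mapping of a temp file, 0: malloc'd */
} heap_buf;

/* Back the buffer with an unlinked temporary file mapped into memory. */
static int map_temp_file(heap_buf *h, size_t size)
{
    char path[] = "/tmp/transientXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                /* drop the name; data lives while fd/mapping exist */
    if (ftruncate(fd, (off_t) size) != 0) {
        close(fd);
        return -1;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                   /* the mapping stays valid after close */
    if (p == MAP_FAILED)
        return -1;
    h->base = p;
    h->mapped = 1;
    return 0;
}

/* Requests below mmap_minsize try malloc first and fall back to a mapped
 * file only if malloc returns NULL; larger requests go straight to mmap.
 * Returns 0 on success, -1 if both strategies fail. */
static int heap_alloc(heap_buf *h, size_t size, size_t mmap_minsize)
{
    h->base = NULL;
    h->size = size;
    h->mapped = 0;
    if (size < mmap_minsize) {
        h->base = malloc(size);
        if (h->base != NULL)
            return 0;
    }
    return map_temp_file(h, size);
}

/* Release the buffer with the call matching how it was obtained. */
static void heap_free(heap_buf *h)
{
    if (h->base == NULL)
        return;
    if (h->mapped)
        munmap(h->base, h->size);
    else
        free(h->base);
    h->base = NULL;
}
```

As Sjoerd points out in the next message, the fallback only triggers if malloc actually reports failure, which Linux's default overcommit policy does not guarantee.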
As far as I understand it, malloc on Linux will happily succeed even if there is not enough memory+swap to hold all data. So you can't rely on malloc failures to tell you to switch to mmap.

On 09/20/2016 06:19 PM, Hannes Mühleisen wrote:
Hello list,
we were wondering about the purpose of GDK_mmap_minsize when creating transient columns. The attached patch will always *try* to malloc/realloc a transient column but still fall back to memory-mapped files if malloc should fail. This dramatically improves performance. Any good reason why this should not be the default behaviour?
Thanks,
Mark and Hannes
-- Sjoerd Mullender
If I may add, that (memory overcommit) is indeed the default behaviour of the kernel, which can be disabled with

vm.overcommit_memory = 2

in /etc/sysctl.conf.

Perhaps MonetDB could check this system setting and decide which strategy to use?
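A minimal sketch of the check suggested above, assuming Linux and its documented `/proc/sys/vm/overcommit_memory` interface (0 = heuristic overcommit, the default; 1 = always overcommit; 2 = strict accounting). The function names are hypothetical, and mapping "strict accounting" to "always try malloc first" is one possible policy, not something MonetDB is known to implement.

```c
#include <stdint.h>
#include <stdio.h>

#define OVERCOMMIT_PATH "/proc/sys/vm/overcommit_memory"

/* Read the Linux overcommit policy from path (normally OVERCOMMIT_PATH):
 * 0 = heuristic overcommit (default), 1 = always overcommit,
 * 2 = strict accounting. Returns -1 if the value cannot be read. */
static int read_overcommit_mode(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);
    return mode;
}

/* Only under strict accounting (mode 2) does the kernel refuse allocations
 * beyond its commit limit, so only then is a NULL from malloc a usable
 * signal for "try malloc, fall back to mmap". In every other case
 * (including an unreadable setting) keep the configured threshold. */
static size_t effective_mmap_minsize(int mode, size_t configured)
{
    return mode == 2 ? SIZE_MAX : configured;
}
```

A server would call something like `effective_mmap_minsize(read_overcommit_mode(OVERCOMMIT_PATH), configured)` once at startup; non-Linux systems simply fail to open the file and keep their configured value.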
On 20 September 2016 at 18:24, Sjoerd Mullender wrote:
As far as I understand it, malloc on Linux will happily succeed even if there is not enough memory+swap to hold all data. So you can't rely on malloc failures to tell you to switch to mmap.
_______________________________________________ developers-list mailing list developers-list@monetdb.org https://www.monetdb.org/mailman/listinfo/developers-list
It is interesting that malloc is faster than mmap.
Could you give more details about the setting of your experiment?
How many threads, how many concurrent files (BATs), size of files,
type of data, kind of access, etc.
Hi Foteini,
On 20 Sep 2016, at 23:45, Foteini Alvanaki wrote:
It is interesting that malloc is faster than mmap.

Yeah this is one of the magic MonetDB parameters…
Could you give more details about the setting of your experiment?
Just running select (i*2) from table; where i is a column containing 100M integers. Ran on my laptop (OSX) with 4 threads.
How many threads, how many concurrent files (BATs), size of files, type of data, kind of accesses etc.
----- Original Message -----
From: Roberto Cornacchia
To: Communication channel for developers of the MonetDB suite.
Sent: Tue, 20 Sep 2016 18:40:41 +0200 (CEST)
Subject: Re: GDK_mmap_minsize again

If I may add, that is indeed the default behaviour of the kernel, which can be disabled with

vm.overcommit_memory = 2

in /etc/sysctl.conf

Perhaps MonetDB could check this system setting and decide on which strategy to use?
To expand on the previous experiment, I ran it again on three different systems. The query:

SELECT MIN(i * 2) FROM integers;

where 'integers' contains 100M randomly generated integers between 0 and 100. The MAL operations performed are upcasting from int to lng (to prevent the multiplication from overflowing) and the actual multiplication, requiring two allocations of large transient BATs.

The server (the Jun2016 SP2 release candidate) was started either with "--set gdk_mmap_minsize=1000000000000" (to force the server to use malloc for the intermediates) or with no parameters (resulting in the intermediates being stored in memory-mapped files).

The results seem to depend mainly on the operating system. Performing the test on three operating systems (Windows 10, Fedora 24 and OSX 10.11) gives the following timings:

OS            mmap      malloc
Windows 10    10.3 s    0.9 s
Fedora 24      2.0 s    1.7 s
OSX 10.11      4.5 s    1.1 s

On Fedora, the difference between mmap and malloc does not seem to be very significant: malloc is slightly faster, but not by much. On both Windows and OSX, mmap is very slow, and on Windows the difference is extremely noticeable (roughly 11x).

Since malloc behaves 'as expected' (returns NULL if there is not enough physical memory) on Windows and OSX, I suggest setting gdk_mmap_minsize to its maximum value on those systems and letting malloc failures dictate when to switch to mmap for large files.

Regards,
Mark
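The workload described above (upcast int to a 64-bit lng, multiply into a second intermediate, then take the minimum) can be approximated outside MonetDB with the C sketch below, to compare malloc'd intermediates against file-backed mappings on a given OS. This is a hypothetical harness, not MonetDB's MAL implementation; `alloc_buf`, `free_buf` and `min_times_two` are illustrative names, the mapped variant assumes a writable /tmp, and no particular timings are implied.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Allocate size bytes either with malloc or as a shared mapping of an
 * unlinked temporary file (mimicking a memory-mapped intermediate). */
static void *alloc_buf(size_t size, int use_mmap)
{
    if (!use_mmap)
        return malloc(size);
    char path[] = "/tmp/benchXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return NULL;
    unlink(path);
    if (ftruncate(fd, (off_t) size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

static void free_buf(void *p, size_t size, int use_mmap)
{
    if (use_mmap)
        munmap(p, size);
    else
        free(p);
}

/* MIN(i * 2) computed with two int64 intermediates: the upcast column and
 * the product. Stores elapsed wall-clock seconds (allocation included) in
 * *secs. Returns INT64_MAX if an allocation fails or n == 0. */
static int64_t min_times_two(const int *col, size_t n, int use_mmap, double *secs)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t bytes = n * sizeof(int64_t);
    int64_t *up = alloc_buf(bytes, use_mmap);
    int64_t *prod = alloc_buf(bytes, use_mmap);
    int64_t min = INT64_MAX;
    if (up != NULL && prod != NULL) {
        for (size_t k = 0; k < n; k++)
            up[k] = (int64_t) col[k];        /* int -> lng upcast */
        for (size_t k = 0; k < n; k++)
            prod[k] = up[k] * 2;             /* the multiplication */
        for (size_t k = 0; k < n; k++)
            if (prod[k] < min)
                min = prod[k];
    }
    if (up != NULL)
        free_buf(up, bytes, use_mmap);
    if (prod != NULL)
        free_buf(prod, bytes, use_mmap);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *secs = (double) (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    return min;
}
```

To mirror the experiment, one would fill a 100M-element int array with values between 0 and 100 and call `min_times_two` once with `use_mmap = 0` and once with `use_mmap = 1`, printing the two `secs` values.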
On 22 Sep 2016, at 10:44, Hannes Mühleisen wrote:

Hi Foteini,
On 20 Sep 2016, at 23:45, Foteini Alvanaki wrote:
It is interesting that malloc is faster than mmap.

Yeah this is one of the magic MonetDB parameters…
Could you give more details about the setting of your experiment?
just running select (i*2) from table; i is a column containing 100M integers. Ran on my laptop (OSX) with 4 threads.
How many threads, how many concurrent files (BATs), size of files, type of data, kind of accesses etc.
participants (5)
- Foteini Alvanaki
- Hannes Mühleisen
- Mark Raasveldt
- Roberto Cornacchia
- Sjoerd Mullender