On Fri, Mar 19, 2010 at 08:27:34AM -0700, Hering Cheng wrote:
Thank you, Stefan. This is extremely helpful. In fact, this is the best description of how MonetDB uses memory that I have come across.
During a previous test with the same data, I did lift the virtual memory restriction on mserver5 and it did succeed in loading the data. However, the downside is that it seemed to end up consuming all of the physical memory on the server (a Solaris SPARC with 16 CPUs, 64 GB of RAM and 100 GB
Well, yes. For large/huge databases, we (MonetDB) assume a dedicated database server, and hence feel free to indeed use the available memory to achieve best performance ;-)
of swap). So I needed a way to ensure that other applications still have memory. I had previously asked a question about how to limit the amount of memory used by MonetDB in the SourceForge tracker (probably the wrong forum) without getting any response; hence my use of ulimit. Another way may be to use a Solaris zone or a virtual machine to contain MonetDB.
In that case, you might want to limit the process' memory-resident data segment only, not its total virtual memory footprint (incl. memory-mapped files), i.e., use `ulimit -d` instead of `ulimit -v`:

  $ ulimit -a
  core file size        (blocks, -c) 0
  data seg size         (kbytes, -d) unlimited
  ^^^^^^^^^^^^^                      ^^^^^^^^^
  [...]
  $ mserver5 --version
  MonetDB server v5.18.2 (64-bit), based on kernel v1.36.2 (64-bit oids)
  Copyright (c) 1993-July 2008 CWI
  Copyright (c) August 2008-2010 MonetDB B.V., all rights reserved
  Visit http://monetdb.cwi.nl/ for further information
  Found 7.8GiB available memory, 4 available cpu cores
        ^^^^^^
  [...]

  $ ulimit -d $[1024*1024]
  $ ulimit -a
  core file size        (blocks, -c) 0
  data seg size         (kbytes, -d) 1048576
  ^^^^^^^^^^^^^                      ^^^^^^^
  [...]
  $ mserver5 --version
  MonetDB server v5.18.2 (64-bit), based on kernel v1.36.2 (64-bit oids)
  Copyright (c) 1993-July 2008 CWI
  Copyright (c) August 2008-2010 MonetDB B.V., all rights reserved
  Visit http://monetdb.cwi.nl/ for further information
  Found 1.0GiB available memory, 4 available cpu cores
        ^^^^^^
From what I understand from your description, in theory I can run MonetDB on a 64-bit server with combined RAM and swap space of a few *gigabytes*, and still succeed at loading *terabytes* of data *and* performing *arbitrarily* complex queries subsequently. The only cost would be in performance. I'd be willing to pay that performance cost if that is indeed the case.
In theory, yes ... (In both theory and practice, you'll need a disk system that has sufficient capacity for your "terabytes".) Note, though, that the performance penalty for columns that exceed physical memory depends (also) on the performance of your I/O system and the efficiency of your OS's mmap implementation.
I like what I have seen in MonetDB and would love to use it if (a) my understanding expressed in the previous paragraph is correct, and (b) I can find out how to limit the amount of RAM and swap that mserver5 uses without limiting its virtual address space.
See above. We (MonetDB) try to avoid using swap as much as possible by allocating all chunks larger than 128 MB from memory-mapped files. In case MonetDB still uses "too much" swap in your case, please report the details and we might find a way to tune the swap usage of MonetDB. Stefan
Thanks again.
On Fri, Mar 19, 2010 at 7:59 AM, Stefan Manegold wrote:

Hering,
MonetDB requires all data that needs to be active at any given point in time to fit into the address space (and, of course, to fit on the storage device, i.e., your disk system). On 32-bit systems, the address space is at most 32-bit, i.e., at most 4 GB; in practice, it is actually limited to 3 GB or even 2 GB on most systems. On 64-bit systems, the address space can theoretically be 64-bit, but in practice is often "limited" to 48-bit or so --- not that that makes any difference ...
MonetDB makes extensive use of main memory for processing, but does not require that all data fit in the available physical memory. To handle datasets that exceed the available physical memory, MonetDB does not (only) rely on the available swap space, but (also) uses memory-mapped files to exploit disk storage beyond the swap space as virtual memory.
For example, while bulk-loading data (preferably via a COPY INTO statement from a (possibly compressed) CSV file), MonetDB needs to have all columns of the table that is currently being loaded "active", i.e., accessible in the address space. However, during loading, parts of the data are continuously written to the persistent files on disk, i.e., the whole table does not have to fit into main memory. E.g., loading a 100 GB table works fine on a system with 8 GB RAM and 16 GB swap -- provided there is sufficient free disk space.
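As a concrete sketch of such a bulk load (the table definition, column names, types, and file path below are invented for illustration; only the COPY INTO shape follows what is used elsewhere in this thread):

```sql
-- Hypothetical table; the columns and types are placeholders.
CREATE TABLE mytable (
    id    INT,
    ts    TIMESTAMP,
    price DOUBLE
);

-- Bulk load from a pipe-delimited text file. Stating the record
-- count up front ("... RECORDS") lets the server size the column
-- heaps for the whole table instead of growing them incrementally.
COPY 472112651 RECORDS INTO mytable
FROM '/absolute/path/to/mydata.txt'
DELIMITERS '|', '\n';
```

As noted above, the source file may also be compressed; only the columns of the one table being loaded need to be addressable at a time.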
During query processing, MonetDB requires that, for each operation during the query execution, all its inputs, its outputs, and possible temporary data structures fit in the address space. MonetDB automatically resorts to virtual memory and memory-mapped files for large intermediate results. Also, (large) persistent tables are accessed using memory mapping.
While running, you might see your mserver5 process' virtual size grow well beyond the available physical memory, and possibly also well beyond your swap space. In principle, this is no problem at all. Most of this virtual size is due to the fact that large base tables (or intermediate results) that reside as files on disk are memory-mapped into the address space. However, those parts of the data that are currently not accessed do not consume any physical memory (except possibly for caching purposes).
However, if individual columns of your table(s) and/or individual columns of intermediate results exceed the size of the available physical memory, the performance of MonetDB might (will) decrease due to increased I/O requirements.
In your case, you indeed limited the virtual size (including memory-mapped files) of your mserver5 process to 32 GB, and hence cannot load your 115 GB table.
Did you also try without limiting the virtual size? Did that work?
If that does not work either, could you please report in detail what happens (errors? crashes?) and share some details about your data, in particular the schema of your table (number and types of columns; number of records to be loaded; whether there are any constraints (primary/foreign keys, etc.) defined on the table) --- knowing your "create table" and "copy into" statements would help us a lot.
Hope this clarifies some things and helps you further.
Don't hesitate to ask in case it doesn't, or in case you have more questions.
Kind regards, Stefan
On Fri, Mar 19, 2010 at 07:18:21AM -0700, Hering Cheng wrote:
Hi,
I am trying to understand the limitations of MonetDB with regard to memory. It seems that the amount of virtual memory (RAM plus swap space) available to MonetDB must be greater than the amount of data. Before I give up on MonetDB, can someone please confirm or correct my understanding? I did not understand the virtual memory requirement when I read the following in its documentation (http://monetdb.cwi.nl/MonetDB/Documentation/Size-Limitations.html#Size-Limit...):

*The maximal database size supported by MonetDB depends on the underlying processing platform, e.g., a 32- or 64-bit processor, and storage device, e.g., the file system and disk raids.*

For example, I started up merovingian (using the Feb 2010 release), limiting it and its child processes to 32 GB of virtual memory (via ulimit -v). I then tried to load 115 GB of data into a single table. mclient reports the following:
  MAPI  = chenher@myserver:50000
  QUERY = COPY 472112651 RECORDS INTO mytable FROM 'mydata.txt' DELIMITERS '|', '\n';
  ERROR = !SQLException:importTable:failed to import table
          !ERROR: HEAPextend: failed to extend to 1888450604 for 17/1734tail
          !ERROR: TABLETcreate_bats: Failed to create bat of size 472112651
I understand if MonetDB must be able to fit all its data within virtual memory. But if this is the case, then unfortunately I cannot use it, since I have terabytes of data and I do not have that much virtual memory on my hardware.
Thank you. Hering
_______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
| Dr. Stefan Manegold | mailto:Stefan.Manegold@cwi.nl |
| CWI, P.O.Box 94079  | http://www.cwi.nl/~manegold/  |
| 1090 GB Amsterdam   | Tel.: +31 (20) 592-4212       |
| The Netherlands     | Fax : +31 (20) 592-4199       |