Hi,

It seems to be using quite a bit of physical memory. We just tried to load on a machine with 24 GB of memory and eventually the Linux OOM killer was invoked. We see resident set sizes > 20 GB, with virtual memory usage only a bit higher.

Here are some more details. We have a very wide table, >20,000 columns (all bigint), that we are trying to load, with about as many rows. Our input files are gzipped and split into 200 parts (so there are only about 100 lines per file). Here is our load script:

#!/bin/bash
# USAGE: ./load [db] [table] /full/path/to/files/*
set -e
set -u

MDB_DATABASE=$1
MDB_TABLE=$2
shift 2

for FILE in "$@"; do
    NOW=$(date +"%c")
    echo "Loading file ${FILE} at ${NOW}"
    COUNT=$(zcat "$FILE" | wc -l)
    STMT="COPY $COUNT RECORDS INTO $MDB_TABLE FROM '$FILE';"
    echo "$STMT" | mclient "$MDB_DATABASE"
done

What we see is that memory usage starts off low but rises as we load more files. Sometimes it decreases a bit (as if something is being flushed to disk), but it subsequently rises higher than before, until we run out of memory. We got through about 70 files on our 24 GB machine. Previously, we've loaded similar data on a 34 GB machine, 100 files at a time, restarting in between.

We're planning on trying the following:
1. Load one large file.
2. Load a few files, stop and start the database and dbfarm, load a few more, and so on.

Thanks,
Andrew Duffey

________________________________________
From: users-list-bounces+aduffey=merkleinc.com@monetdb.org [users-list-bounces+aduffey=merkleinc.com@monetdb.org] on behalf of Stefan Manegold [Stefan.Manegold@cwi.nl]
Sent: Monday, February 04, 2013 1:26 PM
To: Communication channel for MonetDB users
Subject: Re: High memory usage while loading data with many columns

Hi Andrew,

There is no option to reduce the (virtual) memory usage of MonetDB during bulk load.
MonetDB needs to have all columns active in its address space concurrently during bulk load, and hence needs as much address space as required to hold the entire table. See also http://www.monetdb.org/Documentation/UserGuide/resources

However, address space does not necessarily mean physical memory. We easily load 100 GB TPC-H (including a 70+ GB lineitem table) on an 8 GB machine.

If you're loading in isolation (i.e., without concurrent queries), you might want to consider using the "LOCKED" option of COPY INTO to avoid one extra copy; cf. http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto

Also, in case your table has constraints such as primary (or foreign, if you had more than one table) keys or "NOT NULL", we advise you to initially create the table without any constraints, then bulk-load the data via COPY INTO, and only then add the constraints one by one using ALTER TABLE ... ADD CONSTRAINT.

We'd be curious to learn more about the crash with the "select *" query. Is there any error message on the mserver5 console or in the monetdbd / merovingian log file? Could you possibly build a debugging version of MonetDB (configured with --disable-optimize --enable-debug --enable-assert), run it in a debugger (e.g., gdb), and check where (and why) it crashes?

Thanks!

Best,
Stefan

----- Original Message -----
Hi,
I'm loading data into a table with >20,000 columns and several hundred thousand rows using "COPY INTO". It works, but I can't seem to make it take less than about 30GB of memory. I've tried specifying the number of records as well. Is there any way to reduce memory usage when loading? Is this much memory usage expected? It's a shame because our queries take very little memory. We have to size our machine simply to handle the load.
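[Editor's note: the ~30 GB figure is roughly what the raw column data alone would occupy. A quick back-of-envelope, assuming about 200,000 rows (the message says "several hundred thousand") of 8-byte bigint values:]

```shell
# Rough estimate of the raw table size (assumed row count: ~200,000):
# 20,000 bigint columns x 200,000 rows x 8 bytes per value.
BYTES=$((20000 * 200000 * 8))
echo "$BYTES"   # 32000000000 bytes, i.e. roughly 30 GB
```

[Since the bulk load needs address space on the order of the full table (as explained in the reply above), this is consistent with the ~30 GB observed.]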
It's impressive (and very convenient) that I can actually have a table with so many columns, but are there any other limitations I might come across? I've just been trying to keep the column count in the queries low. A "select *" seems to crash MonetDB.
Thanks,
Andrew
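[Editor's note: the constraint-free load described in the reply above — create the table without constraints, bulk-load with the LOCKED option, then add constraints afterwards — would look roughly like the following. Table, column, constraint, and file names here are made-up placeholders, and the record count is illustrative:]

```sql
-- Create the table without any constraints first.
CREATE TABLE wide_table (c1 BIGINT, c2 BIGINT /* ..., c20000 BIGINT */);

-- LOCKED avoids an extra copy, but requires that the server itself can
-- read the file and that no other clients are active during the load.
COPY 200000 RECORDS INTO wide_table FROM '/full/path/to/data.csv' LOCKED;

-- Only once all data is in, add the constraints one by one.
ALTER TABLE wide_table ADD CONSTRAINT wide_table_pk PRIMARY KEY (c1);
```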
_______________________________________________ users-list mailing list users-list@monetdb.org http://mail.monetdb.org/mailman/listinfo/users-list