Hi Stefan,

Thanks a lot for the help. I really appreciate it.

On Jun 2, 2011, at 7:23 AM, Stefan Manegold wrote:

2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
                                                                                        ^^^^^^^^^^^^^
This suggests that MonetDB "for some reason" (possibly wrongly) expects some
(intermediate) column to grow up to 3 TB in size, hence, tries to alloced
the respective memory, but fails to do so successfully.

We can try to investigate where this happens, but as much information about
your usage of MonetDB (DB schema, data, query workload) as possible would be
very helpful for us to be able to locate the origin of the problem.

3TB seems quite crazy, I wonder how I end up triggering this with a 4.3G database
(as measured by "du -hs" on disk).

The workload is a serie of updates to our reference data in a database. I am 
simulating some upserts with a combination of DELETE/COPY INTO: I want to 
refresh data about some "sessions" for a "merchant". Each merchant has a 
dedicated table, named "<merchant id>_sessions";

To refresh the session metrics I execute:

-- for each day:
   -- for each merchant

       DELETE FROM "<merchant_id>_sessions" WHERE session_start BETWEEN <start of day timestamp> AND <end of day timestamp>;
       COPY INTO "<merchant_id>_sessions" FROM STDIN USING DELIMITERS '\\t','\\n';
       ... (up to 8000 session rows for this merchant and day)

This is my poor man's way of simulating upserts as my email on the topic did 
not generate many suggestions ;-) 
http://sourceforge.net/mailarchive/forum.php?thread_name=BANLkTi%3DdX-1DFka5NRnZUEj%3DVdi3Sz-Kkg%40mail.gmail.com&forum_name=monetdb-users

I am willing to try other ways to accomplish the same thing as long as it is
performant for a bulk upserts.

I'm afraid, though, we might need to be able to replay your complete
scenario and trigger the some error with us to be able to locate and fix the
problem.

I will try again tomorrow. I will try adding an explicit maximum number of records 
with COPY 8000 RECORDS INTO ... to see if it makes any difference.

I see from your logs that you are using the latest Apr2011-SP1 release
(64-bit on a 64-bit Linux system). Did you experience the problem also with
earlier releases of MonetDB?

Yes indeed it first happened with Apr2011. I then upgraded to Apr2011-SP1, 
started with a fresh database, reran the import/refresh from scratch and was 
able to reproduce the problem again.


The server crashes with a segmentation fault, and we'd need to know where in
the code (and why) this happens.
I only(?) way to find out would be to start the server by hand in a
debugger, using the same commandline options as monetdbd (merovingian) uses
(see your log below), e.g.,

It seems to SIGSEGV when processing the sql logs:

(gdb) run
Starting program: /opt/local/bin/mserver5 --set gdk_dbfarm /mnt/monetdb5/dbfarm --dbname=prod_reporting --set merovingian_uri mapi:monetdb://ip-10-32-111-2:50000/prod_reporting --set mapi_open false --set mapi_port 0 --set mapi_usock /mnt/monetdb5/dbfarm/prod_reporting/.mapi.sock --set monet_vault_key /mnt/monetdb5/dbfarm/prod_reporting/.vaultkey --set monet_daemon yes
[Thread debugging using libthread_db enabled]
ERROR: wrong format gdk_dbfarm
ERROR: wrong format merovingian_uri
ERROR: wrong format mapi_open
ERROR: wrong format mapi_port
ERROR: wrong format mapi_usock
ERROR: wrong format monet_vault_key
ERROR: wrong format monet_daemon
# MonetDB 5 server v11.3.3 "Apr2011-SP1"
# Serving database 'prod_reporting', using 2 threads
# Compiled for x86_64-unknown-linux-gnu/64bit with 64bit OIDs dynamically linked
# Found 7.294 GiB available main-memory.
# Copyright (c) 1993-July 2008 CWI.
# Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
# Visit http://monetdb.cwi.nl/ for further information
[New Thread 0x7fffea474700 (LWP 15213)]
# Listening for connection requests on mapi:monetdb://127.0.0.1:50000/
# MonetDB/GIS module loaded
# MonetDB/SQL module loaded

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, 
    prefuncp=0x7fffea594f8c <bl_preversion>, postfuncp=0x7fffea59503d <bl_postversion>) at gdk_logger.mx:1192
1192 BATloop(b, p, q) {
(gdb) where
#0  0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, 
    prefuncp=0x7fffea594f8c <bl_preversion>, postfuncp=0x7fffea59503d <bl_postversion>) at gdk_logger.mx:1192
#1  0x00007ffff678925e in logger_create (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, 
    prefuncp=0x7fffea594f8c <bl_preversion>, postfuncp=0x7fffea59503d <bl_postversion>) at gdk_logger.mx:1254
#2  0x00007fffea595604 in bl_create (logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", cat_version=51100) at bat_logger.c:130
#3  0x00007fffea581bcb in store_init (debug=0, store=store_bat, logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", stk=0) at store.c:1315
#4  0x00007fffea518c4d in mvc_init (dbname=0x1010cf8 "prod_reporting", debug=0, store=store_bat, stk=0) at sql_mvc.c:51
#5  0x00007fffea4eaaa1 in SQLinit () at sql_scenario.mx:272
#6  0x00007fffea4ea717 in SQLprelude () at sql_scenario.mx:199
#7  0x00007ffff6dc7268 in runMALsequence (cntxt=0x606580, mb=0x102bd48, startpc=1, stoppc=0, stk=0x19cce48, env=0x0, pcicaller=0x0) at mal_interpreter.mx:2052
#8  0x00007ffff6db95a4 in runMAL (cntxt=0x606580, mb=0x102bd48, startpc=1, mbcaller=0x0, env=0x0, pcicaller=0x0) at mal_interpreter.mx:341
#9  0x00007ffff6e0dd1a in MALengine (c=0x606580) at mal_session.mx:680
#10 0x00007ffff6e0c450 in malBootstrap () at mal_session.mx:95
#11 0x00007ffff6d9690b in mal_init () at mal.mx:383
#12 0x000000000040331d in main (argc=23, av=0x7fffffffe4d8) at mserver5.c:546
(gdb)  thr app all bt

Thread 2 (Thread 0x7fffea474700 (LWP 15213)):
#0  0x00007ffff3c8867e in __lll_lock_wait_private () from /lib/libpthread.so.0
#1  0x00007ffff3c8208e in _L_lock_4442 () from /lib/libpthread.so.0
#2  0x00007ffff3c81c3e in start_thread () from /lib/libpthread.so.0
#3  0x00007ffff39dd92d in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7fe4720 (LWP 15210)):
#0  0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, 
    prefuncp=0x7fffea594f8c <bl_preversion>, postfuncp=0x7fffea59503d <bl_postversion>) at gdk_logger.mx:1192
#1  0x00007ffff678925e in logger_create (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, 
    prefuncp=0x7fffea594f8c <bl_preversion>, postfuncp=0x7fffea59503d <bl_postversion>) at gdk_logger.mx:1254
#2  0x00007fffea595604 in bl_create (logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", cat_version=51100) at bat_logger.c:130
#3  0x00007fffea581bcb in store_init (debug=0, store=store_bat, logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", stk=0) at store.c:1315
#4  0x00007fffea518c4d in mvc_init (dbname=0x1010cf8 "prod_reporting", debug=0, store=store_bat, stk=0) at sql_mvc.c:51
#5  0x00007fffea4eaaa1 in SQLinit () at sql_scenario.mx:272
#6  0x00007fffea4ea717 in SQLprelude () at sql_scenario.mx:199
#7  0x00007ffff6dc7268 in runMALsequence (cntxt=0x606580, mb=0x102bd48, startpc=1, stoppc=0, stk=0x19cce48, env=0x0, pcicaller=0x0) at mal_interpreter.mx:2052
#8  0x00007ffff6db95a4 in runMAL (cntxt=0x606580, mb=0x102bd48, startpc=1, mbcaller=0x0, env=0x0, pcicaller=0x0) at mal_interpreter.mx:341
#9  0x00007ffff6e0dd1a in MALengine (c=0x606580) at mal_session.mx:680
#10 0x00007ffff6e0c450 in malBootstrap () at mal_session.mx:95
#11 0x00007ffff6d9690b in mal_init () at mal.mx:383
#12 0x000000000040331d in main (argc=23, av=0x7fffffffe4d8) at mserver5.c:546


Thanks a lot for your help,
- Philippe