Hi,
I am trying to load 77 GB of data (34,274,958 records) into MonetDB, built from the Feb2010 SP1 source code on Solaris SPARC with the Sun Studio 12.1 compiler and debugging enabled. The database crashes 5 hours into the load. Here is some information about my configuration:
I start up with Merovingian like this:
$ ( ulimit -d $[32*1024*1024]; export LD_PRELOAD_64=/usr/lib/64/libumem.so:${LD_PRELOAD_64}; export LD_PRELOAD=/usr/lib/libumem.so:${LD_PRELOAD}; export MONETDB5CONF=/GAAL/chenher/rdcuxsrv220-local-disk/chenher/monetdb/feb2010sp1/etc/monetdb5.conf; /GAAL/chenher/share/monetdb/distro-sparc-feb2010-sp1-64bit/bin/merovingian; )
Here are the relevant log sections:
2010-04-16 13:25:28 MSG merovingian[4302]: starting database 'tar', up min/avg/max: 4m/4m/4m, crash average: 0.00 0.00 0.00 (1-1=0)
2010-04-16 13:25:29 MSG tar[15594]: arguments: /GAAL/chenher/share/monetdb/distro-sparc-feb2010-sp1-64bit/bin/mserver5 --config=/GAAL/chenher/rdcuxsrv220-local-disk/chenher/monetdb/feb2010sp1/etc/monetdb5.conf --dbname=tar --dbinit=include sql; --set merovingian_uri=mapi:monetdb://rdcuxsrv220:50000/tar --set monet_daemon=yes --set mapi_open=false --set mapi_autosense=true --set mapi_port=50001 --set monet_vault_key=/GAAL/chenher/rdcuxsrv220-local-disk/chenher/monetdb/feb2010sp1/var/MonetDB5/dbfarm/tar/.vaultkey --set sql_optimizer=default_pipe
2010-04-16 13:25:34 MSG tar[15594]: # MonetDB server v5.18.3, based on kernel v1.36.3
2010-04-16 13:25:34 MSG tar[15594]: # Serving database 'tar', using 16 threads
2010-04-16 13:25:34 MSG tar[15594]: # Compiled for sparc-sun-solaris2.10/64bit with 64bit OIDs dynamically linked
2010-04-16 13:25:34 MSG tar[15594]: # Found 32.000 GiB available main-memory.
2010-04-16 13:25:34 MSG tar[15594]: # Copyright (c) 1993-July 2008 CWI.
2010-04-16 13:25:34 MSG tar[15594]: # Copyright (c) August 2008-2010 MonetDB B.V., all rights reserved
2010-04-16 13:25:34 MSG tar[15594]: # Visit http://monetdb.cwi.nl/ for further information
2010-04-16 13:25:34 MSG tar[15594]: # Listening for connection requests on mapi:monetdb://127.0.0.1:50001/
2010-04-16 13:25:35 MSG tar[15594]: # MonetDB/SQL module v2.36.3 loaded
2010-04-16 13:25:35 MSG control[4302]: rdcuxsrv220:58146: started database 'tar'
...
2010-04-21 10:23:37 MSG merovingian[4302]: database 'tar' (15594) has crashed (dumped core)
Here is what Sun's dbx shows when I examine the crash:
...
t@212 (l@212) terminated by signal SEGV (no mapping at the fault address)
Current function is temp_create
87 temp_dup(b->batCacheid);
(dbx) where
current thread: t@212
=>[1] temp_create(b = (nil)), line 87 in "bat_utils.c"
[2] ebat2real(b = 5714, ibase = 0), line 162 in "bat_utils.c"
[3] delta_append_bat(bat = 0x104923c98, i = 0x10143ec98), line 234 in "bat_storage.c"
[4] append_col(tr = 0x104fe1e98, c = 0x1045c7af8, i = 0x10143ec98, tpe = 5), line 267 in "bat_storage.c"
[5] mvc_append_wrap(cntxt = 0xffffffff7f352c00, mb = 0x1050d5d98, stk = 0x101816018, pci = 0x1056a7c18), line 1120 in "sql.c"
[6] runMALsequence(cntxt = 0xffffffff7f352c00, mb = 0x1050d5d98, startpc = 1, stoppc = 0, stk = 0x101816018, env = (nil), pcicaller = (nil)), line 2908 in "mal_interpreter.c"
[7] callMAL(cntxt = 0xffffffff7f352c00, mb = 0x1050d5d98, env = 0xfffffffddabff908, argv = 0xfffffffddabff968, debug = '\0'), line 402 in "mal_interpreter.c"
[8] SQLexecutePrepared(c = 0xffffffff7f352c00, be = 0x1050f0ec8, q = 0x104fe1b18), line 1196 in "sql_scenario.c"
[9] SQLengineIntern(c = 0xffffffff7f352c00, be = 0x1050f0ec8), line 1249 in "sql_scenario.c"
[10] SQLengine(c = 0xffffffff7f352c00), line 1349 in "sql_scenario.c"
[11] runPhase(c = 0xffffffff7f352c00, phase = 4), line 363 in "mal_scenario.c"
[12] runScenarioBody(c = 0xffffffff7f352c00), line 412 in "mal_scenario.c"
[13] runScenario(c = 0xffffffff7f352c00), line 438 in "mal_scenario.c"
[14] MSserveClient(dummy = 0xffffffff7f352c00), line 368 in "mal_session.c"
(dbx) threads
t@1 a l@1 ?() LWP suspended in __pollsys()
t@2 a l@2 SERVERlistenThread() LWP suspended in __pollsys()
t@3 a l@3 mvc_logmanager() LWP suspended in __pollsys()
t@4 b l@4 umem_update_thread() sleep on (unknown) in __lwp_park()
t@13 a l@13 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@14 a l@14 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@15 a l@15 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@16 a l@16 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@17 a l@17 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@18 a l@18 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@19 a l@19 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@20 a l@20 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@21 a l@21 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@22 a l@22 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@23 a l@23 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@24 a l@24 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@25 a l@25 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@26 a l@26 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@27 a l@27 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
t@28 a l@28 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
o> t@212 a l@212 ?() signal SIGSEGV in temp_create()
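My reading of the trace is that frame [1] enters temp_create with b = (nil), so line 87 dereferences a null pointer when it evaluates b->batCacheid. To make that concrete, here is a minimal sketch of the failure mode and a guard against it. All of the names below (bat_desc, lookup_desc, safe_cache_id) are hypothetical stand-ins for illustration; this is not the actual bat_utils.c code, and I am only assuming that a failed descriptor lookup is what produces the NULL.

```c
#include <stddef.h>

/* Hypothetical stand-in for a BAT descriptor; NOT MonetDB's real struct. */
typedef struct {
    int batCacheid;
} bat_desc;

/* Hypothetical lookup that can fail: returns NULL when no descriptor
 * with the requested id exists in the table. */
bat_desc *lookup_desc(bat_desc *table, size_t n, int id)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].batCacheid == id)
            return &table[i];
    }
    return NULL;
}

/* Guarded access: returns -1 instead of dereferencing a NULL descriptor,
 * which is the unguarded step that would raise SIGSEGV. */
int safe_cache_id(const bat_desc *b)
{
    if (b == NULL)
        return -1;
    return b->batCacheid;
}
```

With a guard like this, a failed lookup would show up as an error return in the caller rather than as a segmentation fault, which might make the underlying cause easier to find.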
With the Feb2010 release, I managed to load a much larger database without a crash. I will try loading this smaller data set again with the Feb2010 release and report what happens. In the meantime, do you have any suggestions?
Thanks.
Hering