[MonetDB-users] Core Dump upon COPY on SPARC
Hi,

I am trying to load 77 GB of data (34,274,958 records) into MonetDB, using the Feb2010 SP1 source code compiled on Solaris SPARC with the Sun Studio 12.1 compiler with debugging enabled, and the database crashes 5 hours into the loading. Here is some information about my configuration.

I start up with Merovingian like this:

$ ( ulimit -d $[32*1024*1024];
    export LD_PRELOAD_64=/usr/lib/64/libumem.so:${LD_PRELOAD_64};
    export LD_PRELOAD=/usr/lib/libumem.so:${LD_PRELOAD};
    export MONETDB5CONF=/GAAL/chenher/rdcuxsrv220-local-disk/chenher/monetdb/feb2010sp1/etc/monetdb5.conf;
    /GAAL/chenher/share/monetdb/distro-sparc-feb2010-sp1-64bit/bin/merovingian; )

Here are the relevant log sections:

2010-04-16 13:25:28 MSG merovingian[4302]: starting database 'tar', up min/avg/max: 4m/4m/4m, crash average: 0.00 0.00 0.00 (1-1=0)
2010-04-16 13:25:29 MSG tar[15594]: arguments: /GAAL/chenher/share/monetdb/distro-sparc-feb2010-sp1-64bit/bin/mserver5 --config=/GAAL/chenher/rdcuxsrv220-local-disk/chenher/monetdb/feb2010sp1/etc/monetdb5.conf --dbname=tar --dbinit=include sql; --set merovingian_uri=mapi:monetdb://rdcuxsrv220:50000/tar --set monet_daemon=yes --set mapi_open=false --set mapi_autosense=true --set mapi_port=50001 --set monet_vault_key=/GAAL/chenher/rdcuxsrv220-local-disk/chenher/monetdb/feb2010sp1/var/MonetDB5/dbfarm/tar/.vaultkey --set sql_optimizer=default_pipe
2010-04-16 13:25:34 MSG tar[15594]: # MonetDB server v5.18.3, based on kernel v1.36.3
2010-04-16 13:25:34 MSG tar[15594]: # Serving database 'tar', using 16 threads
2010-04-16 13:25:34 MSG tar[15594]: # Compiled for sparc-sun-solaris2.10/64bit with 64bit OIDs dynamically linked
2010-04-16 13:25:34 MSG tar[15594]: # Found 32.000 GiB available main-memory.
2010-04-16 13:25:34 MSG tar[15594]: # Copyright (c) 1993-July 2008 CWI.
2010-04-16 13:25:34 MSG tar[15594]: # Copyright (c) August 2008-2010 MonetDB B.V., all rights reserved
2010-04-16 13:25:34 MSG tar[15594]: # Visit http://monetdb.cwi.nl/ for further information
2010-04-16 13:25:34 MSG tar[15594]: # Listening for connection requests on mapi:monetdb://127.0.0.1:50001/
2010-04-16 13:25:35 MSG tar[15594]: # MonetDB/SQL module v2.36.3 loaded
2010-04-16 13:25:35 MSG control[4302]: rdcuxsrv220:58146: started database 'tar'
...
2010-04-21 10:23:37 MSG merovingian[4302]: database 'tar' (15594) has crashed (dumped core)

Using Sun's DBX tool to examine the crash:

...
t@212 (l@212) terminated by signal SEGV (no mapping at the fault address)
Current function is temp_create
   87   temp_dup(b->batCacheid);
(dbx) where
current thread: t@212
=>[1] temp_create(b = (nil)), line 87 in "bat_utils.c"
  [2] ebat2real(b = 5714, ibase = 0), line 162 in "bat_utils.c"
  [3] delta_append_bat(bat = 0x104923c98, i = 0x10143ec98), line 234 in "bat_storage.c"
  [4] append_col(tr = 0x104fe1e98, c = 0x1045c7af8, i = 0x10143ec98, tpe = 5), line 267 in "bat_storage.c"
  [5] mvc_append_wrap(cntxt = 0xffffffff7f352c00, mb = 0x1050d5d98, stk = 0x101816018, pci = 0x1056a7c18), line 1120 in "sql.c"
  [6] runMALsequence(cntxt = 0xffffffff7f352c00, mb = 0x1050d5d98, startpc = 1, stoppc = 0, stk = 0x101816018, env = (nil), pcicaller = (nil)), line 2908 in "mal_interpreter.c"
  [7] callMAL(cntxt = 0xffffffff7f352c00, mb = 0x1050d5d98, env = 0xfffffffddabff908, argv = 0xfffffffddabff968, debug = '\0'), line 402 in "mal_interpreter.c"
  [8] SQLexecutePrepared(c = 0xffffffff7f352c00, be = 0x1050f0ec8, q = 0x104fe1b18), line 1196 in "sql_scenario.c"
  [9] SQLengineIntern(c = 0xffffffff7f352c00, be = 0x1050f0ec8), line 1249 in "sql_scenario.c"
  [10] SQLengine(c = 0xffffffff7f352c00), line 1349 in "sql_scenario.c"
  [11] runPhase(c = 0xffffffff7f352c00, phase = 4), line 363 in "mal_scenario.c"
  [12] runScenarioBody(c = 0xffffffff7f352c00), line 412 in "mal_scenario.c"
  [13] runScenario(c = 0xffffffff7f352c00), line 438 in "mal_scenario.c"
  [14] MSserveClient(dummy = 0xffffffff7f352c00), line 368 in "mal_session.c"
(dbx) threads
      t@1 a l@1 ?() LWP suspended in __pollsys()
      t@2 a l@2 SERVERlistenThread() LWP suspended in __pollsys()
      t@3 a l@3 mvc_logmanager() LWP suspended in __pollsys()
      t@4 b l@4 umem_update_thread() sleep on (unknown) in __lwp_park()
      t@13 a l@13 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@14 a l@14 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@15 a l@15 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@16 a l@16 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@17 a l@17 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@18 a l@18 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@19 a l@19 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@20 a l@20 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@21 a l@21 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@22 a l@22 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@23 a l@23 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@24 a l@24 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@25 a l@25 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@26 a l@26 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@27 a l@27 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
      t@28 a l@28 runDFLOWworker() sleep on 0x101ea9a48 in __lwp_park()
o>    t@212 a l@212 ?() signal SIGSEGV in temp_create()

With the Feb2010 release, I managed to load a much larger database without a crash. I will try loading this smaller data set again with the Feb2010 release and report what happens. In the meantime, can you give me some suggestions?

Thanks.
Hering
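For anyone triaging a similar report, the server's uptime before the crash can be read off the two merovingian log lines quoted above. The following is only a rough sketch in Python: the helper names (`parse_timestamp`, `uptime_before_crash`) are made up for illustration, and it assumes every log line begins with a `YYYY-MM-DD HH:MM:SS` timestamp as in the excerpt. Note that the excerpt elides entries between startup and crash, so this measures how long the server was up, not how long the load itself ran.

```python
from datetime import datetime

# Two lines copied from the log excerpt in the post.
LOG_EXCERPT = """\
2010-04-16 13:25:35 MSG control[4302]: rdcuxsrv220:58146: started database 'tar'
2010-04-21 10:23:37 MSG merovingian[4302]: database 'tar' (15594) has crashed (dumped core)
"""


def parse_timestamp(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS' stamp of a log line (assumed format)."""
    return datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")


def uptime_before_crash(log_text, dbname):
    """Return the time between the last 'started' and the last 'has crashed' entry."""
    started = crashed = None
    for line in log_text.splitlines():
        if f"started database '{dbname}'" in line:
            started = parse_timestamp(line)
        elif f"database '{dbname}'" in line and "has crashed" in line:
            crashed = parse_timestamp(line)
    if started is None or crashed is None:
        return None  # one of the two events is missing from the excerpt
    return crashed - started


elapsed = uptime_before_crash(LOG_EXCERPT, "tar")
print(elapsed)
```

Comparing this figure against the time the load was actually started would show how far into the load the crash occurred.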
On Wed, Apr 21, 2010 at 10:48:48AM -0700, Hering Cheng wrote:
With the Feb2010 release, I managed to load a much larger database without a crash. I will try loading this smaller data set again with the Feb2010 release and report what happens. In the meantime, can you give me some suggestions?
Thanks. Hering
Are you loading with a single COPY INTO or with several INSERT INTO statements?

Niels
-- Niels Nes, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: http://www.cwi.nl/~niels e-mail: Niels.Nes@cwi.nl
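To make the distinction in the question above concrete, here is a small Python sketch that only builds the two kinds of SQL text and does not talk to a server. The table name `trades`, the file path, the delimiters, and the sample rows are all hypothetical, since the thread does not show the statements that were actually run. The `COPY n RECORDS INTO ... FROM ... USING DELIMITERS` form is MonetDB's bulk-load syntax; only the record count is taken from the post.

```python
def copy_into_sql(table, path, records, field_sep=",", record_sep="\\n"):
    """Build a single bulk-load statement covering the whole input file."""
    return (
        f"COPY {records} RECORDS INTO {table} FROM '{path}' "
        f"USING DELIMITERS '{field_sep}', '{record_sep}';"
    )


def insert_sql(table, rows):
    """Build one INSERT statement per row (values assumed to be plain integers)."""
    return [
        f"INSERT INTO {table} VALUES ({', '.join(str(v) for v in row)});"
        for row in rows
    ]


# Hypothetical table and file; 34274958 is the record count given in the post.
bulk = copy_into_sql("trades", "/data/trades.csv", 34274958)
per_row = insert_sql("trades", [(1, 100), (2, 200)])

print(bulk)
for stmt in per_row:
    print(stmt)
```

The first form hands the whole file to the server in one statement, while the second issues one statement per record, which is why the answer matters for interpreting the stack trace.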