Re: [MonetDB-users] Server Crash Upon Connect
Resent, with some of the thread details removed due to limits on message length.
On Thu, May 27, 2010 at 2:49 PM, Hering Cheng
Hi,
MonetDB core dumped after I had 10 Java threads retrieving a total of 23 million records from a single table via JDBC. The threads actually all completed successfully. The crash seems to occur when another process tried to open a connection to the same database. I can reproduce the crash consistently.
After the core file is produced, processes can connect to MonetDB successfully, with Merovingian starting up the dead mserver5 automatically.
$ tail ~/gaal/rdcuxsrv220-local-disk/chenher/monetdb/feb2010/var/log/MonetDB/merovingian.log
2010-05-27 14:27:39 MSG merovingian[19478]: client rdcuxsrv220:61292 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61296 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61294 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61298 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61288 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61304 has disconnected from proxy
2010-05-27 14:27:42 MSG merovingian[19478]: client rdcuxsrv220:61302 has disconnected from proxy
2010-05-27 14:32:33 MSG merovingian[19478]: proxying client rdcuxsrv220:61784 for database 'taq' to mapi:monetdb://127.0.0.1:50001/taq
2010-05-27 14:33:01 MSG merovingian[19478]: client rdcuxsrv220:61784 has disconnected from proxy
2010-05-27 14:34:06 MSG merovingian[19478]: database 'taq' (17289) has crashed (dumped core)
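To line up the crash with the last client connection, the log can be scanned mechanically. The following is only a rough sketch in Python: the function names `parse_records` and `crash_context` and the `window_seconds` parameter are made up for illustration, and the record layout is assumed from the excerpt above (timestamp, level, `merovingian[pid]:`, message). It splits on timestamps rather than newlines, so it also copes with log text that has been pasted onto a single line.

```python
import re
from datetime import datetime

# Timestamp layout as seen in the merovingian.log excerpt (an assumption
# about the general format, based only on these lines).
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"

# One record: timestamp, level, "merovingian[pid]:", then the message up to
# the next timestamp or the end of the text.
RECORD = re.compile(
    rf"({TS}) (\w+) merovingian\[\d+\]: (.*?)(?=\s+{TS} |\s*$)",
    re.DOTALL,
)


def parse_records(text):
    """Return a list of (datetime, level, message) tuples in log order."""
    return [
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), level, msg.strip())
        for ts, level, msg in RECORD.findall(text)
    ]


def crash_context(text, window_seconds=120):
    """For each 'has crashed' record, list the records logged shortly before.

    Returns a list of (crash_message, [(seconds_before_crash, message), ...]).
    """
    records = parse_records(text)
    result = []
    for i, (when, _level, msg) in enumerate(records):
        if "has crashed" not in msg:
            continue
        prior = []
        for earlier, _lvl, earlier_msg in records[:i]:
            delta = int((when - earlier).total_seconds())
            if delta <= window_seconds:
                prior.append((delta, earlier_msg))
        result.append((msg, prior))
    return result


# The last three records from the excerpt above.
SAMPLE = """
2010-05-27 14:32:33 MSG merovingian[19478]: proxying client rdcuxsrv220:61784 for database 'taq' to mapi:monetdb://127.0.0.1:50001/taq
2010-05-27 14:33:01 MSG merovingian[19478]: client rdcuxsrv220:61784 has disconnected from proxy
2010-05-27 14:34:06 MSG merovingian[19478]: database 'taq' (17289) has crashed (dumped core)
"""

if __name__ == "__main__":
    for crash, prior in crash_context(SAMPLE):
        print(crash)
        for seconds, message in prior:
            print(f"  {seconds:4d}s earlier: {message}")
```

The timestamps are when merovingian logged each event, so the gaps are only an approximate guide to when the server process actually died.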
This is a Solaris SPARC server and MonetDB was built from the Feb2010 source code using Sun Studio 12.1:
$ ( ulimit -d $[32*1024*1024] && ulimit -n $[10*1024];
    export LD_PRELOAD_64=/usr/lib/64/libumem.so:${LD_PRELOAD_64};
    export LD_PRELOAD=/usr/lib/libumem.so:${LD_PRELOAD};
    export MONETDB5CONF=/GAAL/chenher/rdcuxsrv220-local-disk/chenher/monetdb/feb2010/etc/monetdb5.conf;
    /GAAL/chenher/share/monetdb/distro-sparc-feb2010-64bit-debug/bin/mserver5 --version; )
MonetDB server v5.18.1 (64-bit), based on kernel v1.36.1 (64-bit oids)
Copyright (c) 1993-July 2008 CWI
Copyright (c) August 2008-2010 MonetDB B.V., all rights reserved
Visit http://monetdb.cwi.nl/ for further information
Found 32.0GiB available memory, 16 available cpu cores
Configured for prefix: /GAAL/chenher/share/monetdb/distro-sparc-feb2010-64bit-debug
Libraries:
  libpcre: 8.01 2010-01-19 (compiled with 8.01)
  openssl: OpenSSL 0.9.8k 25 Mar 2009 (compiled with )
  libxml2: 2.6.23 (compiled with 2.6.23)
Compiled by: chenher@rdcuxsrv220 (sparc-sun-solaris2.10)
Compilation: cc -m64 -xcode=pic32 -I/GAAL/chenher/share/hans_boehm_gc/distro-sparc-6.8-64bit/include/ -g
Linking : /usr/ccs/bin/ld -m64 -L/GAAL/chenher/share/openssl-64bit/lib -L/GAAL/chenher/share/pcre-64bit/lib -L/GAAL/chenher/share/hans_boehm_gc/distro-sparc-6.8-64bit/lib/
$ dbx ~/gaal/share/monetdb/distro-sparc-feb2010-64bit-debug/bin/mserver5 ~/gaal/rdcuxsrv220-local-disk/chenher/monetdb/feb2010/var/MonetDB5/dbfarm/taq/core
For information about new features see `help changes'
To remove this message, put `dbxenv suppress_startup_message 7.7' in your .dbxrc
Reading mserver5
...
Reading lib_logger.so.5.18.1
Reading libuuid.so.1
t@4 (l@4) terminated by signal SEGV (no mapping at the fault address)
Current function is putName
  174   for(l= nme[0]; l && namespace.nme[l]; l= namespace.link[l]){
(dbx) where
current thread: t@4
=>[1] putName(nme = 0xffffffff6aec3928 "exportValue", len = 11U), line 174 in "mal_namespace.c"
  [2] initSQLreferences(), line 49 in "sql_gencode.c"
  [3] SQLinitClient(c = 0xffffffff7f352628), line 379 in "sql_scenario.c"
  [4] runPhase(c = 0xffffffff7f352628, phase = 5), line 363 in "mal_scenario.c"
  [5] runScenarioBody(c = 0xffffffff7f352628), line 392 in "mal_scenario.c"
  [6] runScenario(c = 0xffffffff7f352628), line 438 in "mal_scenario.c"
  [7] MSserveClient(dummy = 0xffffffff7f352628), line 368 in "mal_session.c"
(dbx) threads
   t@1 a l@1 ?() LWP suspended in __pollsys()
   t@2 a l@2 SERVERlistenThread() LWP suspended in __pollsys()
   t@3 a l@3 mvc_logmanager() LWP suspended in __pollsys()
o> t@4 a l@4 ?() signal SIGSEGV in putName()
   t@5 a l@5 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@6 a l@6 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@7 a l@7 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@8 a l@8 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@9 a l@9 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@10 a l@10 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@11 a l@11 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@12 a l@12 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@13 a l@13 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@14 a l@14 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@15 a l@15 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@16 a l@16 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@17 a l@17 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@18 a l@18 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@19 a l@19 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@20 a l@20 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@21 a l@21 ?() sleep on 0xffffffff7f352a98 in __lwp_park()
   t@22 a l@22 ?() sleep on 0xffffffff7f352d58 in __lwp_park()
   t@23 a l@23 ?() sleep on 0xffffffff7f353018 in __lwp_park()
   t@24 a l@24 ?() sleep on 0xffffffff7f3532d8 in __lwp_park()
   t@25 a l@25 ?() sleep on 0xffffffff7f353598 in __lwp_park()
   t@26 a l@26 ?() sleep on 0xffffffff7f353858 in __lwp_park()
   t@27 a l@27 ?() sleep on 0xffffffff7f353b18 in __lwp_park()
   t@28 a l@28 ?() sleep on 0xffffffff7f353dd8 in __lwp_park()
   t@29 a l@29 ?() sleep on 0xffffffff7f354098 in __lwp_park()
   t@30 a l@30 ?() sleep on 0xffffffff7f354358 in __lwp_park()
   t@31 a l@31 runDFLOWworker() sleep on 0x1012f8988 in __lwp_park()
   t@32 a l@32 runDFLOWworker() sleep on 0x1012f8988 in __lwp_park()
   t@33 a l@33 runDFLOWworker() sleep on 0x1012f8988 in __lwp_park()
...
   t@174 a l@174 runDFLOWworker() sleep on 0x102c9f908 in __lwp_park()
   t@175 a l@175 runDFLOWworker() sleep on 0x102c9f908 in __lwp_park()
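With well over a hundred threads in the listing, a tally by start function and state makes it easier to see that only one thread is doing anything unusual. Below is a minimal sketch in Python. The helper name `summarize_threads` is hypothetical, and the pattern assumes every entry has the shape seen in the dbx `threads` output above (`t@N a l@N func() state in func()`); other dbx versions or thread states may print differently.

```python
import re
from collections import Counter

# One entry of the dbx "threads" listing as it appears in this report:
#   t@5 a l@5 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
# Groups: start function, state text, function the thread is currently in.
THREAD = re.compile(
    r"t@\d+\s+a\s+l@\d+\s+(\S+?)\(\)\s+(.*?)\s+in\s+(\w+)\(\)"
)


def summarize_threads(text):
    """Count threads per (start function, state, current function)."""
    counts = Counter()
    for start, state, where in THREAD.findall(text):
        # Drop the wait address so threads parked on different objects
        # are counted in the same bucket.
        state = re.sub(r"\s+on\s+0x[0-9a-fA-F]+", "", state)
        counts[(start, state, where)] += 1
    return counts


# A few entries copied from the listing above.
SAMPLE = """
   t@1 a l@1 ?() LWP suspended in __pollsys()
   t@2 a l@2 SERVERlistenThread() LWP suspended in __pollsys()
o> t@4 a l@4 ?() signal SIGSEGV in putName()
   t@5 a l@5 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@6 a l@6 runDFLOWworker() sleep on 0x100c14b08 in __lwp_park()
   t@21 a l@21 ?() sleep on 0xffffffff7f352a98 in __lwp_park()
"""

if __name__ == "__main__":
    for (start, state, where), n in summarize_threads(SAMPLE).most_common():
        print(f"{n:4d}  {start}()  {state} in {where}()")
```

Because the pattern tolerates arbitrary whitespace between fields, it should also work on a listing whose line breaks were lost in transit.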
Thanks.
Hering