
My database is crashing quite systematically in production during some data imports (which take the form of a series of DELETE ... WHERE / COPY INTO statements). The relevant information from merovingian.log seems to be:
2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap

[Reply to Philippe Hanrigou's message of Wed, Jun 01, 2011 at 10:40:29PM -0700]

Philippe,

The size in the HEAPextend message above (3316460814336 bytes) suggests that MonetDB "for some reason" (possibly wrongly) expects some (intermediate) column to grow up to 3 TB in size and hence tries to allocate the respective memory, but fails to do so.

We can try to investigate where this happens, but as much information as possible about your usage of MonetDB (DB schema, data, query workload) would be very helpful for us to locate the origin of the problem. I'm afraid, though, that we might need to replay your complete scenario and trigger the same error on our side to be able to locate and fix the problem.

I see from your logs that you are using the latest Apr2011-SP1 release (64-bit on a 64-bit Linux system). Did you also experience the problem with earlier releases of MonetDB?
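To put the number in the HEAPextend message into perspective, here is a quick conversion. This is only an arithmetic sketch, not MonetDB code; both figures are copied from the log in this thread, and the variable names are made up for illustration.

```python
# Values copied from the merovingian log quoted in this thread.
requested_bytes = 3316460814336   # size reported by "HEAPextend: failed to extend to ..."
available_gib = 7.294             # "Found 7.294 GiB available main-memory."

GIB = 1024 ** 3
TIB = 1024 ** 4

requested_gib = requested_bytes / GIB
requested_tib = requested_bytes / TIB
ratio = requested_gib / available_gib

print(f"requested: {requested_gib:.1f} GiB ({requested_tib:.2f} TiB)")
print(f"that is roughly {ratio:.0f} times the reported main memory")
```

The request is hundreds of times larger than both the machine's RAM and the 5G database on disk, which is why the size itself looks suspicious rather than like an ordinary out-of-memory condition.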
A) What does this error mean?
-----------------------------
A lack of memory? I am quite confused, as the entire database is barely 5G on disk and there is 7G of RAM on this machine, which is dedicated solely to MonetDB. Moreover, I only DELETE/COPY INTO a single table at a time, and the biggest table is barely 1.6G. Clearly there is some memory dynamic that I am not understanding.
B) How do I recover the crashed database?
-----------------------------------------
It would not start anymore:
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Listening for UNIX domain connection requests on mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB/SQL module loaded
2011-06-02 04:55:19 MSG merovingian[1644]: database 'prod_reporting' (1665) was killed by signal SIGSEGV
2011-06-02 04:55:29 ERR control[1644]: (local): failed to fork mserver: database 'prod_reporting' has crashed after starting, manual intervention needed, check merovingian's logfile for details
The server crashes with a segmentation fault, and we'd need to know where in the code (and why) this happens. The only(?) way to find out would be to start the server by hand in a debugger, using the same command-line options as monetdbd (merovingian) uses (see your log below), e.g.,

    gdb --args /usr/bin/mserver5 \
        --set gdk_dbfarm=/var/monetdb5/dbfarm \
        --dbname=prod_reporting \
        --set merovingian_uri=mapi:monetdb://ip-10-32-111-2:50000/prod_reporting \
        --set mapi_open=false \
        --set mapi_port=0 \
        --set mapi_usock=/var/monetdb5/dbfarm/prod_reporting/.mapi.sock \
        --set monet_vault_key=/var/monetdb5/dbfarm/prod_reporting/.vaultkey \
        --set monet_daemon=yes

Of course, this only provides useful information if you did (or could) compile MonetDB yourself, configured with debugging enabled (--enable-debug) ...

Stefan
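The gdb invocation above is just the mserver5 command line from the "arguments:" entry in merovingian's log, prefixed with gdb --args. A minimal sketch of deriving it mechanically, assuming the entry has the layout shown in this thread; the helper name gdb_command is made up for illustration and is not part of MonetDB.

```python
import shlex

# One "arguments:" entry, copied from the log extract in this thread.
log_line = (
    "2011-06-02 04:55:19 MSG prod_reporting[1665]: arguments: "
    "/usr/bin/mserver5 --set gdk_dbfarm=/var/monetdb5/dbfarm "
    "--dbname=prod_reporting "
    "--set merovingian_uri=mapi:monetdb://ip-10-32-111-2:50000/prod_reporting "
    "--set mapi_open=false --set mapi_port=0 "
    "--set mapi_usock=/var/monetdb5/dbfarm/prod_reporting/.mapi.sock "
    "--set monet_vault_key=/var/monetdb5/dbfarm/prod_reporting/.vaultkey "
    "--set monet_daemon=yes"
)


def gdb_command(line):
    """Return ['gdb', '--args', <server argv...>] for an 'arguments:' log entry."""
    _, found, tail = line.partition("arguments: ")
    if not found:
        raise ValueError("not an 'arguments:' log entry")
    # Split the logged command line the way a POSIX shell would.
    return ["gdb", "--args", *shlex.split(tail)]


cmd = gdb_command(log_line)
# Print a copy-pasteable command line, quoting any argument that needs it.
print(" ".join(shlex.quote(part) for part in cmd))
```

Taking the arguments from the log rather than retyping them avoids accidentally starting the server with a different dbfarm or socket path than the one monetdbd used.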
Thanks in advance,
- Philippe
-- more complete extract from merovingian.log ---
2011-06-02 04:40:38 MSG merovingian[639]: proxying client ip-10-204-61-105.ec2.internal:52864 for database 'prod_reporting' to mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock?database=prod_reporting
2011-06-02 04:40:38 MSG merovingian[639]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2011-06-02 04:53:09 MSG control[639]: (local): served status list
2011-06-02 04:53:23 MSG merovingian[639]: caught SIGTERM, starting shutdown sequence
2011-06-02 04:53:24 MSG control[639]: control channel closed
2011-06-02 04:53:27 MSG merovingian[639]: sending process 25449 (database 'prod_reporting') the TERM signal
2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
2011-06-02 04:53:27 MSG merovingian[639]: database 'prod_reporting' (25449) has exited with exit status 0
2011-06-02 04:53:27 MSG merovingian[639]: database 'prod_reporting' has shut down
2011-06-02 04:53:27 MSG merovingian[639]: Merovingian 1.4 stopped
2011-06-02 04:55:16 MSG merovingian[1644]: Merovingian 1.4 (Apr2011-SP1) starting
2011-06-02 04:55:16 MSG merovingian[1644]: monitoring dbfarm /var/monetdb5/dbfarm
2011-06-02 04:55:16 MSG merovingian[1644]: accepting connections on TCP socket 0.0.0.0:50000
2011-06-02 04:55:16 MSG merovingian[1644]: accepting connections on UNIX domain socket /tmp/.s.monetdb.50000
2011-06-02 04:55:16 MSG discovery[1644]: listening for UDP messages on 0.0.0.0:50000
2011-06-02 04:55:16 MSG control[1644]: accepting connections on UNIX domain socket /tmp/.s.merovingian.50001
2011-06-02 04:55:16 MSG control[1644]: accepting connections on TCP socket 0.0.0.0:50001
2011-06-02 04:55:16 MSG discovery[1644]: new neighbour ip-10-32-111-2 (ip-10-32-111-2.ec2.internal)
2011-06-02 04:55:16 MSG discovery[1644]: new database mapi:monetdb://ip-10-32-111-2:50000/prod_reporting (ttl=660s)
2011-06-02 04:55:16 MSG discovery[1644]: registered neighbour ip-10-32-111-2:50001
2011-06-02 04:55:19 MSG control[1644]: (local): served status list
2011-06-02 04:55:19 MSG merovingian[1644]: starting database 'prod_reporting', up min/avg/max: 1m/31m/1h, crash average: 0.00 0.10 0.03 (6-5=1)
2011-06-02 04:55:19 MSG prod_reporting[1665]: arguments: /usr/bin/mserver5 --set gdk_dbfarm=/var/monetdb5/dbfarm --dbname=prod_reporting --set merovingian_uri=mapi:monetdb://ip-10-32-111-2:50000/prod_reporting --set mapi_open=false --set mapi_port=0 --set mapi_usock=/var/monetdb5/dbfarm/prod_reporting/.mapi.sock --set monet_vault_key=/var/monetdb5/dbfarm/prod_reporting/.vaultkey --set monet_daemon=yes
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB 5 server v11.3.3 "Apr2011-SP1"
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Serving database 'prod_reporting', using 2 threads
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Compiled for x86_64-pc-linux-gnu/64bit with 64bit OIDs dynamically linked
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Found 7.294 GiB available main-memory.
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Copyright (c) 1993-July 2008 CWI.
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Listening for UNIX domain connection requests on mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB/SQL module loaded
2011-06-02 04:55:19 MSG merovingian[1644]: database 'prod_reporting' (1665) was killed by signal SIGSEGV
2011-06-02 04:55:29 ERR control[1644]: (local): failed to fork mserver: database 'prod_reporting' has crashed after starting, manual intervention needed, check merovingian's logfile for details
2011-06-02 05:03:18 MSG merovingian[1644]: database 'prod_reporting' has crashed after start on 2011-06-02 04:55:19, attempting restart, up min/avg/max: 1m/31m/1h, crash average: 1.00 0.20 0.07 (7-5=2)
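In a log of this length, the few entries that matter (the "!" error messages, ERR-level entries, and signal kills) are easy to miss. A minimal sketch for picking them out, assuming each entry follows the layout seen in this extract (date, time, level, component[pid], message). The regular expression and the helper name suspicious are my own guesses from these samples, not a documented merovingian log format.

```python
import re

# Layout inferred from the entries in this thread: "DATE TIME LEVEL source[pid]: message".
ENTRY = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<source>[^\[\s]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def suspicious(lines):
    """Yield parsed entries that are ERR level, start with '!', or report a signal kill."""
    for line in lines:
        m = ENTRY.match(line.rstrip("\n"))
        if m is None:
            continue  # not a log entry in the assumed layout
        msg = m.group("msg")
        if m.group("level") == "ERR" or msg.startswith("!") or "killed by signal" in msg:
            yield m.groupdict()


# Three entries copied from the extract above.
sample = [
    "2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap",
    "2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB/SQL module loaded",
    "2011-06-02 04:55:19 MSG merovingian[1644]: database 'prod_reporting' (1665) was killed by signal SIGSEGV",
]

hits = list(suspicious(sample))
for entry in hits:
    print(entry["time"], entry["source"], entry["msg"])
```

To scan a real file, the same generator can be fed an open file object instead of the sample list.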
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
| Stefan.Manegold @ CWI.nl | DB Architectures (INS1) |
| http://CWI.nl/~manegold/ | Science Park 123 (L321) |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL)  |