[MonetDB-users] Production Database Crash - Analysis / Recovery?

My database is crashing quite systematically in production during some data import (which takes the form of a series of DELETE ... WHERE / COPY INTO statements). The relevant information from the merovingian.log seems to be:

2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap

A) What does this error mean?
-----------------------------------------

A lack of memory? I am quite confused, as the entire database is barely 5G on disk and there is 7G of RAM on this machine, which is dedicated solely to MonetDB. Moreover, I only DELETE/COPY INTO a single table at a time, and the biggest table is barely 1.6G. Clearly there is some memory dynamic that I am not understanding.

B) How do I recover the crashed database?
-----------------------------------------------------------

It would not start anymore:

2011-06-02 04:55:19 MSG prod_reporting[1665]: # Listening for UNIX domain connection requests on mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB/SQL module loaded
2011-06-02 04:55:19 MSG merovingian[1644]: database 'prod_reporting' (1665) was killed by signal SIGSEGV
2011-06-02 04:55:29 ERR control[1644]: (local): failed to fork mserver: database 'prod_reporting' has crashed after starting, manual intervention needed, check merovingian's logfile for details

Thanks in advance,
- Philippe

--- more complete extract from merovingian.log ---

2011-06-02 04:40:38 MSG merovingian[639]: proxying client ip-10-204-61-105.ec2.internal:52864 for database 'prod_reporting' to mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock?database=prod_reporting
2011-06-02 04:40:38 MSG merovingian[639]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2011-06-02 04:53:09 MSG control[639]: (local): served status list
2011-06-02 04:53:23 MSG merovingian[639]: caught SIGTERM, starting shutdown sequence
2011-06-02 04:53:24 MSG control[639]: control channel closed
2011-06-02 04:53:27 MSG merovingian[639]: sending process 25449 (database 'prod_reporting') the TERM signal
2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
2011-06-02 04:53:27 MSG merovingian[639]: database 'prod_reporting' (25449) has exited with exit status 0
2011-06-02 04:53:27 MSG merovingian[639]: database 'prod_reporting' has shut down
2011-06-02 04:53:27 MSG merovingian[639]: Merovingian 1.4 stopped
2011-06-02 04:55:16 MSG merovingian[1644]: Merovingian 1.4 (Apr2011-SP1) starting
2011-06-02 04:55:16 MSG merovingian[1644]: monitoring dbfarm /var/monetdb5/dbfarm
2011-06-02 04:55:16 MSG merovingian[1644]: accepting connections on TCP socket 0.0.0.0:50000
2011-06-02 04:55:16 MSG merovingian[1644]: accepting connections on UNIX domain socket /tmp/.s.monetdb.50000
2011-06-02 04:55:16 MSG discovery[1644]: listening for UDP messages on 0.0.0.0:50000
2011-06-02 04:55:16 MSG control[1644]: accepting connections on UNIX domain socket /tmp/.s.merovingian.50001
2011-06-02 04:55:16 MSG control[1644]: accepting connections on TCP socket 0.0.0.0:50001
2011-06-02 04:55:16 MSG discovery[1644]: new neighbour ip-10-32-111-2 (ip-10-32-111-2.ec2.internal)
2011-06-02 04:55:16 MSG discovery[1644]: new database mapi:monetdb://ip-10-32-111-2:50000/prod_reporting (ttl=660s)
2011-06-02 04:55:16 MSG discovery[1644]: registered neighbour ip-10-32-111-2:50001
2011-06-02 04:55:19 MSG control[1644]: (local): served status list
2011-06-02 04:55:19 MSG merovingian[1644]: starting database 'prod_reporting', up min/avg/max: 1m/31m/1h, crash average: 0.00 0.10 0.03 (6-5=1)
2011-06-02 04:55:19 MSG prod_reporting[1665]: arguments: /usr/bin/mserver5 --set gdk_dbfarm=/var/monetdb5/dbfarm --dbname=prod_reporting --set merovingian_uri=mapi:monetdb://ip-10-32-111-2:50000/prod_reporting --set mapi_open=false --set mapi_port=0 --set mapi_usock=/var/monetdb5/dbfarm/prod_reporting/.mapi.sock --set monet_vault_key=/var/monetdb5/dbfarm/prod_reporting/.vaultkey --set monet_daemon=yes
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB 5 server v11.3.3 "Apr2011-SP1"
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Serving database 'prod_reporting', using 2 threads
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Compiled for x86_64-pc-linux-gnu/64bit with 64bit OIDs dynamically linked
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Found 7.294 GiB available main-memory.
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Copyright (c) 1993-July 2008 CWI.
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Listening for UNIX domain connection requests on mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB/SQL module loaded
2011-06-02 04:55:19 MSG merovingian[1644]: database 'prod_reporting' (1665) was killed by signal SIGSEGV
2011-06-02 04:55:29 ERR control[1644]: (local): failed to fork mserver: database 'prod_reporting' has crashed after starting, manual intervention needed, check merovingian's logfile for details
2011-06-02 05:03:18 MSG merovingian[1644]: database 'prod_reporting' has crashed after start on 2011-06-02 04:55:19, attempting restart, up min/avg/max: 1m/31m/1h, crash average: 1.00 0.20 0.07 (7-5=2)

Philippe,

On Wed, Jun 01, 2011 at 10:40:29PM -0700, Philippe Hanrigou wrote:

My database is crashing quite systematically in production during some data import (which takes the form of a series of DELETE ... WHERE / COPY INTO statements). The relevant information from the merovingian.log seems to be:

2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
                                                                   ^^^^^^^^^^^^^

This suggests that MonetDB "for some reason" (possibly wrongly) expects some (intermediate) column to grow to about 3 TB in size and hence tries to allocate the respective memory, but fails to do so.

We can try to investigate where this happens, but as much information as possible about your usage of MonetDB (DB schema, data, query workload) would be very helpful for us to locate the origin of the problem. I'm afraid, though, we might need to be able to replay your complete scenario and trigger the same error on our side to be able to locate and fix the problem.

I see from your logs that you are using the latest Apr2011-SP1 release (64-bit on a 64-bit Linux system). Did you also experience the problem with earlier releases of MonetDB?
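The size in that HEAPextend message is easier to reason about once converted. A quick back-of-the-envelope check (plain Python, only illustrating the arithmetic, not MonetDB code) shows the failed allocation is roughly 3 TiB, several hundred times the 7.294 GiB the server reports:

```python
requested = 3316460814336          # bytes, from the HEAPextend error message
ram = int(7.294 * 2**30)           # the 7.294 GiB reported by mserver5 at startup

# Express the request in TiB and relative to the available RAM.
tib = requested / 2**40
ratio = requested / ram
print(f"{tib:.2f} TiB, about {ratio:.0f}x the available memory")
```

In other words, no machine of this size could satisfy the request, which supports the suspicion that the requested heap size itself is bogus rather than the memory being merely tight.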
A) What does this error mean?
-----------------------------------------
A lack of memory? I am quite confused, as the entire database is barely 5G on disk and there is 7G of RAM on this machine, which is dedicated solely to MonetDB. Moreover, I only DELETE/COPY INTO a single table at a time, and the biggest table is barely 1.6G. Clearly there is some memory dynamic that I am not understanding.
B) How do I recover the crashed database?
-----------------------------------------------------------
It would not start anymore:
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Listening for UNIX domain connection requests on mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB/SQL module loaded
2011-06-02 04:55:19 MSG merovingian[1644]: database 'prod_reporting' (1665) was killed by signal SIGSEGV
2011-06-02 04:55:29 ERR control[1644]: (local): failed to fork mserver: database 'prod_reporting' has crashed after starting, manual intervention needed, check merovingian's logfile for details
The server crashes with a segmentation fault, and we'd need to know where in the code (and why) this happens. The only(?) way to find out would be to start the server by hand in a debugger, using the same command-line options as monetdbd (merovingian) uses (see your log), e.g.:

gdb --args /usr/bin/mserver5 --set gdk_dbfarm=/var/monetdb5/dbfarm --dbname=prod_reporting --set merovingian_uri=mapi:monetdb://ip-10-32-111-2:50000/prod_reporting --set mapi_open=false --set mapi_port=0 --set mapi_usock=/var/monetdb5/dbfarm/prod_reporting/.mapi.sock --set monet_vault_key=/var/monetdb5/dbfarm/prod_reporting/.vaultkey --set monet_daemon=yes

Of course, this only provides useful information if you did (or could) compile MonetDB yourself, configured with debugging enabled (--enable-debug) ...

Stefan
Thanks in advance, - Philippe
------------------------------------------------------------------------------
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
| Stefan.Manegold @ CWI.nl | DB Architectures (INS1) |
| http://CWI.nl/~manegold/ | Science Park 123 (L321) |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL)  |

Philippe,

If this happens during the COPY INTO statement, then most likely it is related to enforcing integrity constraints, e.g., foreign keys; see http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto

Alternatively, a previous system failure (e.g., a kill by the OS) may have left it in an inconsistent state. If you can simply start the server and perform a simple query, 'select 1;', then you at least know that the database itself is not corrupted.

On 6/2/11 4:23 PM, Stefan Manegold wrote:

Hi Martin,

Thanks for the help.

On Jun 2, 2011, at 7:52 AM, Martin Kersten wrote:
If this happens during the COPY INTO statement, then most likely it is related to enforcing integrity constraints, e.g., foreign keys.
Good to know. Unlikely to be the problem in my case though, as I actually decided *not* to define any constraint on this table at all, not even a primary key. Is there any performance benefit in MonetDB in defining primary keys or foreign keys?
Alternatively, a previous system failure (e.g., a kill by the OS) may have left it in an inconsistent state. If you can simply start the server and perform a simple query 'select 1;' then you at least know that the database itself is not corrupted.
This one will go in my fresh collection of MonetDB troubleshooting tips ;-) Definitely useful. I cannot even start the database at all (SIGSEGV), though. Stefan helped me generate a more meaningful stack trace, in case you are interested: http://sourceforge.net/mailarchive/message.php?msg_id=27593226

Thanks again,
- Philippe

Hi Stefan,

Thanks a lot for the help. I really appreciate it.

On Jun 2, 2011, at 7:23 AM, Stefan Manegold wrote:
2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
                                                                   ^^^^^^^^^^^^^

This suggests that MonetDB "for some reason" (possibly wrongly) expects some (intermediate) column to grow to about 3 TB in size and hence tries to allocate the respective memory, but fails to do so.
We can try to investigate where this happens, but as much information about your usage of MonetDB (DB schema, data, query workload) as possible would be very helpful for us to be able to locate the origin of the problem.
3 TB seems quite crazy; I wonder how I end up triggering this with a 4.3G database (as measured by "du -hs" on disk).
The workload is a series of updates to our reference data in a database. I am simulating some upserts with a combination of DELETE/COPY INTO: I want to refresh data about some "sessions" for a "merchant". Each merchant has a dedicated table, named "<merchant id>_sessions".

To refresh the session metrics I execute:

-- for each day:
--   for each merchant
DELETE FROM "<merchant id>_sessions" WHERE session_start BETWEEN <start of day timestamp> AND <end of day timestamp>;
COPY INTO "<merchant id>_sessions" FROM STDIN USING DELIMITERS '\\t','\\n';
... (up to 8000 session rows for this merchant and day)
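The per-merchant, per-day refresh described above can be sketched as a small statement generator (plain Python; the merchant id, table-naming convention, and timestamps are placeholders taken from examples elsewhere in this thread, not a real schema):

```python
# Hypothetical sketch of the statement pair the refresh job issues for
# one merchant and one day; ids and millisecond timestamps are placeholders.
def refresh_statements(merchant_id, day_start_ms, day_end_ms):
    table = f'"{merchant_id}_sessions"'
    delete = (f"DELETE FROM {table} "
              f"WHERE session_start BETWEEN {day_start_ms} AND {day_end_ms};")
    copy = f"COPY INTO {table} FROM STDIN USING DELIMITERS '\\t','\\n';"
    return delete, copy

d, c = refresh_statements("20789445e300fa1e535f3027d5d63dc9",
                          1280361600000, 1280447999999)
print(d)
print(c)
```

The COPY statement is then followed by the (up to 8000) tab-separated rows on stdin, so each day's batch replaces exactly the rows in that day's window.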
I'm afraid, though, we might need to be able to replay your complete scenario and trigger the same error on our side to be able to locate and fix the problem.
I will try again tomorrow. I will try adding an explicit maximum number of records with COPY 8000 RECORDS INTO ... to see if it makes any difference.
I see from your logs that you are using the latest Apr2011-SP1 release (64-bit on a 64-bit Linux system). Did you experience the problem also with earlier releases of MonetDB?
Yes indeed it first happened with Apr2011. I then upgraded to Apr2011-SP1, started with a fresh database, reran the import/refresh from scratch and was able to reproduce the problem again.
The server crashes with a segmentation fault, and we'd need to know where in the code (and why) this happens. The only(?) way to find out would be to start the server by hand in a debugger, using the same command-line options as monetdbd (merovingian) uses (see your log below), e.g.,
It seems to SIGSEGV when processing the sql logs:
(gdb) run
Starting program: /opt/local/bin/mserver5 --set gdk_dbfarm /mnt/monetdb5/dbfarm --dbname=prod_reporting --set merovingian_uri mapi:monetdb://ip-10-32-111-2:50000/prod_reporting --set mapi_open false --set mapi_port 0 --set mapi_usock /mnt/monetdb5/dbfarm/prod_reporting/.mapi.sock --set monet_vault_key /mnt/monetdb5/dbfarm/prod_reporting/.vaultkey --set monet_daemon yes
[Thread debugging using libthread_db enabled]
ERROR: wrong format gdk_dbfarm
ERROR: wrong format merovingian_uri
ERROR: wrong format mapi_open
ERROR: wrong format mapi_port
ERROR: wrong format mapi_usock
ERROR: wrong format monet_vault_key
ERROR: wrong format monet_daemon
# MonetDB 5 server v11.3.3 "Apr2011-SP1"
# Serving database 'prod_reporting', using 2 threads
# Compiled for x86_64-unknown-linux-gnu/64bit with 64bit OIDs dynamically linked
# Found 7.294 GiB available main-memory.
# Copyright (c) 1993-July 2008 CWI.
# Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
# Visit http://monetdb.cwi.nl/ for further information
[New Thread 0x7fffea474700 (LWP 15213)]
# Listening for connection requests on mapi:monetdb://127.0.0.1:50000/
# MonetDB/GIS module loaded
# MonetDB/SQL module loaded
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1192
1192            BATloop(b, p, q) {

FYI, I tried deleting all the files in sql_logs/sql and restarting the database; it still crashes...

On Jun 2, 2011, at 2:01 PM, Philippe Hanrigou wrote:
Hi Stefan,
Thanks a lot for the help. I really appreciate it.
On Jun 2, 2011, at 7:23 AM, Stefan Manegold wrote:
2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
                                                                   ^^^^^^^^^^^^^

This suggests that MonetDB "for some reason" (possibly wrongly) expects some (intermediate) column to grow to about 3 TB in size and hence tries to allocate the respective memory, but fails to do so.
We can try to investigate where this happens, but as much information about your usage of MonetDB (DB schema, data, query workload) as possible would be very helpful for us to be able to locate the origin of the problem.
3TB seems quite crazy, I wonder how I end up triggering this with a 4.3G database (as measured by "du -hs" on disk).
The workload is a series of updates to our reference data in a database. I am simulating some upserts with a combination of DELETE/COPY INTO: I want to refresh data about some "sessions" for a "merchant". Each merchant has a dedicated table, named "<merchant id>_sessions".
To refresh the session metrics I execute:
-- for each day:
--   for each merchant
DELETE FROM "<merchant id>_sessions" WHERE session_start BETWEEN <start of day timestamp> AND <end of day timestamp>;
COPY INTO "<merchant id>_sessions" FROM STDIN USING DELIMITERS '\\t','\\n';
... (up to 8000 session rows for this merchant and day)

This is my poor man's way of simulating upserts, as my email on the topic did not generate many suggestions ;-)
http://sourceforge.net/mailarchive/forum.php?thread_name=BANLkTi%3DdX-1DFka5NRnZUEj%3DVdi3Sz-Kkg%40mail.gmail.com&forum_name=monetdb-users
I am willing to try other ways to accomplish the same thing, as long as it is performant for bulk upserts.
I'm afraid, though, we might need to be able to replay your complete scenario and trigger the same error on our side to be able to locate and fix the problem.
I will try again tomorrow. I will try adding an explicit maximum number of records with COPY 8000 RECORDS INTO ... to see if it makes any difference.
I see from your logs that you are using the latest Apr2011-SP1 release (64-bit on a 64-bit Linux system). Did you experience the problem also with earlier releases of MonetDB?
Yes indeed it first happened with Apr2011. I then upgraded to Apr2011-SP1, started with a fresh database, reran the import/refresh from scratch and was able to reproduce the problem again.
The server crashes with a segmentation fault, and we'd need to know where in the code (and why) this happens. The only(?) way to find out would be to start the server by hand in a debugger, using the same command-line options as monetdbd (merovingian) uses (see your log below), e.g.,
It seems to SIGSEGV when processing the sql logs:
(gdb) run
Starting program: /opt/local/bin/mserver5 --set gdk_dbfarm /mnt/monetdb5/dbfarm --dbname=prod_reporting --set merovingian_uri mapi:monetdb://ip-10-32-111-2:50000/prod_reporting --set mapi_open false --set mapi_port 0 --set mapi_usock /mnt/monetdb5/dbfarm/prod_reporting/.mapi.sock --set monet_vault_key /mnt/monetdb5/dbfarm/prod_reporting/.vaultkey --set monet_daemon yes
[Thread debugging using libthread_db enabled]
ERROR: wrong format gdk_dbfarm
ERROR: wrong format merovingian_uri
ERROR: wrong format mapi_open
ERROR: wrong format mapi_port
ERROR: wrong format mapi_usock
ERROR: wrong format monet_vault_key
ERROR: wrong format monet_daemon
# MonetDB 5 server v11.3.3 "Apr2011-SP1"
# Serving database 'prod_reporting', using 2 threads
# Compiled for x86_64-unknown-linux-gnu/64bit with 64bit OIDs dynamically linked
# Found 7.294 GiB available main-memory.
# Copyright (c) 1993-July 2008 CWI.
# Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
# Visit http://monetdb.cwi.nl/ for further information
[New Thread 0x7fffea474700 (LWP 15213)]
# Listening for connection requests on mapi:monetdb://127.0.0.1:50000/
# MonetDB/GIS module loaded
# MonetDB/SQL module loaded
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1192
1192            BATloop(b, p, q) {
(gdb) where
#0  0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1192
#1  0x00007ffff678925e in logger_create (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1254
#2  0x00007fffea595604 in bl_create (logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", cat_version=51100) at bat_logger.c:130
#3  0x00007fffea581bcb in store_init (debug=0, store=store_bat, logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", stk=0) at store.c:1315
#4  0x00007fffea518c4d in mvc_init (dbname=0x1010cf8 "prod_reporting", debug=0, store=store_bat, stk=0) at sql_mvc.c:51
#5  0x00007fffea4eaaa1 in SQLinit () at sql_scenario.mx:272
#6  0x00007fffea4ea717 in SQLprelude () at sql_scenario.mx:199
#7  0x00007ffff6dc7268 in runMALsequence (cntxt=0x606580, mb=0x102bd48, startpc=1, stoppc=0, stk=0x19cce48, env=0x0, pcicaller=0x0) at mal_interpreter.mx:2052
#8  0x00007ffff6db95a4 in runMAL (cntxt=0x606580, mb=0x102bd48, startpc=1, mbcaller=0x0, env=0x0, pcicaller=0x0) at mal_interpreter.mx:341
#9  0x00007ffff6e0dd1a in MALengine (c=0x606580) at mal_session.mx:680
#10 0x00007ffff6e0c450 in malBootstrap () at mal_session.mx:95
#11 0x00007ffff6d9690b in mal_init () at mal.mx:383
#12 0x000000000040331d in main (argc=23, av=0x7fffffffe4d8) at mserver5.c:546
(gdb) thr app all bt

Thread 2 (Thread 0x7fffea474700 (LWP 15213)):
#0  0x00007ffff3c8867e in __lll_lock_wait_private () from /lib/libpthread.so.0
#1  0x00007ffff3c8208e in _L_lock_4442 () from /lib/libpthread.so.0
#2  0x00007ffff3c81c3e in start_thread () from /lib/libpthread.so.0
#3  0x00007ffff39dd92d in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7fe4720 (LWP 15210)):
#0  0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1192
#1  0x00007ffff678925e in logger_create (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1254
#2  0x00007fffea595604 in bl_create (logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", cat_version=51100) at bat_logger.c:130
#3  0x00007fffea581bcb in store_init (debug=0, store=store_bat, logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", stk=0) at store.c:1315
#4  0x00007fffea518c4d in mvc_init (dbname=0x1010cf8 "prod_reporting", debug=0, store=store_bat, stk=0) at sql_mvc.c:51
#5  0x00007fffea4eaaa1 in SQLinit () at sql_scenario.mx:272
#6  0x00007fffea4ea717 in SQLprelude () at sql_scenario.mx:199
#7  0x00007ffff6dc7268 in runMALsequence (cntxt=0x606580, mb=0x102bd48, startpc=1, stoppc=0, stk=0x19cce48, env=0x0, pcicaller=0x0) at mal_interpreter.mx:2052
#8  0x00007ffff6db95a4 in runMAL (cntxt=0x606580, mb=0x102bd48, startpc=1, mbcaller=0x0, env=0x0, pcicaller=0x0) at mal_interpreter.mx:341
#9  0x00007ffff6e0dd1a in MALengine (c=0x606580) at mal_session.mx:680
#10 0x00007ffff6e0c450 in malBootstrap () at mal_session.mx:95
#11 0x00007ffff6d9690b in mal_init () at mal.mx:383
#12 0x000000000040331d in main (argc=23, av=0x7fffffffe4d8) at mserver5.c:546

Thanks a lot for your help,
- Philippe

Hi,

I spent more time analyzing how the original SIGSEGV occurs. I hope somebody could help me push the analysis further.

The SIGSEGV always happens in a DELETE statement:

delete from "20789445e300fa1e535f3027d5d63dc9_sessions" where session_start between 1280361600000 and 1280447999999;

and is triggered on line 498 of gdk_setop.mx:

HASHloop@4(ri, r->H->hash, s2, h) {

The problem seems to be with `r->H->hash`, whose value is 0x0. I have no idea where r->H->hash should have been set, or how to push the investigation further. Any help would be greatly appreciated. I have included below a capture of my gdb session, which will provide more information.

Thanks in advance,
- Philippe

--------------------- gdb session --------------------

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fd9acbb7700 (LWP 3270)]
0x00007fd9b974c8e8 in BATins_kdiff (bn=0x24daa0c8, l=0x1dd47858, r=0x37498ea0) at gdk_setop.mx:498
498             HASHloop@4(ri, r->H->hash, s2, h) {
(gdb) bt
#0  0x00007fd9b974c8e8 in BATins_kdiff (bn=0x24daa0c8, l=0x1dd47858, r=0x37498ea0) at gdk_setop.mx:498
#1  0x00007fd9b9760865 in BATkdiff (l=0x1dd47858, r=0x37498ea0) at gdk_setop.mx:827
#2  0x00007fd9ba8be632 in CMDkdiff (result=0x7fd9acbb6838, left=0x1dd47858, right=0x37498ea0) at algebra.mx:1586
#3  0x00007fd9ba8cebce in ALGkdiff (result=0x24ad2ec8, lid=0x24ad2e98, rid=0x24ad2c28) at algebra.mx:3018
#4  0x00007fd9ba1aa5da in DFLOWstep (t=0x21bd2c8, fs=0x7fd9acfb7de0) at mal_interpreter.mx:2058
#5  0x00007fd9ba1afee3 in runDFLOWworker (t=0x21bd2c8) at mal_interpreter.mx:1174
#6  0x00007fd9b6e0c971 in start_thread () from /lib/libpthread.so.0
#7  0x00007fd9b6b6892d in clone () from /lib/libc.so.6
#8  0x0000000000000000 in ?? ()
(gdb) info threads
  6 Thread 0x7fd9acfb9700 (LWP 4846)  0x00007fd9b6e12da0 in sem_wait () from /lib/libpthread.so.0
  5 Thread 0x7fd9ad3bb700 (LWP 3266)  0x00007fd9b6b612c3 in select () from /lib/libc.so.6
  4 Thread 0x7fd9ad1ba700 (LWP 3267)  0x00007fd9b6b612c3 in select () from /lib/libc.so.6
  3 Thread 0x7fd9acdb8700 (LWP 3269)  0x00007fff89fff818 in gettimeofday ()
* 2 Thread 0x7fd9acbb7700 (LWP 3270)  0x00007fd9b974c8e8 in BATins_kdiff (bn=0x24daa0c8, l=0x1dd47858, r=0x37498ea0) at gdk_setop.mx:498
  1 Thread 0x7fd9bb3b1720 (LWP 3263)  0x00007fd9b6b612c3 in select () from /lib/libc.so.6
(gdb) thread 6
[Switching to thread 6 (Thread 0x7fd9acfb9700 (LWP 4846))]
#0  0x00007fd9b6e12da0 in sem_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007fd9b6e12da0 in sem_wait () from /lib/libpthread.so.0
#1  0x00007fd9ba1a5b3f in q_dequeue (q=0x1eaca38) at mal_interpreter.mx:960
#2  0x00007fd9ba1b0a6c in DFLOWscheduler (flow=0x25e1c38) at mal_interpreter.mx:1385
#3  0x00007fd9ba1b1c07 in runMALdataflow (cntxt=0x606898, mb=0x275f9a48, startpc=2, stoppc=59, stk=0x24ad2af8, env=0x0, pcicaller=0x409cad8) at mal_interpreter.mx:1583
#4  0x00007fd9bacb47e0 in MALstartDataflow (cntxt=0x606898, mb=0x275f9a48, stk=0x24ad2af8, pci=0x409cad8) at language.mx:268
#5  0x00007fd9ba192333 in runMALsequence (cntxt=0x606898, mb=0x275f9a48, startpc=1, stoppc=0, stk=0x24ad2af8, env=0x0, pcicaller=0x0) at mal_interpreter.mx:2168
#6  0x00007fd9ba1866ec in callMAL (cntxt=0x606898, mb=0x275f9a48, env=0x7fd9acfb8c80, argv=0x7fd9acfb8c40, debug=0 '\000') at mal_interpreter.mx:429
#7  0x00007fd9ad435c9f in SQLexecutePrepared (c=0x606898, be=0x278fefc8, q=0x18c31ce8) at sql_scenario.mx:1490
#8  0x00007fd9ad435f12 in SQLengineIntern (c=0x606898, be=0x278fefc8) at sql_scenario.mx:1543
#9  0x00007fd9ad436441 in SQLengine (c=0x606898) at sql_scenario.mx:1652
#10 0x00007fd9ba1d9114 in runPhase (c=0x606898, phase=4) at mal_scenario.mx:604
#11 0x00007fd9ba1d92eb in runScenarioBody (c=0x606898) at mal_scenario.mx:655
#12 0x00007fd9ba1d94d3 in runScenario (c=0x606898) at mal_scenario.mx:682
#13 0x00007fd9ba1da40d in MSserveClient (dummy=0x606898) at mal_session.mx:486
#14 0x00007fd9b6e0c971 in start_thread () from /lib/libpthread.so.0
#15 0x00007fd9b6b6892d in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb) frame 7
#7  0x00007fd9ad435c9f in SQLexecutePrepared (c=0x606898, be=0x278fefc8, q=0x18c31ce8) at sql_scenario.mx:1490
1490            ret= callMAL(c, mb, &glb, argv, (m->emod & mod_debug?'n':0));
(gdb) print *q
$1 = {next = 0x4ec3398, type = 2, sa = 0x1b6bf468, s = 0x2426b268, params = 0x2427a878, paramlen = 2, stk = 615328504, code = 0x1b77a958, id = 58, key = 5856, codestring = 0x20083678 "delete from \"20789445e300fa1e535f3027d5d63dc9_sessions\" where session_start between 1280361600000 and 1280447999999;", name = 0x39bc698 "s58_1", count = 18}
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fd9acbb7700 (LWP 3270))]
#0  0x00007fd9b974c8e8 in BATins_kdiff (bn=0x24daa0c8, l=0x1dd47858, r=0x37498ea0) at gdk_setop.mx:498
498             HASHloop@4(ri, r->H->hash, s2, h) {
(gdb) l
493             BATloop(l, p1, q1) {
494                     h = BUNh@2(li, p1);
495                     t = BUNtail(li, p1);
496                     ins = TRUE;
497                     if (@6) /* check for not-nil (nils don't match anyway) */
498                             HASHloop@4(ri, r->H->hash, s2, h) {
499                                     if (EQUAL@5(t, BUNtail(ri, s2))) {
500                                             HIT@1(h, t);
501                                             ins = FALSE;
502                                             break;
(gdb) p ri
$14 = {b = 0x37498ea0, hvid = 0, tvid = 0}
(gdb) p s2
$15 = 9223372036854775807
(gdb) p h
$16 = (ptr) 0x7fd9acbb3f88
(gdb) p r->H->hash
$17 = (Hash *) 0x0
(gdb) p *r->H
$18 = {id = 0x7fd9b9c48f7f "t", width = 8, type = 7 '\a', shift = 3 '\003', sorted = 0 '\000', varsized = 0, key = 0, dense = 0, nonil = 1, nil = 0, unused = 0, align = 0, nosorted_rev = 0, nokey = {0, 0}, nosorted = 0, nodense = 182, seq = 0, heap = {maxsize = 157280, free = 137000, size = 157280, base = 0x15c6b2b8 "", filename = 0x374990d8 "12/40/124015.tail", storage = 0 '\000', copied = 0, hashash = 0, forcemap = 0, newstorage = 0 '\000', dirty = 0 '\000', parentid = 0}, vheap = 0x0, hash = 0x0, props = 0x0}
(gdb) p *r
$19 = {batCacheid = -43021, H = 0x37498f58, T = 0x37498ec8, P = 0x37498fe8, U = 0x37499000}

Structure of the table:

sql>\d "20789445e300fa1e535f3027d5d63dc9_sessions"
CREATE TABLE "reporting"."20789445e300fa1e535f3027d5d63dc9_sessions" (
        "session_start"            BIGINT,
        "session_id"               CHAR(51),
        "the_day"                  CHAR(10),
        "cart"                     BOOLEAN,
        "purchased"                BOOLEAN,
        "merchant_total_dollars"   INTEGER,
        "co_total_dollars"         INTEGER,
        "baseline_dollars"         INTEGER,
        "billing_baseline_dollars" INTEGER,
        "promo_determination"      VARCHAR(20),
        "session_enabled"          BOOLEAN,
        "co_enabled"               BOOLEAN,
        "co_managed"               BOOLEAN,
        "promo"                    VARCHAR(100),
        "sushi"                    VARCHAR(100),
        "url_referrer"             VARCHAR(1024)
);

On Jun 2, 2011, at 2:01 PM, Philippe Hanrigou wrote:
Hi Stefan,
Thanks a lot for the help. I really appreciate it.
On Jun 2, 2011, at 7:23 AM, Stefan Manegold wrote:
2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
                                                                                       ^^^^^^^^^^^^^
This suggests that MonetDB "for some reason" (possibly wrongly) expects some (intermediate) column to grow to 3 TB in size, and hence tries to allocate the respective memory, but fails to do so.
We can try to investigate where this happens, but as much information about your usage of MonetDB (DB schema, data, query workload) as possible would be very helpful for us to be able to locate the origin of the problem.
3 TB seems quite crazy; I wonder how I end up triggering this with a 4.3 GB database (as measured by "du -hs" on disk).
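For concreteness, the size in the error message can be checked with simple arithmetic on the logged byte count (the 4.3 GB figure is the "du -hs" measurement quoted above; nothing here is derived from MonetDB internals):

```python
# Convert the failed HEAPextend request from the log into tebibytes and
# compare it against the on-disk database size reported by "du -hs".
requested_bytes = 3316460814336      # taken verbatim from the !ERROR line
db_on_disk = 4.3 * 2**30             # ~4.3 GB database on disk

tib = requested_bytes / 2**40
ratio = requested_bytes / db_on_disk
print(f"{tib:.2f} TiB requested")    # about 3 TiB
print(f"{ratio:.0f}x the on-disk database size")
```

So the single failed allocation is roughly 700 times larger than the entire database, which supports the "possibly wrongly" reading above.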
The workload is a series of updates to our reference data in the database. I am simulating upserts with a combination of DELETE/COPY INTO: I want to refresh data about some "sessions" for a "merchant". Each merchant has a dedicated table named "<merchant id>_sessions".
To refresh the session metrics I execute:
-- for each day:
--   for each merchant:

DELETE FROM "<merchant id>_sessions" WHERE session_start BETWEEN <start of day timestamp> AND <end of day timestamp>;
COPY INTO "<merchant id>_sessions" FROM STDIN USING DELIMITERS '\\t','\\n';
... (up to 8000 session rows for this merchant and day)

This is my poor man's way of simulating upserts, as my email on the topic did not generate many suggestions ;-)
http://sourceforge.net/mailarchive/forum.php?thread_name=BANLkTi%3DdX-1DFka5NRnZUEj%3DVdi3Sz-Kkg%40mail.gmail.com&forum_name=monetdb-users
I am willing to try other ways to accomplish the same thing, as long as they perform well for bulk upserts.
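The per-merchant, per-day refresh described above amounts to generating a DELETE/COPY pair per iteration. A minimal sketch of that statement generation (the helper name and its arguments are illustrative, not MonetDB API; only the produced SQL mirrors the statements in this thread):

```python
def upsert_statements(merchant_id: str, day_start_ms: int, day_end_ms: int):
    """Build the DELETE/COPY INTO pair for one merchant and one day."""
    table = f'"{merchant_id}_sessions"'   # one dedicated table per merchant
    delete = (f"DELETE FROM {table} "
              f"WHERE session_start BETWEEN {day_start_ms} AND {day_end_ms};")
    # COPY reads the refreshed rows for that day from STDIN, tab-separated.
    copy = f"COPY INTO {table} FROM STDIN USING DELIMITERS '\\t','\\n';"
    return delete, copy

d, c = upsert_statements("20789445e300fa1e535f3027d5d63dc9",
                         1280361600000, 1280447999999)
print(d)
print(c)
```

Feeding the row batch after the COPY statement is then the client's job (e.g. over mclient's stdin); that part is deliberately left out of the sketch.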
I'm afraid, though, we might need to be able to replay your complete scenario and trigger the same error on our side to be able to locate and fix the problem.
I will try again tomorrow. I will try adding an explicit maximum number of records with COPY 8000 RECORDS INTO ... to see if it makes any difference.
I see from your logs that you are using the latest Apr2011-SP1 release (64-bit on a 64-bit Linux system). Did you experience the problem also with earlier releases of MonetDB?
Yes, indeed: it first happened with Apr2011. I then upgraded to Apr2011-SP1, started with a fresh database, reran the import/refresh from scratch, and was able to reproduce the problem.
The server crashes with a segmentation fault, and we'd need to know where in the code (and why) this happens. The only(?) way to find out would be to start the server by hand in a debugger, using the same command-line options as monetdbd (merovingian) uses (see your log below), e.g.,
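A hedged sketch of what such a debugger invocation might look like. The dbfarm path matches the log excerpts in this thread; every mserver5 option shown is a placeholder that must be replaced with the options monetdbd actually logs for this database:

```shell
# Sketch only: run mserver5 by hand under gdb, mimicking monetdbd's invocation.
DBPATH=/var/monetdb5/dbfarm/prod_reporting
CMD="gdb --args mserver5 --dbpath=$DBPATH --set mapi_open=true"
# Print the command for review against merovingian.log instead of running it.
echo "$CMD"
```

Once the server is running under gdb, replaying the DELETE/COPY workload from a client should reproduce the crash with a usable backtrace.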

Dear Philippe,

On 10-06-2011 22:02:31 -0700, Philippe Hanrigou wrote:
Hi I spent more time analyzing how the original SIGSEGV occurs. I hope somebody
could help me push the analysis further.
The SIGSEGV is always happening in a DELETE statement:
delete from \"20789445e300fa1e535f3027d5d63dc9_sessions\" where session_start between 1280361600000 and 1280447999999;
and is triggered on line 498 of gdk_setop.mx:
HASHloop@4(ri, r->H->hash, s2, h) {
The problem seems to be with `r->H->hash` whose value is 0x0
I have no idea of where r->H->hash should have been set, or how to
push the investigation further. Any help would be greatly appreciated.
I have included below a capture of my gdb session which will provide
Please enter this concrete information in a bugreport at http://bugs.monetdb.org/ It is easier working with issues like those from there. Thanks in advance!

Thanks Fabian, I have created bug 2820 with all the information:
http://bugs.monetdb.org/show_bug.cgi?id=2820

Cheers,
- Philippe
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
participants (4)
- Fabian Groffen
- Martin Kersten
- Philippe Hanrigou
- Stefan Manegold