[MonetDB-users] Production Database Crash - Analysis / Recovery?

My database is crashing quite systematically in production during some data import (which takes the form of a series of DELETE ... WHERE / COPY INTO statements). The relevant information from the merovingian.log seems to be:

2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap

A) What does this error mean?
-----------------------------------------

A lack of memory? I am quite confused, as the entire database is barely 5G on disk and there is 7G of RAM on this machine, which is dedicated solely to MonetDB. Moreover, I only DELETE/COPY INTO a single table at a time, and the biggest table is barely 1.6G. Clearly there is some memory dynamic that I am not understanding.

B) How do I recover the crashed database?
-----------------------------------------------------------

It would not start anymore:

2011-06-02 04:55:19 MSG prod_reporting[1665]: # Listening for UNIX domain connection requests on mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB/SQL module loaded
2011-06-02 04:55:19 MSG merovingian[1644]: database 'prod_reporting' (1665) was killed by signal SIGSEGV
2011-06-02 04:55:29 ERR control[1644]: (local): failed to fork mserver: database 'prod_reporting' has crashed after starting, manual intervention needed, check merovingian's logfile for details

Thanks in advance,
- Philippe

--- more complete extract from merovingian.log ---

2011-06-02 04:40:38 MSG merovingian[639]: proxying client ip-10-204-61-105.ec2.internal:52864 for database 'prod_reporting' to mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock?database=prod_reporting
2011-06-02 04:40:38 MSG merovingian[639]: target connection is on local UNIX domain socket, passing on filedescriptor instead of proxying
2011-06-02 04:53:09 MSG control[639]: (local): served status list
2011-06-02 04:53:23 MSG merovingian[639]: caught SIGTERM, starting shutdown sequence
2011-06-02 04:53:24 MSG control[639]: control channel closed
2011-06-02 04:53:27 MSG merovingian[639]: sending process 25449 (database 'prod_reporting') the TERM signal
2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
2011-06-02 04:53:27 MSG merovingian[639]: database 'prod_reporting' (25449) has exited with exit status 0
2011-06-02 04:53:27 MSG merovingian[639]: database 'prod_reporting' has shut down
2011-06-02 04:53:27 MSG merovingian[639]: Merovingian 1.4 stopped
2011-06-02 04:55:16 MSG merovingian[1644]: Merovingian 1.4 (Apr2011-SP1) starting
2011-06-02 04:55:16 MSG merovingian[1644]: monitoring dbfarm /var/monetdb5/dbfarm
2011-06-02 04:55:16 MSG merovingian[1644]: accepting connections on TCP socket 0.0.0.0:50000
2011-06-02 04:55:16 MSG merovingian[1644]: accepting connections on UNIX domain socket /tmp/.s.monetdb.50000
2011-06-02 04:55:16 MSG discovery[1644]: listening for UDP messages on 0.0.0.0:50000
2011-06-02 04:55:16 MSG control[1644]: accepting connections on UNIX domain socket /tmp/.s.merovingian.50001
2011-06-02 04:55:16 MSG control[1644]: accepting connections on TCP socket 0.0.0.0:50001
2011-06-02 04:55:16 MSG discovery[1644]: new neighbour ip-10-32-111-2 (ip-10-32-111-2.ec2.internal)
2011-06-02 04:55:16 MSG discovery[1644]: new database mapi:monetdb://ip-10-32-111-2:50000/prod_reporting (ttl=660s)
2011-06-02 04:55:16 MSG discovery[1644]: registered neighbour ip-10-32-111-2:50001
2011-06-02 04:55:19 MSG control[1644]: (local): served status list
2011-06-02 04:55:19 MSG merovingian[1644]: starting database 'prod_reporting', up min/avg/max: 1m/31m/1h, crash average: 0.00 0.10 0.03 (6-5=1)
2011-06-02 04:55:19 MSG prod_reporting[1665]: arguments: /usr/bin/mserver5 --set gdk_dbfarm=/var/monetdb5/dbfarm --dbname=prod_reporting --set merovingian_uri=mapi:monetdb://ip-10-32-111-2:50000/prod_reporting --set mapi_open=false --set mapi_port=0 --set mapi_usock=/var/monetdb5/dbfarm/prod_reporting/.mapi.sock --set monet_vault_key=/var/monetdb5/dbfarm/prod_reporting/.vaultkey --set monet_daemon=yes
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB 5 server v11.3.3 "Apr2011-SP1"
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Serving database 'prod_reporting', using 2 threads
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Compiled for x86_64-pc-linux-gnu/64bit with 64bit OIDs dynamically linked
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Found 7.294 GiB available main-memory.
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Copyright (c) 1993-July 2008 CWI.
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Listening for UNIX domain connection requests on mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB/SQL module loaded
2011-06-02 04:55:19 MSG merovingian[1644]: database 'prod_reporting' (1665) was killed by signal SIGSEGV
2011-06-02 04:55:29 ERR control[1644]: (local): failed to fork mserver: database 'prod_reporting' has crashed after starting, manual intervention needed, check merovingian's logfile for details
2011-06-02 05:03:18 MSG merovingian[1644]: database 'prod_reporting' has crashed after start on 2011-06-02 04:55:19, attempting restart, up min/avg/max: 1m/31m/1h, crash average: 1.00 0.20 0.07 (7-5=2)

Philippe,

On Wed, Jun 01, 2011 at 10:40:29PM -0700, Philippe Hanrigou wrote:

My database is crashing quite systematically in production during some data import (which takes the form of a series of DELETE ... WHERE / COPY INTO statements). The relevant information from the merovingian.log seems to be:

2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
                                                                   ^^^^^^^^^^^^^

This suggests that MonetDB "for some reason" (possibly wrongly) expects some (intermediate) column to grow to about 3 TB in size and hence tries to allocate the respective memory, but fails to do so.

We can try to investigate where this happens, but as much information as possible about your usage of MonetDB (DB schema, data, query workload) would be very helpful for us to locate the origin of the problem. I'm afraid, though, we might need to be able to replay your complete scenario and trigger the same error on our side to be able to locate and fix the problem.

I see from your logs that you are using the latest Apr2011-SP1 release (64-bit on a 64-bit Linux system). Did you also experience the problem with earlier releases of MonetDB?
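The size in that HEAPextend message is easier to reason about once converted. A quick back-of-the-envelope check (plain Python, only illustrating the arithmetic, not MonetDB code) shows the failed allocation is roughly 3 TiB, several hundred times the 7.294 GiB the server reports:

```python
requested = 3316460814336          # bytes, from the HEAPextend error message
ram = int(7.294 * 2**30)           # the 7.294 GiB reported by mserver5 at startup

# Express the request in TiB and relative to the available RAM.
tib = requested / 2**40
ratio = requested / ram
print(f"{tib:.2f} TiB, about {ratio:.0f}x the available memory")
```

In other words, no machine of this size could satisfy the request, which supports the suspicion that the requested heap size itself is bogus rather than the memory being merely tight.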
A) What does this error mean?
-----------------------------------------
A lack of memory? I am quite confused, as the entire database is barely 5G on disk and there is 7G of RAM on this machine, which is dedicated solely to MonetDB. Moreover, I only DELETE/COPY INTO a single table at a time, and the biggest table is barely 1.6G. Clearly there is some memory dynamic that I am not understanding.
B) How do I recover the crashed database?
-----------------------------------------------------------
It would not start anymore:
2011-06-02 04:55:19 MSG prod_reporting[1665]: # Listening for UNIX domain connection requests on mapi:monetdb:///var/monetdb5/dbfarm/prod_reporting/.mapi.sock
2011-06-02 04:55:19 MSG prod_reporting[1665]: # MonetDB/SQL module loaded
2011-06-02 04:55:19 MSG merovingian[1644]: database 'prod_reporting' (1665) was killed by signal SIGSEGV
2011-06-02 04:55:29 ERR control[1644]: (local): failed to fork mserver: database 'prod_reporting' has crashed after starting, manual intervention needed, check merovingian's logfile for details
The server crashes with a segmentation fault, and we'd need to know where in the code (and why) this happens. The only(?) way to find out would be to start the server by hand in a debugger, using the same command-line options as monetdbd (merovingian) uses (see your log), e.g.:

gdb --args /usr/bin/mserver5 --set gdk_dbfarm=/var/monetdb5/dbfarm --dbname=prod_reporting --set merovingian_uri=mapi:monetdb://ip-10-32-111-2:50000/prod_reporting --set mapi_open=false --set mapi_port=0 --set mapi_usock=/var/monetdb5/dbfarm/prod_reporting/.mapi.sock --set monet_vault_key=/var/monetdb5/dbfarm/prod_reporting/.vaultkey --set monet_daemon=yes

Of course, this only provides useful information if you did (or could) compile MonetDB yourself, configured with debugging enabled (--enable-debug) ...

Stefan
Thanks in advance, - Philippe
------------------------------------------------------------------------------
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
--
| Stefan.Manegold @ CWI.nl | DB Architectures (INS1) |
| http://CWI.nl/~manegold/ | Science Park 123 (L321) |
| Tel.: +31 (0)20 592-4212 | 1098 XG Amsterdam (NL)  |

Philippe,

If this happens during the COPY INTO statement, then most likely it is related to enforcing integrity constraints, e.g., foreign keys; see http://www.monetdb.org/Documentation/Manuals/SQLreference/CopyInto

Alternatively, a previous system failure (e.g., a kill by the OS) may have left it in an inconsistent state. If you can simply start the server and perform a simple query, 'select 1;', then you at least know that the database itself is not corrupted.

On 6/2/11 4:23 PM, Stefan Manegold wrote:

Hi Martin,

Thanks for the help.

On Jun 2, 2011, at 7:52 AM, Martin Kersten wrote:
If this happens during the COPY INTO statement, then most likely it is related to enforcing integrity constraints, e.g., foreign keys.
Good to know. Unlikely to be the problem in my case though, as I actually decided *not* to define any constraint on this table at all, not even a primary key. Is there any performance benefit in MonetDB in defining primary keys or foreign keys?
Alternatively, a previous system failure (e.g., a kill by the OS) may have left it in an inconsistent state. If you can simply start the server and perform a simple query 'select 1;' then you at least know that the database itself is not corrupted.
This one will go in my fresh collection of MonetDB troubleshooting tips ;-) Definitely useful. I cannot even start the database at all (SIGSEGV), though. Stefan helped me generate a more meaningful stack trace, in case you are interested: http://sourceforge.net/mailarchive/message.php?msg_id=27593226

Thanks again,
- Philippe

Hi Stefan,

Thanks a lot for the help. I really appreciate it.

On Jun 2, 2011, at 7:23 AM, Stefan Manegold wrote:
2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
                                                                   ^^^^^^^^^^^^^

This suggests that MonetDB "for some reason" (possibly wrongly) expects some (intermediate) column to grow to about 3 TB in size and hence tries to allocate the respective memory, but fails to do so.
We can try to investigate where this happens, but as much information about your usage of MonetDB (DB schema, data, query workload) as possible would be very helpful for us to be able to locate the origin of the problem.
3 TB seems quite crazy; I wonder how I end up triggering this with a 4.3G database (as measured by "du -hs" on disk).
The workload is a series of updates to our reference data in a database. I am simulating some upserts with a combination of DELETE/COPY INTO: I want to refresh data about some "sessions" for a "merchant". Each merchant has a dedicated table, named "<merchant id>_sessions".

To refresh the session metrics I execute:

-- for each day:
--   for each merchant
DELETE FROM "<merchant id>_sessions" WHERE session_start BETWEEN <start of day timestamp> AND <end of day timestamp>;
COPY INTO "<merchant id>_sessions" FROM STDIN USING DELIMITERS '\\t','\\n';
... (up to 8000 session rows for this merchant and day)
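The per-merchant, per-day refresh described above can be sketched as a small statement generator (plain Python; the merchant id, table-naming convention, and timestamps are placeholders taken from examples elsewhere in this thread, not a real schema):

```python
# Hypothetical sketch of the statement pair the refresh job issues for
# one merchant and one day; ids and millisecond timestamps are placeholders.
def refresh_statements(merchant_id, day_start_ms, day_end_ms):
    table = f'"{merchant_id}_sessions"'
    delete = (f"DELETE FROM {table} "
              f"WHERE session_start BETWEEN {day_start_ms} AND {day_end_ms};")
    copy = f"COPY INTO {table} FROM STDIN USING DELIMITERS '\\t','\\n';"
    return delete, copy

d, c = refresh_statements("20789445e300fa1e535f3027d5d63dc9",
                          1280361600000, 1280447999999)
print(d)
print(c)
```

The COPY statement is then followed by the (up to 8000) tab-separated rows on stdin, so each day's batch replaces exactly the rows in that day's window.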
I'm afraid, though, we might need to be able to replay your complete scenario and trigger the same error on our side to be able to locate and fix the problem.
I will try again tomorrow. I will try adding an explicit maximum number of records with COPY 8000 RECORDS INTO ... to see if it makes any difference.
I see from your logs that you are using the latest Apr2011-SP1 release (64-bit on a 64-bit Linux system). Did you experience the problem also with earlier releases of MonetDB?
Yes indeed it first happened with Apr2011. I then upgraded to Apr2011-SP1, started with a fresh database, reran the import/refresh from scratch and was able to reproduce the problem again.
The server crashes with a segmentation fault, and we'd need to know where in the code (and why) this happens. The only(?) way to find out would be to start the server by hand in a debugger, using the same command-line options as monetdbd (merovingian) uses (see your log below), e.g.,
It seems to SIGSEGV when processing the sql logs:
(gdb) run
Starting program: /opt/local/bin/mserver5 --set gdk_dbfarm /mnt/monetdb5/dbfarm --dbname=prod_reporting --set merovingian_uri mapi:monetdb://ip-10-32-111-2:50000/prod_reporting --set mapi_open false --set mapi_port 0 --set mapi_usock /mnt/monetdb5/dbfarm/prod_reporting/.mapi.sock --set monet_vault_key /mnt/monetdb5/dbfarm/prod_reporting/.vaultkey --set monet_daemon yes
[Thread debugging using libthread_db enabled]
ERROR: wrong format gdk_dbfarm
ERROR: wrong format merovingian_uri
ERROR: wrong format mapi_open
ERROR: wrong format mapi_port
ERROR: wrong format mapi_usock
ERROR: wrong format monet_vault_key
ERROR: wrong format monet_daemon
# MonetDB 5 server v11.3.3 "Apr2011-SP1"
# Serving database 'prod_reporting', using 2 threads
# Compiled for x86_64-unknown-linux-gnu/64bit with 64bit OIDs dynamically linked
# Found 7.294 GiB available main-memory.
# Copyright (c) 1993-July 2008 CWI.
# Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
# Visit http://monetdb.cwi.nl/ for further information
[New Thread 0x7fffea474700 (LWP 15213)]
# Listening for connection requests on mapi:monetdb://127.0.0.1:50000/
# MonetDB/GIS module loaded
# MonetDB/SQL module loaded
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1192
1192            BATloop(b, p, q) {

FYI, I tried deleting all the files in sql_logs/sql and restarting the database; it still crashes...

On Jun 2, 2011, at 2:01 PM, Philippe Hanrigou wrote:
Hi Stefan,
Thanks a lot for the help. I really appreciate it.
On Jun 2, 2011, at 7:23 AM, Stefan Manegold wrote:
2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
                                                                   ^^^^^^^^^^^^^

This suggests that MonetDB "for some reason" (possibly wrongly) expects some (intermediate) column to grow to about 3 TB in size and hence tries to allocate the respective memory, but fails to do so.
We can try to investigate where this happens, but as much information about your usage of MonetDB (DB schema, data, query workload) as possible would be very helpful for us to be able to locate the origin of the problem.
3TB seems quite crazy, I wonder how I end up triggering this with a 4.3G database (as measured by "du -hs" on disk).
The workload is a series of updates to our reference data in a database. I am simulating some upserts with a combination of DELETE/COPY INTO: I want to refresh data about some "sessions" for a "merchant". Each merchant has a dedicated table, named "<merchant id>_sessions".
To refresh the session metrics I execute:
-- for each day:
--   for each merchant
DELETE FROM "<merchant id>_sessions" WHERE session_start BETWEEN <start of day timestamp> AND <end of day timestamp>;
COPY INTO "<merchant id>_sessions" FROM STDIN USING DELIMITERS '\\t','\\n';
... (up to 8000 session rows for this merchant and day)

This is my poor man's way of simulating upserts, as my email on the topic did not generate many suggestions ;-)
http://sourceforge.net/mailarchive/forum.php?thread_name=BANLkTi%3DdX-1DFka5NRnZUEj%3DVdi3Sz-Kkg%40mail.gmail.com&forum_name=monetdb-users
I am willing to try other ways to accomplish the same thing, as long as it is performant for bulk upserts.
I'm afraid, though, we might need to be able to replay your complete scenario and trigger the same error on our side to be able to locate and fix the problem.
I will try again tomorrow. I will try adding an explicit maximum number of records with COPY 8000 RECORDS INTO ... to see if it makes any difference.
I see from your logs that you are using the latest Apr2011-SP1 release (64-bit on a 64-bit Linux system). Did you experience the problem also with earlier releases of MonetDB?
Yes indeed it first happened with Apr2011. I then upgraded to Apr2011-SP1, started with a fresh database, reran the import/refresh from scratch and was able to reproduce the problem again.
The server crashes with a segmentation fault, and we'd need to know where in the code (and why) this happens. The only(?) way to find out would be to start the server by hand in a debugger, using the same command-line options as monetdbd (merovingian) uses (see your log below), e.g.,
It seems to SIGSEGV when processing the sql logs:
(gdb) run
Starting program: /opt/local/bin/mserver5 --set gdk_dbfarm /mnt/monetdb5/dbfarm --dbname=prod_reporting --set merovingian_uri mapi:monetdb://ip-10-32-111-2:50000/prod_reporting --set mapi_open false --set mapi_port 0 --set mapi_usock /mnt/monetdb5/dbfarm/prod_reporting/.mapi.sock --set monet_vault_key /mnt/monetdb5/dbfarm/prod_reporting/.vaultkey --set monet_daemon yes
[Thread debugging using libthread_db enabled]
ERROR: wrong format gdk_dbfarm
ERROR: wrong format merovingian_uri
ERROR: wrong format mapi_open
ERROR: wrong format mapi_port
ERROR: wrong format mapi_usock
ERROR: wrong format monet_vault_key
ERROR: wrong format monet_daemon
# MonetDB 5 server v11.3.3 "Apr2011-SP1"
# Serving database 'prod_reporting', using 2 threads
# Compiled for x86_64-unknown-linux-gnu/64bit with 64bit OIDs dynamically linked
# Found 7.294 GiB available main-memory.
# Copyright (c) 1993-July 2008 CWI.
# Copyright (c) August 2008-2011 MonetDB B.V., all rights reserved
# Visit http://monetdb.cwi.nl/ for further information
[New Thread 0x7fffea474700 (LWP 15213)]
# Listening for connection requests on mapi:monetdb://127.0.0.1:50000/
# MonetDB/GIS module loaded
# MonetDB/SQL module loaded
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1192
1192            BATloop(b, p, q) {
(gdb) where
#0  0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1192
#1  0x00007ffff678925e in logger_create (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1254
#2  0x00007fffea595604 in bl_create (logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", cat_version=51100) at bat_logger.c:130
#3  0x00007fffea581bcb in store_init (debug=0, store=store_bat, logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", stk=0) at store.c:1315
#4  0x00007fffea518c4d in mvc_init (dbname=0x1010cf8 "prod_reporting", debug=0, store=store_bat, stk=0) at sql_mvc.c:51
#5  0x00007fffea4eaaa1 in SQLinit () at sql_scenario.mx:272
#6  0x00007fffea4ea717 in SQLprelude () at sql_scenario.mx:199
#7  0x00007ffff6dc7268 in runMALsequence (cntxt=0x606580, mb=0x102bd48, startpc=1, stoppc=0, stk=0x19cce48, env=0x0, pcicaller=0x0) at mal_interpreter.mx:2052
#8  0x00007ffff6db95a4 in runMAL (cntxt=0x606580, mb=0x102bd48, startpc=1, mbcaller=0x0, env=0x0, pcicaller=0x0) at mal_interpreter.mx:341
#9  0x00007ffff6e0dd1a in MALengine (c=0x606580) at mal_session.mx:680
#10 0x00007ffff6e0c450 in malBootstrap () at mal_session.mx:95
#11 0x00007ffff6d9690b in mal_init () at mal.mx:383
#12 0x000000000040331d in main (argc=23, av=0x7fffffffe4d8) at mserver5.c:546
(gdb) thr app all bt

Thread 2 (Thread 0x7fffea474700 (LWP 15213)):
#0  0x00007ffff3c8867e in __lll_lock_wait_private () from /lib/libpthread.so.0
#1  0x00007ffff3c8208e in _L_lock_4442 () from /lib/libpthread.so.0
#2  0x00007ffff3c81c3e in start_thread () from /lib/libpthread.so.0
#3  0x00007ffff39dd92d in clone () from /lib/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7ffff7fe4720 (LWP 15210)):
#0  0x00007ffff6788a13 in logger_new (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1192
#1  0x00007ffff678925e in logger_create (debug=0, fn=0x7fffea5cb7a5 "sql", logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", version=51100, phandler=0, prefuncp=0x7fffea594f8c, postfuncp=0x7fffea59503d) at gdk_logger.mx:1254
#2  0x00007fffea595604 in bl_create (logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", cat_version=51100) at bat_logger.c:130
#3  0x00007fffea581bcb in store_init (debug=0, store=store_bat, logdir=0x7fffea5b1be8 "sql_logs", dbname=0x1010cf8 "prod_reporting", stk=0) at store.c:1315
#4  0x00007fffea518c4d in mvc_init (dbname=0x1010cf8 "prod_reporting", debug=0, store=store_bat, stk=0) at sql_mvc.c:51
#5  0x00007fffea4eaaa1 in SQLinit () at sql_scenario.mx:272
#6  0x00007fffea4ea717 in SQLprelude () at sql_scenario.mx:199
#7  0x00007ffff6dc7268 in runMALsequence (cntxt=0x606580, mb=0x102bd48, startpc=1, stoppc=0, stk=0x19cce48, env=0x0, pcicaller=0x0) at mal_interpreter.mx:2052
#8  0x00007ffff6db95a4 in runMAL (cntxt=0x606580, mb=0x102bd48, startpc=1, mbcaller=0x0, env=0x0, pcicaller=0x0) at mal_interpreter.mx:341
#9  0x00007ffff6e0dd1a in MALengine (c=0x606580) at mal_session.mx:680
#10 0x00007ffff6e0c450 in malBootstrap () at mal_session.mx:95
#11 0x00007ffff6d9690b in mal_init () at mal.mx:383
#12 0x000000000040331d in main (argc=23, av=0x7fffffffe4d8) at mserver5.c:546

Thanks a lot for your help,
- Philippe

Hi,

I spent more time analyzing how the original SIGSEGV occurs. I hope somebody could help me push the analysis further.

The SIGSEGV always happens in a DELETE statement:

delete from "20789445e300fa1e535f3027d5d63dc9_sessions" where session_start between 1280361600000 and 1280447999999;

and is triggered on line 498 of gdk_setop.mx:

HASHloop@4(ri, r->H->hash, s2, h) {

The problem seems to be with `r->H->hash`, whose value is 0x0. I have no idea where r->H->hash should have been set, or how to push the investigation further. Any help would be greatly appreciated. I have included below a capture of my gdb session, which will provide more information.

Thanks in advance,
- Philippe

--------------------- gdb session --------------------

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fd9acbb7700 (LWP 3270)]
0x00007fd9b974c8e8 in BATins_kdiff (bn=0x24daa0c8, l=0x1dd47858, r=0x37498ea0) at gdk_setop.mx:498
498             HASHloop@4(ri, r->H->hash, s2, h) {
(gdb) bt
#0  0x00007fd9b974c8e8 in BATins_kdiff (bn=0x24daa0c8, l=0x1dd47858, r=0x37498ea0) at gdk_setop.mx:498
#1  0x00007fd9b9760865 in BATkdiff (l=0x1dd47858, r=0x37498ea0) at gdk_setop.mx:827
#2  0x00007fd9ba8be632 in CMDkdiff (result=0x7fd9acbb6838, left=0x1dd47858, right=0x37498ea0) at algebra.mx:1586
#3  0x00007fd9ba8cebce in ALGkdiff (result=0x24ad2ec8, lid=0x24ad2e98, rid=0x24ad2c28) at algebra.mx:3018
#4  0x00007fd9ba1aa5da in DFLOWstep (t=0x21bd2c8, fs=0x7fd9acfb7de0) at mal_interpreter.mx:2058
#5  0x00007fd9ba1afee3 in runDFLOWworker (t=0x21bd2c8) at mal_interpreter.mx:1174
#6  0x00007fd9b6e0c971 in start_thread () from /lib/libpthread.so.0
#7  0x00007fd9b6b6892d in clone () from /lib/libc.so.6
#8  0x0000000000000000 in ?? ()
(gdb) info threads
  6 Thread 0x7fd9acfb9700 (LWP 4846)  0x00007fd9b6e12da0 in sem_wait () from /lib/libpthread.so.0
  5 Thread 0x7fd9ad3bb700 (LWP 3266)  0x00007fd9b6b612c3 in select () from /lib/libc.so.6
  4 Thread 0x7fd9ad1ba700 (LWP 3267)  0x00007fd9b6b612c3 in select () from /lib/libc.so.6
  3 Thread 0x7fd9acdb8700 (LWP 3269)  0x00007fff89fff818 in gettimeofday ()
* 2 Thread 0x7fd9acbb7700 (LWP 3270)  0x00007fd9b974c8e8 in BATins_kdiff (bn=0x24daa0c8, l=0x1dd47858, r=0x37498ea0) at gdk_setop.mx:498
  1 Thread 0x7fd9bb3b1720 (LWP 3263)  0x00007fd9b6b612c3 in select () from /lib/libc.so.6
(gdb) thread 6
[Switching to thread 6 (Thread 0x7fd9acfb9700 (LWP 4846))]
#0  0x00007fd9b6e12da0 in sem_wait () from /lib/libpthread.so.0
(gdb) bt
#0  0x00007fd9b6e12da0 in sem_wait () from /lib/libpthread.so.0
#1  0x00007fd9ba1a5b3f in q_dequeue (q=0x1eaca38) at mal_interpreter.mx:960
#2  0x00007fd9ba1b0a6c in DFLOWscheduler (flow=0x25e1c38) at mal_interpreter.mx:1385
#3  0x00007fd9ba1b1c07 in runMALdataflow (cntxt=0x606898, mb=0x275f9a48, startpc=2, stoppc=59, stk=0x24ad2af8, env=0x0, pcicaller=0x409cad8) at mal_interpreter.mx:1583
#4  0x00007fd9bacb47e0 in MALstartDataflow (cntxt=0x606898, mb=0x275f9a48, stk=0x24ad2af8, pci=0x409cad8) at language.mx:268
#5  0x00007fd9ba192333 in runMALsequence (cntxt=0x606898, mb=0x275f9a48, startpc=1, stoppc=0, stk=0x24ad2af8, env=0x0, pcicaller=0x0) at mal_interpreter.mx:2168
#6  0x00007fd9ba1866ec in callMAL (cntxt=0x606898, mb=0x275f9a48, env=0x7fd9acfb8c80, argv=0x7fd9acfb8c40, debug=0 '\000') at mal_interpreter.mx:429
#7  0x00007fd9ad435c9f in SQLexecutePrepared (c=0x606898, be=0x278fefc8, q=0x18c31ce8) at sql_scenario.mx:1490
#8  0x00007fd9ad435f12 in SQLengineIntern (c=0x606898, be=0x278fefc8) at sql_scenario.mx:1543
#9  0x00007fd9ad436441 in SQLengine (c=0x606898) at sql_scenario.mx:1652
#10 0x00007fd9ba1d9114 in runPhase (c=0x606898, phase=4) at mal_scenario.mx:604
#11 0x00007fd9ba1d92eb in runScenarioBody (c=0x606898) at mal_scenario.mx:655
#12 0x00007fd9ba1d94d3 in runScenario (c=0x606898) at mal_scenario.mx:682
#13 0x00007fd9ba1da40d in MSserveClient (dummy=0x606898) at mal_session.mx:486
#14 0x00007fd9b6e0c971 in start_thread () from /lib/libpthread.so.0
#15 0x00007fd9b6b6892d in clone () from /lib/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb) frame 7
#7  0x00007fd9ad435c9f in SQLexecutePrepared (c=0x606898, be=0x278fefc8, q=0x18c31ce8) at sql_scenario.mx:1490
1490            ret= callMAL(c, mb, &glb, argv, (m->emod & mod_debug?'n':0));
(gdb) print *q
$1 = {next = 0x4ec3398, type = 2, sa = 0x1b6bf468, s = 0x2426b268, params = 0x2427a878, paramlen = 2, stk = 615328504, code = 0x1b77a958, id = 58, key = 5856, codestring = 0x20083678 "delete from \"20789445e300fa1e535f3027d5d63dc9_sessions\" where session_start between 1280361600000 and 1280447999999;", name = 0x39bc698 "s58_1", count = 18}
(gdb) thread 2
[Switching to thread 2 (Thread 0x7fd9acbb7700 (LWP 3270))]
#0  0x00007fd9b974c8e8 in BATins_kdiff (bn=0x24daa0c8, l=0x1dd47858, r=0x37498ea0) at gdk_setop.mx:498
498             HASHloop@4(ri, r->H->hash, s2, h) {
(gdb) l
493             BATloop(l, p1, q1) {
494                     h = BUNh@2(li, p1);
495                     t = BUNtail(li, p1);
496                     ins = TRUE;
497                     if (@6) /* check for not-nil (nils don't match anyway) */
498                             HASHloop@4(ri, r->H->hash, s2, h) {
499                                     if (EQUAL@5(t, BUNtail(ri, s2))) {
500                                             HIT@1(h, t);
501                                             ins = FALSE;
502                                             break;
(gdb) p ri
$14 = {b = 0x37498ea0, hvid = 0, tvid = 0}
(gdb) p s2
$15 = 9223372036854775807
(gdb) p h
$16 = (ptr) 0x7fd9acbb3f88
(gdb) p r->H->hash
$17 = (Hash *) 0x0
(gdb) p *r->H
$18 = {id = 0x7fd9b9c48f7f "t", width = 8, type = 7 '\a', shift = 3 '\003', sorted = 0 '\000', varsized = 0, key = 0, dense = 0, nonil = 1, nil = 0, unused = 0, align = 0, nosorted_rev = 0, nokey = {0, 0}, nosorted = 0, nodense = 182, seq = 0, heap = {maxsize = 157280, free = 137000, size = 157280, base = 0x15c6b2b8 "", filename = 0x374990d8 "12/40/124015.tail", storage = 0 '\000', copied = 0, hashash = 0, forcemap = 0, newstorage = 0 '\000', dirty = 0 '\000', parentid = 0}, vheap = 0x0, hash = 0x0, props = 0x0}
(gdb) p *r
$19 = {batCacheid = -43021, H = 0x37498f58, T = 0x37498ec8, P = 0x37498fe8, U = 0x37499000}

Structure of the table:

sql>\d "20789445e300fa1e535f3027d5d63dc9_sessions"
CREATE TABLE "reporting"."20789445e300fa1e535f3027d5d63dc9_sessions" (
        "session_start"            BIGINT,
        "session_id"               CHAR(51),
        "the_day"                  CHAR(10),
        "cart"                     BOOLEAN,
        "purchased"                BOOLEAN,
        "merchant_total_dollars"   INTEGER,
        "co_total_dollars"         INTEGER,
        "baseline_dollars"         INTEGER,
        "billing_baseline_dollars" INTEGER,
        "promo_determination"      VARCHAR(20),
        "session_enabled"          BOOLEAN,
        "co_enabled"               BOOLEAN,
        "co_managed"               BOOLEAN,
        "promo"                    VARCHAR(100),
        "sushi"                    VARCHAR(100),
        "url_referrer"             VARCHAR(1024)
);

On Jun 2, 2011, at 2:01 PM, Philippe Hanrigou wrote:
Hi Stefan,
Thanks a lot for the help. I really appreciate it.
On Jun 2, 2011, at 7:23 AM, Stefan Manegold wrote:
2011-06-02 04:53:27 MSG prod_reporting[25449]: !SQLException:SQLinit:Catalogue initialization failed
2011-06-02 04:53:27 MSG prod_reporting[25449]: !ERROR: HEAPextend: failed to extend to 3316460814336 for 11/40/114026theap
                                                                                       ^^^^^^^^^^^^^
This suggests that MonetDB "for some reason" (possibly wrongly) expects some (intermediate) column to grow to 3 TB in size, and hence tries to allocate the respective memory, but fails to do so.
We can try to investigate where this happens, but as much information about your usage of MonetDB (DB schema, data, query workload) as possible would be very helpful for us to be able to locate the origin of the problem.
3 TB seems quite crazy; I wonder how I end up triggering this with a 4.3 GB database (as measured by "du -hs" on disk).
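For concreteness, the size in the error message can be checked with simple arithmetic on the logged byte count (the 4.3 GB figure is the "du -hs" measurement quoted above; nothing here is derived from MonetDB internals):

```python
# Convert the failed HEAPextend request from the log into tebibytes and
# compare it against the on-disk database size reported by "du -hs".
requested_bytes = 3316460814336      # taken verbatim from the !ERROR line
db_on_disk = 4.3 * 2**30             # ~4.3 GB database on disk

tib = requested_bytes / 2**40
ratio = requested_bytes / db_on_disk
print(f"{tib:.2f} TiB requested")    # about 3 TiB
print(f"{ratio:.0f}x the on-disk database size")
```

So the single failed allocation is roughly 700 times larger than the entire database, which supports the "possibly wrongly" reading above.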
The workload is a series of updates to our reference data in the database. I am simulating upserts with a combination of DELETE/COPY INTO: I want to refresh data about some "sessions" for a "merchant". Each merchant has a dedicated table named "<merchant id>_sessions".
To refresh the session metrics I execute:
-- for each day:
--   for each merchant:

DELETE FROM "<merchant id>_sessions" WHERE session_start BETWEEN <start of day timestamp> AND <end of day timestamp>;
COPY INTO "<merchant id>_sessions" FROM STDIN USING DELIMITERS '\\t','\\n';
... (up to 8000 session rows for this merchant and day)

This is my poor man's way of simulating upserts, as my email on the topic did not generate many suggestions ;-)
http://sourceforge.net/mailarchive/forum.php?thread_name=BANLkTi%3DdX-1DFka5NRnZUEj%3DVdi3Sz-Kkg%40mail.gmail.com&forum_name=monetdb-users
I am willing to try other ways to accomplish the same thing, as long as they perform well for bulk upserts.
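The per-merchant, per-day refresh described above amounts to generating a DELETE/COPY pair per iteration. A minimal sketch of that statement generation (the helper name and its arguments are illustrative, not MonetDB API; only the produced SQL mirrors the statements in this thread):

```python
def upsert_statements(merchant_id: str, day_start_ms: int, day_end_ms: int):
    """Build the DELETE/COPY INTO pair for one merchant and one day."""
    table = f'"{merchant_id}_sessions"'   # one dedicated table per merchant
    delete = (f"DELETE FROM {table} "
              f"WHERE session_start BETWEEN {day_start_ms} AND {day_end_ms};")
    # COPY reads the refreshed rows for that day from STDIN, tab-separated.
    copy = f"COPY INTO {table} FROM STDIN USING DELIMITERS '\\t','\\n';"
    return delete, copy

d, c = upsert_statements("20789445e300fa1e535f3027d5d63dc9",
                         1280361600000, 1280447999999)
print(d)
print(c)
```

Feeding the row batch after the COPY statement is then the client's job (e.g. over mclient's stdin); that part is deliberately left out of the sketch.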
I'm afraid, though, we might need to be able to replay your complete scenario and trigger the same error on our side to be able to locate and fix the problem.
I will try again tomorrow. I will try adding an explicit maximum number of records with COPY 8000 RECORDS INTO ... to see if it makes any difference.
I see from your logs that you are using the latest Apr2011-SP1 release (64-bit on a 64-bit Linux system). Did you experience the problem also with earlier releases of MonetDB?
Yes, indeed: it first happened with Apr2011. I then upgraded to Apr2011-SP1, started with a fresh database, reran the import/refresh from scratch, and was able to reproduce the problem.
The server crashes with a segmentation fault, and we'd need to know where in the code (and why) this happens. The only(?) way to find out would be to start the server by hand in a debugger, using the same command-line options as monetdbd (merovingian) uses (see your log below), e.g.,
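A hedged sketch of what such a debugger invocation might look like. The dbfarm path matches the log excerpts in this thread; every mserver5 option shown is a placeholder that must be replaced with the options monetdbd actually logs for this database:

```shell
# Sketch only: run mserver5 by hand under gdb, mimicking monetdbd's invocation.
DBPATH=/var/monetdb5/dbfarm/prod_reporting
CMD="gdb --args mserver5 --dbpath=$DBPATH --set mapi_open=true"
# Print the command for review against merovingian.log instead of running it.
echo "$CMD"
```

Once the server is running under gdb, replaying the DELETE/COPY workload from a client should reproduce the crash with a usable backtrace.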

Dear Philippe,

On 10-06-2011 22:02:31 -0700, Philippe Hanrigou wrote:
Hi I spent more time analyzing how the original SIGSEGV occurs. I hope somebody
could help me push the analysis further.
The SIGSEGV is always happening in a DELETE statement:
delete from \"20789445e300fa1e535f3027d5d63dc9_sessions\" where session_start between 1280361600000 and 1280447999999;
and is triggered on line 498 of gdk_setop.mx:
HASHloop@4(ri, r->H->hash, s2, h) {
The problem seems to be with `r->H->hash` whose value is 0x0
I have no idea of where r->H->hash should have been set, or how to
push the investigation further. Any help would be greatly appreciated.
I have included below a capture of my gdb session which will provide
Please enter this concrete information in a bugreport at http://bugs.monetdb.org/ It is easier working with issues like those from there. Thanks in advance!

Thanks Fabian, I have created bug 2820 with all the information:
http://bugs.monetdb.org/show_bug.cgi?id=2820

Cheers,
- Philippe
_______________________________________________
MonetDB-users mailing list
MonetDB-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/monetdb-users
participants (4)
- Fabian Groffen
- Martin Kersten
- Philippe Hanrigou
- Stefan Manegold