Hi Stefan,

Thanks for the quick reply!  So, I tried 11.13.9 and got the same results.

The interesting thing is that this happens right at startup, which made me suspect that the data file itself had already been corrupted by an earlier issue.
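For illustration only, here is a minimal C sketch of the kind of bounds check that would turn an out-of-bounds chain pointer, like the one hit in strPut, into a detectable error instead of a segfault. It assumes a hypothetical, simplified string heap in which each entry is a chain link followed by a NUL-terminated string. The names str_heap, link_t, heap_str_ok and chain_find are made up here; this is not MonetDB's actual heap layout or API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified string heap (NOT MonetDB's real layout): each
 * entry is a link_t holding the offset of the next entry in the same hash
 * chain, followed by a NUL-terminated string. Offset 0 marks end of chain. */
typedef size_t link_t;

typedef struct {
    const char *base; /* start of the heap */
    size_t used;      /* number of valid bytes in the heap */
} str_heap;

/* Return 1 if a NUL-terminated string starting at off lies entirely
 * within the used part of the heap, 0 otherwise. */
static int heap_str_ok(const str_heap *h, size_t off)
{
    if (off >= h->used)
        return 0;
    return memchr(h->base + off, '\0', h->used - off) != NULL;
}

/* Walk the chain starting at entry offset first, comparing v with each entry.
 * Returns the offset of the matching string, 0 if not found, or (size_t)-1
 * if the chain leaves the heap or loops (i.e. the heap looks corrupted). */
static size_t chain_find(const str_heap *h, size_t first, const char *v)
{
    assert(h != NULL && v != NULL);
    size_t next = first;
    size_t steps = 0;
    while (next != 0) {
        size_t str_off = next + sizeof(link_t);
        /* Validate before dereferencing: the kind of check that would catch
         * an out-of-bounds "(next + 1) + extralen" style address. */
        if (str_off < next || !heap_str_ok(h, str_off) || ++steps > h->used)
            return (size_t)-1;
        if (strcmp(v, h->base + str_off) == 0)
            return str_off;
        link_t link;
        memcpy(&link, h->base + next, sizeof link);
        next = link;
    }
    return 0;
}
```

With a heap holding a single entry "SAD014H1" at offset 8, chain_find(&h, 8, "SAD014H1") returns that string's offset, while a chain that starts past h.used returns (size_t)-1 rather than reading outside the heap.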

Going through my logs, I see the following:

2013-02-22 13:59:34 MSG merovingian[1742]: sending process 20151 (database 'click') the TERM signal
2013-02-22 13:59:34 ERR merovingian[1742]: timeout of 0 seconds expired, sending process 20151 (database 'click') the KILL signal
2013-02-22 13:59:34 MSG control[1742]: (local): stopped database 'click'
2013-02-22 13:59:34 MSG merovingian[1742]: database 'click' (20151) was killed by signal SIGKILL
2013-02-22 13:59:34 MSG merovingian[1742]: database 'click' has crashed after start on 2013-02-22 13:59:23, attempting restart, up min/avg/max: 0s/0s/0s, crash average: 1.00 0.10 0.03 (1-0=1)

This happened in response to a "monetdb stop" command. The exittimeout parameter for the monetdbd daemon is set to 0, which I had interpreted from the man page to mean an unlimited timeout:

"A time-out value of 0 means no mservers will be shut down, and hence they will continue to run after monetdbd has shut down."

I'm now setting the exittimeout to a high value to see if that does the trick.  I'll let you know how it goes after we finish running our test scenario.
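To make the shutdown semantics concrete, here is a minimal POSIX C sketch of a TERM-then-KILL stop with a grace period, the pattern the log lines above describe. It is a hypothetical illustration, not monetdbd's code; stop_with_grace and spawn_stand_in are made-up names. If the grace period is taken literally as 0 seconds, SIGKILL follows SIGTERM immediately, which is consistent with the "timeout of 0 seconds expired" message.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that just waits for signals, standing in
 * for an mserver5 process. If ignore_term is set, the child ignores SIGTERM,
 * like a process that never finishes its shutdown. */
static pid_t spawn_stand_in(int ignore_term)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = ignore_term ? SIG_IGN : SIG_DFL;
    sigemptyset(&sa.sa_mask);
    /* Set the disposition before fork() so the child inherits it without a race. */
    sigaction(SIGTERM, &sa, &old);
    pid_t pid = fork();
    if (pid == 0) {
        for (;;)
            pause();
    }
    sigaction(SIGTERM, &old, NULL); /* restore the parent's own disposition */
    return pid;
}

/* Hypothetical helper: send SIGTERM, poll every 100 ms for up to
 * timeout_secs seconds, then fall back to SIGKILL. Returns the wait status,
 * or -1 if waitpid() fails. */
static int stop_with_grace(pid_t pid, unsigned timeout_secs)
{
    int status = 0;
    assert(pid > 0);
    kill(pid, SIGTERM);
    for (unsigned waited_ms = 0; waited_ms < timeout_secs * 1000u; waited_ms += 100) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return status; /* exited within the grace period */
        if (r < 0)
            return -1;
        struct timespec ts = {0, 100L * 1000 * 1000}; /* 100 ms */
        nanosleep(&ts, NULL);
    }
    /* Grace period expired; with timeout_secs == 0 this happens immediately. */
    kill(pid, SIGKILL);
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

A stand-in child that ignores SIGTERM and is stopped with stop_with_grace(pid, 0) ends up with WTERMSIG(status) == SIGKILL; one that honours SIGTERM and gets a 2-second grace period exits on SIGTERM instead.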

Thanks!
Percy



On Mon, Feb 25, 2013 at 11:58 AM, Stefan Manegold <Stefan.Manegold@cwi.nl> wrote:
Hi Percy,

your segfault does not instantly ring a bell with me, but before you/we dive deeply into this, I am wondering whether you could upgrade from your Oct2012-SP2 to Oct2012-SP3 or even Feb2013, and check whether the crash(es) still occur?

Stefan

----- Original Message -----
> Dear Monet Team,
>
> We're having an issue in which certain data is causing mserver5 to
> crash. After this condition is hit, mserver5 crashes at every startup
> and always dumps an identical core.
>
> We're running Monet 11.13.7 on Ubuntu Linux 64 bit.
>
> The error is happening on line 1142 of gdk_atoms.c:
>
> if (GDK_STRCMP(v, (str) (next + 1) + extralen) == 0) {
>
> Examining the core dump revealed that (next + 1) + extralen is
> referring to an out-of-bounds address. Here's the backtrace:
>
> #0  0x00007faf58414829 in strPut (h=0x1e2d180, dst=0x7fff592cf8f8, v=0x314dac0 "SAD014H1") at gdk_atoms.c:1142
> #1  0x00007faf582dc935 in BATappend (b=0x1e2cf90, n=0x32dfdb0, force=1 '\001') at gdk_batop.c:578
> #2  0x00007faf584c301e in la_bat_updates (lg=0x2d9b030, la=0x2c3ef48) at gdk_logger.c:429
> #3  0x00007faf584c3cf9 in la_apply (lg=0x2d9b030, c=0x2c3ef48) at gdk_logger.c:645
> #4  0x00007faf584c3f26 in tr_commit (lg=0x2d9b030, tr=0x2e247d0) at gdk_logger.c:705
> #5  0x00007faf584c4533 in logger_readlog (lg=0x2d9b030, filename=0x7fff592d1e80 "/opt/clicksecurity/data/_monetdb/click/sql_logs/sql/log.56") at gdk_logger.c:823
> #6  0x00007faf584c482a in logger_readlogs (lg=0x2d9b030, fp=0x2d9b160, filename=0x7fff592d3f90 "/opt/clicksecurity/data/_monetdb/click/sql_logs/sql/log") at gdk_logger.c:896
> #7  0x00007faf584c6f3e in logger_new (debug=0, fn=0x7faf500adfa8 "sql", logdir=0x7faf50090a08 "sql_logs", dbname=0x1fa3da0 "click", version=52001, prefuncp=0x7faf500746a1 <bl_preversion>, postfuncp=0x7faf500747ed <bl_postversion>) at gdk_logger.c:1420
> #8  0x00007faf584c704e in logger_create (debug=0, fn=0x7faf500adfa8 "sql", logdir=0x7faf50090a08 "sql_logs", dbname=0x1fa3da0 "click", version=52001, prefuncp=0x7faf500746a1 <bl_preversion>, postfuncp=0x7faf500747ed <bl_postversion>) at gdk_logger.c:1446
> #9  0x00007faf50075b19 in bl_create (logdir=0x7faf50090a08 "sql_logs", dbname=0x1fa3da0 "click", cat_version=52001) at bat_logger.c:249
> #10 0x00007faf50060ce4 in store_init (debug=0, store=store_bat, logdir=0x7faf50090a08 "sql_logs", dbname=0x1fa3da0 "click", stk=0) at store.c:1287
> #11 0x00007faf4ffe3d3c in mvc_init (dbname=0x1fa3da0 "click", debug=0, store=store_bat, stk=0) at sql_mvc.c:51
> #12 0x00007faf4ff66874 in SQLinit () at sql_scenario.c:230
> #13 0x00007faf4ff6651f in SQLprelude () at sql_scenario.c:159
> #14 0x00007faf58b3085d in malCommandCall (stk=0x2d36e80, pci=0x2ea5520) at mal_interpreter.c:137
> #15 0x00007faf58b331b5 in runMALsequence (cntxt=0x7faf5988c020, mb=0x1e04310, startpc=1, stoppc=0, stk=0x2d36e80, env=0x0, pcicaller=0x0) at mal_interpreter.c:710
> #16 0x00007faf58b323c1 in runMAL (cntxt=0x7faf5988c020, mb=0x1e04310, startpc=1, mbcaller=0x0, env=0x0, pcicaller=0x0) at mal_interpreter.c:454
> #17 0x00007faf58b60a08 in MALengine (c=0x7faf5988c020) at mal_session.c:619
> #18 0x00007faf58b5f21f in malBootstrap () at mal_session.c:64
> #19 0x00007faf58b1313b in mal_init () at mal.c:244
> #20 0x000000000040340e in main (argc=22, av=0x7fff592db568) at mserver5.c:582
>
> I'm digging into it now, but I was hoping that it might ring some bells.
>
> Thanks,
>
> --
>
> Percy Wegmann
> +1 512 637 8500 ext 148
>
> _______________________________________________
> users-list mailing list
> users-list@monetdb.org
> http://mail.monetdb.org/mailman/listinfo/users-list
>

--
| Stefan.Manegold@CWI.nl | DB Architectures   (DA) |
| www.CWI.nl/~manegold/  | Science Park 123 (L321) |
| +31 (0)20 592-4212     | 1098 XG Amsterdam  (NL) |



