Hi all,

We have been using MonetDB for 6 months with quite good results (thanks to the dev team for their good work).

Unfortunately, yesterday we came across an issue for which we can't find a solution. Our database crashed for an unknown reason (RAM was fine, disk space was fine), and now each time we re-use the same table with a query we used during the crash, the database crashes with a segfault.

Query: select * from storage();
OS Version: CentOS release 6.7 (Final)
MonetDB Version: MonetDB Database Server Toolkit v1.1 (Jul2015)

tail /var/log/messages:
kernel: mserver5[9945]: segfault at 18 ip 00007f9e86fd534b sp 00007f9e862912e0 error 4 in lib_sql.so[7f9e86eb1000+182000]

merovingian.log:
database 'BI-DEV' (9972) was killed by signal SIGSEGV

Log from gdb:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffa24edc700 (LWP 4683)]
0x00007ffa25c1f34b in delta_bind_bat () from /usr/lib64/monetdb5/lib_sql.so
(gdb) bt
#0 0x00007ffa25c1f34b in delta_bind_bat () from /usr/lib64/monetdb5/lib_sql.so
#1 0x00007ffa25b20fee in sql_storage () from /usr/lib64/monetdb5/lib_sql.so
#2 0x00007ffa321aea45 in runMALsequence () from /usr/lib64/libmonetdb5.so.19
#3 0x00007ffa321afe29 in callMAL () from /usr/lib64/libmonetdb5.so.19
#4 0x00007ffa25b369f7 in SQLexecutePrepared () from /usr/lib64/monetdb5/lib_sql.so
#5 0x00007ffa25b36efc in SQLengineIntern () from /usr/lib64/monetdb5/lib_sql.so
#6 0x00007ffa321c87dd in runScenarioBody () from /usr/lib64/libmonetdb5.so.19
#7 0x00007ffa321c886f in runScenario () from /usr/lib64/libmonetdb5.so.19
#8 0x00007ffa321c9738 in MSserveClient () from /usr/lib64/libmonetdb5.so.19
#9 0x00007ffa321ca816 in MSscheduleClient () from /usr/lib64/libmonetdb5.so.19
#10 0x00007ffa32266e0f in doChallenge () from /usr/lib64/libmonetdb5.so.19
#11 0x00007ffa31d2adcf in thread_starter () from /usr/lib64/libbat.so.12
#12 0x00007ffa2f97da51 in start_thread (arg=0x7ffa24edc700) at pthread_create.c:301
#13 0x00007ffa2f6ca93d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115

Any help to understand and correct what is going on would be nice.

Regards
Mathieu
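A side note on reading the kernel line quoted in the message above: the fields can be decoded mechanically. The sketch below is only an illustration in Python; the function name and the returned keys are invented for this example. It assumes the usual x86 meaning of the page-fault error code (bit 0: page was present, bit 1: write access, bit 2: fault raised in user mode) and computes where the faulting instruction sits inside lib_sql.so by subtracting the mapping base from the instruction pointer.

```python
import re

# Kernel log line copied verbatim from the report above.
LINE = ("kernel: mserver5[9945]: segfault at 18 ip 00007f9e86fd534b "
        "sp 00007f9e862912e0 error 4 in lib_sql.so[7f9e86eb1000+182000]")

PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Split a Linux 'segfault at ...' log line into its numeric fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m.group("err"))
    return {
        "fault_address": int(m.group("addr"), 16),
        "library": m.group("lib"),
        # Offset of the faulting instruction relative to the library mapping.
        "offset_in_library": int(m.group("ip"), 16) - int(m.group("base"), 16),
        # x86 page-fault error code bits (assumed x86, as on this CentOS host).
        "page_present": bool(err & 1),
        "write_access": bool(err & 2),
        "user_mode": bool(err & 4),
    }


info = decode_segfault(LINE)
print(hex(info["fault_address"]), info["library"], hex(info["offset_in_library"]))
```

Read this way, "error 4" describes a user-mode read of an unmapped page at address 0x18, which is consistent with a read through a pointer that is NULL or nearly NULL rather than with memory or disk exhaustion.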
Hai Mathieu,

Thanks for using MonetDB, and sorry for the crash.

If possible, can you please give us the necessary data to reproduce the crash? This includes:
- the schema
- a small set of (anonymised) data
- the queries

You can also compile MonetDB from source with the --enable-debug option, so that GDB can give you the exact line where the crash happened, and the value of the variable/statement/function/etc. that caused the crash.

Regards,
Jennie
On Oct 15, 2015, at 17:50, Mathieu Raillard wrote:
[...]
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
Hi all,
We have managed to recompile MonetDB from the MonetDB-11.21.5.tar.xz source
tarball with the --enable-debug option.
We also have a backup of the database (600MB) that can be provided for
debugging purposes, as the crash is perfectly reproducible.
When running "select * from sys.storage();", here is the information
gathered in gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3c24aad700 (LWP 46812)]
0x00007f3c25834fd9 in delta_bind_bat (bat=0x234b480, access=0, temp=0) at bat_storage.c:166
166 bat_set_access(b, BAT_READ);
(gdb)
Continuing.
[Thread 0x7f3c25ce5700 (LWP 46809) exited]
[Thread 0x7f3c24eaf700 (LWP 46810) exited]
[Thread 0x7f3c24cae700 (LWP 46811) exited]
[Thread 0x7f3c245ab700 (LWP 46817) exited]
[Thread 0x7f3c247ac700 (LWP 46822) exited]
[Thread 0x7f3c17fff700 (LWP 46823) exited]
[Thread 0x7f3c24aad700 (LWP 46812) exited]
Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb)
The program is not being run.
(gdb)
The program is not being run.
(gdb) backtrace
No stack.
(gdb)
Regards
Mathieu
On Sun, Oct 18, 2015 at 6:03 PM, Ying Zhang wrote:
[...]
On Wed, Oct 21, 2015 at 09:53:31AM +0200, Mathieu Raillard wrote:
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 0x7f3c24aad700 (LWP 46812)]
> 0x00007f3c25834fd9 in delta_bind_bat (bat=0x234b480, access=0, temp=0) at bat_storage.c:166
> 166 bat_set_access(b, BAT_READ);

We need the backtrace, and probably from the calling function (i.e. the frame above delta_bind_bat) you could print the column structure (print *c).
Niels
-- Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI) Science Park 123, 1098 XG Amsterdam, The Netherlands room L3.14, phone ++31 20 592-4098 sip:4098@sip.cwi.nl url: https://www.cwi.nl/people/niels e-mail: Niels.Nes@cwi.nl
Hi all,
As gdb isn't providing any stack trace when the segfault occurs, we've launched
mserver5 with valgrind.
Here is the valgrind output when the server crashes:
==1995== Thread 8:
==1995== Syscall param write(buf) points to uninitialised byte(s)
==1995== at 0x71D877D: ??? (in /lib64/libpthread-2.12.so)
==1995== by 0x562BD6A: GDKsave (gdk_storage.c:369)
==1995== by 0x552BF93: HEAPsave_intern (gdk_heap.c:708)
==1995== by 0x552BFD0: HEAPsave (gdk_heap.c:714)
==1995== by 0x57B561E: BATimprints (gdk_imprints.c:847)
==1995== by 0x54FD095: BAT_scanselect (gdk_select.c:910)
==1995== by 0x5504180: BATsubselect (gdk_select.c:1719)
==1995== by 0x4EE0D53: ALGsubselect2 (algebra.c:341)
==1995== by 0x4E76C22: malCommandCall (mal_interpreter.c:119)
==1995== by 0x4E790CA: runMALsequence (mal_interpreter.c:655)
==1995== by 0x4E7C5D2: DFLOWworker (mal_dataflow.c:376)
==1995== by 0x71D1A50: start_thread (in /lib64/libpthread-2.12.so)
==1995== Address 0x14724820 is 64 bytes inside a block of size 1,760 alloc'd
==1995== at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==1995== by 0x559286F: GDKmalloc_prefixsize (gdk_utils.c:641)
==1995== by 0x55928D8: GDKmallocmax (gdk_utils.c:667)
==1995== by 0x552A212: HEAPalloc (gdk_heap.c:105)
==1995== by 0x57B3F72: BATimprints (gdk_imprints.c:770)
==1995== by 0x54FD095: BAT_scanselect (gdk_select.c:910)
==1995== by 0x5504180: BATsubselect (gdk_select.c:1719)
==1995== by 0x4EE0D53: ALGsubselect2 (algebra.c:341)
==1995== by 0x4E76C22: malCommandCall (mal_interpreter.c:119)
==1995== by 0x4E790CA: runMALsequence (mal_interpreter.c:655)
==1995== by 0x4E7C5D2: DFLOWworker (mal_dataflow.c:376)
==1995== by 0x71D1A50: start_thread (in /lib64/libpthread-2.12.so)
==1995==
==1995== Thread 5:
==1995== Invalid read of size 8
==1995== at 0x11713FD9: delta_bind_bat (bat_storage.c:166)
==1995== by 0x11714127: bind_col (bat_storage.c:185)
==1995== by 0x115E98AB: sql_storage (sql.c:4742)
==1995== by 0x4E790A5: runMALsequence (mal_interpreter.c:631)
==1995== by 0x4E78596: callMAL (mal_interpreter.c:447)
==1995== by 0x115F1D3F: SQLexecutePrepared (sql_execute.c:328)
==1995== by 0x115F2143: SQLengineIntern (sql_execute.c:390)
==1995== by 0x115F0DD1: SQLengine (sql_scenario.c:1307)
==1995== by 0x4E9691A: runPhase (mal_scenario.c:515)
==1995== by 0x4E96AE4: runScenarioBody (mal_scenario.c:560)
==1995== by 0x4E96BF3: runScenario (mal_scenario.c:579)
==1995== by 0x4E97B97: MSserveClient (mal_session.c:439)
==1995== Address 0x18 is not stack'd, malloc'd or (recently) free'd
==1995==
==1995==
==1995== Process terminating with default action of signal 11 (SIGSEGV)
==1995== Access not within mapped region at address 0x18
==1995== at 0x11713FD9: delta_bind_bat (bat_storage.c:166)
==1995== by 0x11714127: bind_col (bat_storage.c:185)
==1995== by 0x115E98AB: sql_storage (sql.c:4742)
==1995== by 0x4E790A5: runMALsequence (mal_interpreter.c:631)
==1995== by 0x4E78596: callMAL (mal_interpreter.c:447)
==1995== by 0x115F1D3F: SQLexecutePrepared (sql_execute.c:328)
==1995== by 0x115F2143: SQLengineIntern (sql_execute.c:390)
==1995== by 0x115F0DD1: SQLengine (sql_scenario.c:1307)
==1995== by 0x4E9691A: runPhase (mal_scenario.c:515)
==1995== by 0x4E96AE4: runScenarioBody (mal_scenario.c:560)
==1995== by 0x4E96BF3: runScenario (mal_scenario.c:579)
==1995== by 0x4E97B97: MSserveClient (mal_session.c:439)
==1995== If you believe this happened as a result of a stack
==1995== overflow in your program's main thread (unlikely but
==1995== possible), you can try to increase the size of the
==1995== main thread stack using the --main-stacksize= flag.
==1995== The main thread stack size used in this run was 10485760.
==1995==
==1995== HEAP SUMMARY:
==1995== in use at exit: 41,189,795 bytes in 200,474 blocks
==1995== total heap usage: 279,871 allocs, 79,397 frees, 66,479,732 bytes allocated
==1995==
==1995== LEAK SUMMARY:
==1995== definitely lost: 2,432 bytes in 74 blocks
==1995== indirectly lost: 0 bytes in 0 blocks
==1995== possibly lost: 40,090,357 bytes in 200,294 blocks
==1995== still reachable: 1,097,006 bytes in 106 blocks
==1995== suppressed: 0 bytes in 0 blocks
==1995== Rerun with --leak-check=full to see details of leaked memory
==1995==
==1995== For counts of detected and suppressed errors, rerun with: -v
==1995== Use --track-origins=yes to see where uninitialised values come from
==1995== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 21 from 9)
Regards
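A note on the "Invalid read of size 8" at address 0x18 in the valgrind output above: a fault at such a small address is what one would typically see when a structure member is read through a NULL pointer, because the faulting address is then simply the member's offset. The Python sketch below only illustrates that arithmetic. The structure layout is invented for the example and is NOT MonetDB's real BAT descriptor; which pointer in bat_set_access(b, BAT_READ) was actually NULL cannot be determined from the log alone.

```python
import ctypes


class HypotheticalBat(ctypes.Structure):
    # Invented layout, for illustration only; not MonetDB's actual struct.
    _fields_ = [
        ("id", ctypes.c_uint64),        # offset 0x00
        ("count", ctypes.c_uint64),     # offset 0x08
        ("capacity", ctypes.c_uint64),  # offset 0x10
        ("access", ctypes.c_uint64),    # offset 0x18, 8 bytes wide
    ]


NULL = 0

# Reading ptr->access with ptr == NULL touches address NULL + offsetof(access).
fault_address = NULL + HypotheticalBat.access.offset
read_size = HypotheticalBat.access.size

print(hex(fault_address), read_size)
```

With this made-up layout, the computed address and width match the valgrind report (address 0x18, read of size 8), which is why inspecting the pointers in the frames above delta_bind_bat is a reasonable next step.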
On Wed, Oct 21, 2015 at 10:18 AM, Niels Nes wrote:
[...]
On Oct 22, 2015, at 18:04, Mathieu Raillard wrote:
> As gdb isn't providing any stacktrace when segfault occurs,

Hai Mathieu,

Is this because you (probably accidentally) pressed the key “C” at some moment during the execution? In your GDB output of yesterday morning, I see that GDB immediately continues after having received the SIGSEGV.

Would you please retry running MonetDB in GDB to see if GDB will stop at the SIGSEGV? Then execute the following commands in GDB:

print *bat
up
print *c

Thanks!
Jennie
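One way to avoid an accidental "continue" is to run GDB non-interactively, so that the requested commands are executed right after the signal is caught. The Python sketch below merely assembles such a command line; it relies on GDB's standard -batch, -ex and --args options. The helper name, the extra "bt" command and the database path are assumptions made for this example, not something taken from the thread.

```python
import shlex


def gdb_batch_command(server_args, gdb_commands):
    """Build a gdb argv that runs the server, then issues the commands in order."""
    argv = ["gdb", "-batch", "-ex", "run"]
    for cmd in gdb_commands:
        argv += ["-ex", cmd]
    # Everything after --args is the program under test plus its arguments.
    argv += ["--args"] + list(server_args)
    return argv


# Hypothetical path: replace with the real location of the database.
SERVER = ["mserver5", "--dbpath=/path/to/dbfarm/BI-DEV"]

# Backtrace first, then the three commands requested above.
COMMANDS = ["bt", "print *bat", "up", "print *c"]

argv = gdb_batch_command(SERVER, COMMANDS)
print(shlex.join(argv))
```

The printed line can be pasted into a shell. The crashing query still has to be issued from a separate client session while the server is running under GDB.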
we ve launched mserver5 with valgrind. Here is the valgrind output when the server is crashing:
==1995== Thread 8: ==1995== Syscall param write(buf) points to uninitialised byte(s) ==1995== at 0x71D877D: ??? (in /lib64/libpthread-2.12.so) ==1995== by 0x562BD6A: GDKsave (gdk_storage.c:369) ==1995== by 0x552BF93: HEAPsave_intern (gdk_heap.c:708) ==1995== by 0x552BFD0: HEAPsave (gdk_heap.c:714) ==1995== by 0x57B561E: BATimprints (gdk_imprints.c:847) ==1995== by 0x54FD095: BAT_scanselect (gdk_select.c:910) ==1995== by 0x5504180: BATsubselect (gdk_select.c:1719) ==1995== by 0x4EE0D53: ALGsubselect2 (algebra.c:341) ==1995== by 0x4E76C22: malCommandCall (mal_interpreter.c:119) ==1995== by 0x4E790CA: runMALsequence (mal_interpreter.c:655) ==1995== by 0x4E7C5D2: DFLOWworker (mal_dataflow.c:376) ==1995== by 0x71D1A50: start_thread (in /lib64/libpthread-2.12.so) ==1995== Address 0x14724820 is 64 bytes inside a block of size 1,760 alloc'd ==1995== at 0x4C27A2E: malloc (vg_replace_malloc.c:270) ==1995== by 0x559286F: GDKmalloc_prefixsize (gdk_utils.c:641) ==1995== by 0x55928D8: GDKmallocmax (gdk_utils.c:667) ==1995== by 0x552A212: HEAPalloc (gdk_heap.c:105) ==1995== by 0x57B3F72: BATimprints (gdk_imprints.c:770) ==1995== by 0x54FD095: BAT_scanselect (gdk_select.c:910) ==1995== by 0x5504180: BATsubselect (gdk_select.c:1719) ==1995== by 0x4EE0D53: ALGsubselect2 (algebra.c:341) ==1995== by 0x4E76C22: malCommandCall (mal_interpreter.c:119) ==1995== by 0x4E790CA: runMALsequence (mal_interpreter.c:655) ==1995== by 0x4E7C5D2: DFLOWworker (mal_dataflow.c:376) ==1995== by 0x71D1A50: start_thread (in /lib64/libpthread-2.12.so) ==1995== ==1995== Thread 5: ==1995== Invalid read of size 8 ==1995== at 0x11713FD9: delta_bind_bat (bat_storage.c:166) ==1995== by 0x11714127: bind_col (bat_storage.c:185) ==1995== by 0x115E98AB: sql_storage (sql.c:4742) ==1995== by 0x4E790A5: runMALsequence (mal_interpreter.c:631) ==1995== by 0x4E78596: callMAL (mal_interpreter.c:447) ==1995== by 0x115F1D3F: SQLexecutePrepared (sql_execute.c:328) ==1995== by 0x115F2143: SQLengineIntern (sql_execute.c:390) 
==1995== by 0x115F0DD1: SQLengine (sql_scenario.c:1307) ==1995== by 0x4E9691A: runPhase (mal_scenario.c:515) ==1995== by 0x4E96AE4: runScenarioBody (mal_scenario.c:560) ==1995== by 0x4E96BF3: runScenario (mal_scenario.c:579) ==1995== by 0x4E97B97: MSserveClient (mal_session.c:439) ==1995== Address 0x18 is not stack'd, malloc'd or (recently) free'd ==1995== ==1995== ==1995== Process terminating with default action of signal 11 (SIGSEGV) ==1995== Access not within mapped region at address 0x18 ==1995== at 0x11713FD9: delta_bind_bat (bat_storage.c:166) ==1995== by 0x11714127: bind_col (bat_storage.c:185) ==1995== by 0x115E98AB: sql_storage (sql.c:4742) ==1995== by 0x4E790A5: runMALsequence (mal_interpreter.c:631) ==1995== by 0x4E78596: callMAL (mal_interpreter.c:447) ==1995== by 0x115F1D3F: SQLexecutePrepared (sql_execute.c:328) ==1995== by 0x115F2143: SQLengineIntern (sql_execute.c:390) ==1995== by 0x115F0DD1: SQLengine (sql_scenario.c:1307) ==1995== by 0x4E9691A: runPhase (mal_scenario.c:515) ==1995== by 0x4E96AE4: runScenarioBody (mal_scenario.c:560) ==1995== by 0x4E96BF3: runScenario (mal_scenario.c:579) ==1995== by 0x4E97B97: MSserveClient (mal_session.c:439) ==1995== If you believe this happened as a result of a stack ==1995== overflow in your program's main thread (unlikely but ==1995== possible), you can try to increase the size of the ==1995== main thread stack using the --main-stacksize= flag. ==1995== The main thread stack size used in this run was 10485760. 
==1995== ==1995== HEAP SUMMARY: ==1995== in use at exit: 41,189,795 bytes in 200,474 blocks ==1995== total heap usage: 279,871 allocs, 79,397 frees, 66,479,732 bytes allocated ==1995== ==1995== LEAK SUMMARY: ==1995== definitely lost: 2,432 bytes in 74 blocks ==1995== indirectly lost: 0 bytes in 0 blocks ==1995== possibly lost: 40,090,357 bytes in 200,294 blocks ==1995== still reachable: 1,097,006 bytes in 106 blocks ==1995== suppressed: 0 bytes in 0 blocks ==1995== Rerun with --leak-check=full to see details of leaked memory ==1995== ==1995== For counts of detected and suppressed errors, rerun with: -v ==1995== Use --track-origins=yes to see where uninitialised values come from ==1995== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 21 from 9)
Regards
On Wed, Oct 21, 2015 at 10:18 AM, Niels Nes
wrote: On Wed, Oct 21, 2015 at 09:53:31AM +0200, Mathieu Raillard wrote: Hi all,
We have managed to recompile MonetDB using the source code from MonetDB-11.21.5.tar.xz and with the option " --enable-debug" We also have a backup of the database (600MB) that can be provided for debugging purpose as the crash is perfectly reproducible. When running "select * from sys.storage();", here is the information gathered in gdb:
Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x7f3c24aad700 (LWP 46812)] 0x00007f3c25834fd9 in delta_bind_bat (bat=0x234b480, access=0, temp=0) at bat_storage.c:166 166 bat_set_access(b, BAT_READ); We need the back trace and probably from the calling function (ie above delta_bind_bat) you could print the column structure (print *c).
Niels
(gdb) Continuing. [Thread 0x7f3c25ce5700 (LWP 46809) exited] [Thread 0x7f3c24eaf700 (LWP 46810) exited] [Thread 0x7f3c24cae700 (LWP 46811) exited] [Thread 0x7f3c245ab700 (LWP 46817) exited] [Thread 0x7f3c247ac700 (LWP 46822) exited] [Thread 0x7f3c17fff700 (LWP 46823) exited] [Thread 0x7f3c24aad700 (LWP 46812) exited]
Program terminated with signal SIGSEGV, Segmentation fault. The program no longer exists. (gdb) The program is not being run. (gdb) The program is not being run. (gdb) backtrace No stack. (gdb)
Regards
Mathieu
On Sun, Oct 18, 2015 at 6:03 PM, Ying Zhang
wrote: Hai Mathieu,
Thanks for using MonetDB, and sorry for the crash.
If possible, can you please give us the necessary data to reproduce the crash? They include: - the schema - a small set of (anonymised) data - the queries
You can also compile MonetDB from source with the --enable-debug option, so that GDB can give you the exact line where the crash has happend, and the value of the variable/statement/function/etc that has caused the crash.
Regards,
Jennie
> On Oct 15, 2015, at 17:50, Mathieu Raillard < mraillard@data-mat.fr> wrote: > > Hi all, > > We are using MoentDB since 6 months with quite good results. (thanks to the dev team for their good work) > > Unfortunately yesterday we came across an issue for which we can't find a solution. > Our database crashed for an unknown reason (RAM was fine, Disk space was fine) and now each time we re-use the same table with a query we used during the crash, the database crash with a seg fault > > Query : select * from storage(); > OS Version : CentOS release 6.7 (Final) > MonetDB Version : MonetDB Database Server Toolkit v1.1 (Jul2015) > > tail /var/log/messages : > kernel: mserver5[9945]: segfault at 18 ip 00007f9e86fd534b sp 00007f9e862912e0 error 4 in lib_sql.so[7f9e86eb1000+182000] > > merovingian.log : > database 'BI-DEV' (9972) was killed by signal SIGSEGV > > Log from gbd : > > Program received signal SIGSEGV, Segmentation fault. > [Switching to Thread 0x7ffa24edc700 (LWP 4683)] > 0x00007ffa25c1f34b in delta_bind_bat () from /usr/lib64/monetdb5/ lib_sql.so > (gdb) bt > #0 0x00007ffa25c1f34b in delta_bind_bat () from /usr/lib64/ monetdb5/lib_sql.so > #1 0x00007ffa25b20fee in sql_storage () from /usr/lib64/monetdb5 /lib_sql.so > #2 0x00007ffa321aea45 in runMALsequence () from /usr/lib64/ libmonetdb5.so.19 > #3 0x00007ffa321afe29 in callMAL () from /usr/lib64/ libmonetdb5.so.19 > #4 0x00007ffa25b369f7 in SQLexecutePrepared () from /usr/lib64/ monetdb5/lib_sql.so > #5 0x00007ffa25b36efc in SQLengineIntern () from /usr/lib64/ monetdb5/lib_sql.so > #6 0x00007ffa321c87dd in runScenarioBody () from /usr/lib64/ libmonetdb5.so.19 > #7 0x00007ffa321c886f in runScenario () from /usr/lib64/ libmonetdb5.so.19 > #8 0x00007ffa321c9738 in MSserveClient () from /usr/lib64/ libmonetdb5.so.19 > #9 0x00007ffa321ca816 in MSscheduleClient () from /usr/lib64/ libmonetdb5.so.19 > #10 0x00007ffa32266e0f in doChallenge () from /usr/lib64/ libmonetdb5.so.19 > #11 0x00007ffa31d2adcf in 
thread_starter () from /usr/lib64/ libbat.so.12 > #12 0x00007ffa2f97da51 in start_thread (arg=0x7ffa24edc700) at pthread_create.c:301 > #13 0x00007ffa2f6ca93d in clone () at ../sysdeps/unix/sysv/linux/ x86_64/clone.S:115 > > > Any help to understand and correct what is going on would be nice. > > Regards > > Mathieu > _______________________________________________ > users-list mailing list > users-list@monetdb.org > https://www.monetdb.org/mailman/listinfo/users-list
_______________________________________________ users-list mailing list users-list@monetdb.org https://www.monetdb.org/mailman/listinfo/users-list
--
Niels Nes, Manager ITF, Centrum Wiskunde & Informatica (CWI)
Science Park 123, 1098 XG Amsterdam, The Netherlands
room L3.14, phone ++31 20 592-4098
sip:4098@sip.cwi.nl
url: https://www.cwi.nl/people/niels
e-mail: Niels.Nes@cwi.nl
Hi,
Yes, you were right.
We may have made a mistake when operating gdb. This time we were able to
get the backtrace and to execute the commands you asked for.
Here are the results:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f0357fa0700 (LWP 6892)]
0x00007f0358d27fd9 in delta_bind_bat (bat=0x31b4470, access=0, temp=0) at bat_storage.c:166
166 bat_set_access(b, BAT_READ);
(gdb) bt
#0 0x00007f0358d27fd9 in delta_bind_bat (bat=0x31b4470, access=0, temp=0) at bat_storage.c:166
#1 0x00007f0358d28128 in bind_col (tr=0x7f03480016a0, c=0x7f034805cec0, access=0) at bat_storage.c:185
#2 0x00007f0358bfd8ac in sql_storage (cntxt=0x7f03591df328, mb=0x7f03482abab0, stk=0x7f03483e81a0, pci=0x7f034837aeb0) at sql.c:4742
#3 0x00007f03628970a6 in runMALsequence (cntxt=0x7f03591df328, mb=0x7f03482abab0, startpc=1, stoppc=0, stk=0x7f03483e81a0, env=0x0, pcicaller=0x0) at mal_interpreter.c:631
#4 0x00007f0362896597 in callMAL (cntxt=0x7f03591df328, mb=0x7f03482abab0, env=0x7f0357f9fa18, argv=0x7f0357f9faa0, debug=0 '\000') at mal_interpreter.c:447
#5 0x00007f0358c05d40 in SQLexecutePrepared (c=0x7f03591df328, be=0x7f03480caf10, q=0x7f034839e970) at sql_execute.c:328
#6 0x00007f0358c06144 in SQLengineIntern (c=0x7f03591df328, be=0x7f03480caf10) at sql_execute.c:390
#7 0x00007f0358c04dd2 in SQLengine (c=0x7f03591df328) at sql_scenario.c:1307
#8 0x00007f03628b491b in runPhase (c=0x7f03591df328, phase=4) at mal_scenario.c:515
#9 0x00007f03628b4ae5 in runScenarioBody (c=0x7f03591df328) at mal_scenario.c:560
#10 0x00007f03628b4bf4 in runScenario (c=0x7f03591df328) at mal_scenario.c:579
#11 0x00007f03628b5b98 in MSserveClient (dummy=0x7f03591df328) at mal_session.c:439
#12 0x00007f03628b5782 in MSscheduleClient (command=0x7f03480008d0 "\001", challenge=0x7f0357f9fd70 "Fte2MIUw", fin=0x7f0348006b40, fout=0x7f03480049c0) at mal_session.c:319
#13 0x00007f0362967c9f in doChallenge (data=0x7f03500008d0) at mal_mapi.c:184
#14 0x00007f0362365007 in thread_starter (arg=0x7f0350000a50) at gdk_system.c:458
#15 0x00007f036073aa51 in start_thread () from /lib64/libpthread.so.0
#16 0x00007f036048793d in clone () from /lib64/libc.so.6
(gdb) print *bat
$1 = {name = 0x31b44e0 "wifipass_nas_nasidentifier", bid = 2248, ibase = 97505, ibid = 2284, uibid = 1238, uvbid = 2284, cnt = 97505, ucnt = 0, cached = 0x0, wtime = 1, next = 0x0}
(gdb) up
#1 0x00007f0358d28128 in bind_col (tr=0x7f03480016a0, c=0x7f034805cec0, access=0) at bat_storage.c:185
185 return delta_bind_bat( c->data, access, isTemp(c));
(gdb) print *c
$2 = {base = {wtime = 0, rtime = 0, allocated = 0, flag = 0, id = 7334, name = 0x7f034805cf40 "nasidentifier"}, type = {type = 0x2f23d10, digits = 45, scale = 0}, colnr = 19, null = 0 '\000', def = 0x0, unique = 0 '\000', drop_action = 0, storage_type = 0x0, sorted = 0, dcount = 0, min = 0x0, max = 0x0, t = 0x7f034805be20, data = 0x31b4470}
Regards
Mathieu
On Thu, Oct 22, 2015 at 6:26 PM, Ying Zhang wrote:
On Oct 22, 2015, at 18:04, Mathieu Raillard wrote:
Hi all,
As gdb isn't providing any stacktrace when segfault occurs,
Hai Mathieu,
Is this because you (probably accidentally) pressed the “C” key at some moment during the execution? In your GDB output from yesterday morning, I see that GDB immediately continues after having received the SIGSEGV.
Would you please retry running MonetDB in GDB to see if GDB stops at the SIGSEGV? Then execute the following commands in GDB:
print *bat
up
print *c
Thanks!
Jennie
we've launched mserver5 with valgrind. Here is the valgrind output when the server crashes:
==1995== Thread 8:
==1995== Syscall param write(buf) points to uninitialised byte(s)
==1995==    at 0x71D877D: ??? (in /lib64/libpthread-2.12.so)
==1995==    by 0x562BD6A: GDKsave (gdk_storage.c:369)
==1995==    by 0x552BF93: HEAPsave_intern (gdk_heap.c:708)
==1995==    by 0x552BFD0: HEAPsave (gdk_heap.c:714)
==1995==    by 0x57B561E: BATimprints (gdk_imprints.c:847)
==1995==    by 0x54FD095: BAT_scanselect (gdk_select.c:910)
==1995==    by 0x5504180: BATsubselect (gdk_select.c:1719)
==1995==    by 0x4EE0D53: ALGsubselect2 (algebra.c:341)
==1995==    by 0x4E76C22: malCommandCall (mal_interpreter.c:119)
==1995==    by 0x4E790CA: runMALsequence (mal_interpreter.c:655)
==1995==    by 0x4E7C5D2: DFLOWworker (mal_dataflow.c:376)
==1995==    by 0x71D1A50: start_thread (in /lib64/libpthread-2.12.so)
==1995==  Address 0x14724820 is 64 bytes inside a block of size 1,760 alloc'd
==1995==    at 0x4C27A2E: malloc (vg_replace_malloc.c:270)
==1995==    by 0x559286F: GDKmalloc_prefixsize (gdk_utils.c:641)
==1995==    by 0x55928D8: GDKmallocmax (gdk_utils.c:667)
==1995==    by 0x552A212: HEAPalloc (gdk_heap.c:105)
==1995==    by 0x57B3F72: BATimprints (gdk_imprints.c:770)
==1995==    by 0x54FD095: BAT_scanselect (gdk_select.c:910)
==1995==    by 0x5504180: BATsubselect (gdk_select.c:1719)
==1995==    by 0x4EE0D53: ALGsubselect2 (algebra.c:341)
==1995==    by 0x4E76C22: malCommandCall (mal_interpreter.c:119)
==1995==    by 0x4E790CA: runMALsequence (mal_interpreter.c:655)
==1995==    by 0x4E7C5D2: DFLOWworker (mal_dataflow.c:376)
==1995==    by 0x71D1A50: start_thread (in /lib64/libpthread-2.12.so)
==1995==
==1995== Thread 5:
==1995== Invalid read of size 8
==1995==    at 0x11713FD9: delta_bind_bat (bat_storage.c:166)
==1995==    by 0x11714127: bind_col (bat_storage.c:185)
==1995==    by 0x115E98AB: sql_storage (sql.c:4742)
==1995==    by 0x4E790A5: runMALsequence (mal_interpreter.c:631)
==1995==    by 0x4E78596: callMAL (mal_interpreter.c:447)
==1995==    by 0x115F1D3F: SQLexecutePrepared (sql_execute.c:328)
==1995==    by 0x115F2143: SQLengineIntern (sql_execute.c:390)
==1995==    by 0x115F0DD1: SQLengine (sql_scenario.c:1307)
==1995==    by 0x4E9691A: runPhase (mal_scenario.c:515)
==1995==    by 0x4E96AE4: runScenarioBody (mal_scenario.c:560)
==1995==    by 0x4E96BF3: runScenario (mal_scenario.c:579)
==1995==    by 0x4E97B97: MSserveClient (mal_session.c:439)
==1995==  Address 0x18 is not stack'd, malloc'd or (recently) free'd
==1995==
==1995==
==1995== Process terminating with default action of signal 11 (SIGSEGV)
==1995==  Access not within mapped region at address 0x18
==1995==    at 0x11713FD9: delta_bind_bat (bat_storage.c:166)
==1995==    by 0x11714127: bind_col (bat_storage.c:185)
==1995==    by 0x115E98AB: sql_storage (sql.c:4742)
==1995==    by 0x4E790A5: runMALsequence (mal_interpreter.c:631)
==1995==    by 0x4E78596: callMAL (mal_interpreter.c:447)
==1995==    by 0x115F1D3F: SQLexecutePrepared (sql_execute.c:328)
==1995==    by 0x115F2143: SQLengineIntern (sql_execute.c:390)
==1995==    by 0x115F0DD1: SQLengine (sql_scenario.c:1307)
==1995==    by 0x4E9691A: runPhase (mal_scenario.c:515)
==1995==    by 0x4E96AE4: runScenarioBody (mal_scenario.c:560)
==1995==    by 0x4E96BF3: runScenario (mal_scenario.c:579)
==1995==    by 0x4E97B97: MSserveClient (mal_session.c:439)
==1995==  If you believe this happened as a result of a stack
==1995==  overflow in your program's main thread (unlikely but
==1995==  possible), you can try to increase the size of the
==1995==  main thread stack using the --main-stacksize= flag.
==1995==  The main thread stack size used in this run was 10485760.
==1995==
==1995== HEAP SUMMARY:
==1995==     in use at exit: 41,189,795 bytes in 200,474 blocks
==1995==   total heap usage: 279,871 allocs, 79,397 frees, 66,479,732 bytes allocated
==1995==
==1995== LEAK SUMMARY:
==1995==    definitely lost: 2,432 bytes in 74 blocks
==1995==    indirectly lost: 0 bytes in 0 blocks
==1995==      possibly lost: 40,090,357 bytes in 200,294 blocks
==1995==    still reachable: 1,097,006 bytes in 106 blocks
==1995==         suppressed: 0 bytes in 0 blocks
==1995== Rerun with --leak-check=full to see details of leaked memory
==1995==
==1995== For counts of detected and suppressed errors, rerun with: -v
==1995== Use --track-origins=yes to see where uninitialised values come from
==1995== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 21 from 9)
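A note for readers of the valgrind output: an invalid 8-byte read at a tiny address such as 0x18 is what one would typically expect when code reads a struct member through a NULL pointer, because the faulting address is then just the member's offset. The sketch below only illustrates that arithmetic. The struct `demo_desc` and the function `field_address` are hypothetical and are not MonetDB's real BAT layout; the thread does not show which member bat_set_access actually touches.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof, NULL */
#include <stdint.h>   /* uint64_t, uintptr_t */

/* Hypothetical descriptor, NOT MonetDB's real BAT structure.
 * Three 8-byte members precede `probed`, so it sits at byte
 * offset 24 = 0x18. */
struct demo_desc {
    uint64_t id;
    uint64_t count;
    uint64_t heap;
    uint64_t probed;   /* an 8-byte member, like the "read of size 8" */
};

/* Compile-time check of the layout assumption made above. */
static_assert(offsetof(struct demo_desc, probed) == 0x18,
              "probed is expected at offset 0x18");

/* Address the CPU would access for `d->probed`, computed without
 * dereferencing d. With d == NULL this is the bare member offset. */
uintptr_t field_address(const struct demo_desc *d)
{
    return (uintptr_t)d + offsetof(struct demo_desc, probed);
}
```

Calling `field_address(NULL)` yields 0x18 on common platforms where a null pointer converts to the integer 0, which matches the pattern in the report above without proving that this is what happened in the server.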
Regards
On Wed, Oct 21, 2015 at 10:18 AM, Niels Nes wrote:
On Wed, Oct 21, 2015 at 09:53:31AM +0200, Mathieu Raillard wrote:
Hi all,
We have managed to recompile MonetDB using the source code from MonetDB-11.21.5.tar.xz with the option "--enable-debug". We also have a backup of the database (600MB) that can be provided for debugging purposes, as the crash is perfectly reproducible. When running "select * from sys.storage();", here is the information gathered in gdb:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f3c24aad700 (LWP 46812)]
0x00007f3c25834fd9 in delta_bind_bat (bat=0x234b480, access=0, temp=0) at bat_storage.c:166
166 bat_set_access(b, BAT_READ);

We need the backtrace, and from the calling function (i.e. the frame above delta_bind_bat) you could probably print the column structure (print *c).
Niels
(gdb) Continuing.
[Thread 0x7f3c25ce5700 (LWP 46809) exited]
[Thread 0x7f3c24eaf700 (LWP 46810) exited]
[Thread 0x7f3c24cae700 (LWP 46811) exited]
[Thread 0x7f3c245ab700 (LWP 46817) exited]
[Thread 0x7f3c247ac700 (LWP 46822) exited]
[Thread 0x7f3c17fff700 (LWP 46823) exited]
[Thread 0x7f3c24aad700 (LWP 46812) exited]

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) The program is not being run.
(gdb) The program is not being run.
(gdb) backtrace
No stack.
(gdb)
Regards
Mathieu
On Sun, Oct 18, 2015 at 6:03 PM, Ying Zhang wrote:
Hai Mathieu,
Thanks for using MonetDB, and sorry for the crash.
If possible, can you please give us the necessary data to reproduce the crash? This includes:
- the schema
- a small set of (anonymised) data
- the queries
You can also compile MonetDB from source with the --enable-debug option, so that GDB can give you the exact line where the crash happened, and the value of the variable/statement/function/etc. that caused the crash.
Regards,
Jennie
Hai Mathieu,

Thanks for the GDB output. It shows that the segfault occurs because some important data BATs are missing. However, to find out how that data got lost, we need the queries. Can you please give us the following information so that we can try to reproduce the bug:
- the CREATE TABLE statements
- a small set of (anonymised) data
- the query that triggered the segfault
- probably also the queries that ran before the segfault.

If this involves confidential information, you can send it directly to me.

With kind regards,
Jennie
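To illustrate the diagnosis of a missing BAT in general terms: when a lookup can come back empty, a caller that checks for NULL can report the problem instead of crashing. The following is a minimal sketch under assumptions, not a patch and not MonetDB code. The names `demo_bat`, `demo_pool`, `demo_lookup` and `demo_bind`, and the ids 101 and 102, are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical descriptor and table; they stand in for whatever maps
 * a BAT id to its in-memory descriptor. Not MonetDB code. */
struct demo_bat {
    int bid;
    long count;
};

static struct demo_bat demo_pool[] = { { 101, 10 }, { 102, 20 } };

/* Returns the descriptor for `bid`, or NULL when none exists. */
struct demo_bat *demo_lookup(int bid)
{
    size_t n = sizeof demo_pool / sizeof demo_pool[0];
    for (size_t i = 0; i < n; i++) {
        if (demo_pool[i].bid == bid)
            return &demo_pool[i];
    }
    return NULL;
}

/* Stores the row count of `bid` in *count_out and returns 0, or
 * returns -1 (leaving *count_out untouched) if the descriptor is
 * missing, rather than dereferencing a NULL pointer. */
int demo_bind(int bid, long *count_out)
{
    assert(count_out != NULL);
    struct demo_bat *b = demo_lookup(bid);
    if (b == NULL) {
        fprintf(stderr, "descriptor for bat %d is missing\n", bid);
        return -1;
    }
    *count_out = b->count;
    return 0;
}
```

A guard like this would only turn the crash into an error message; it would not explain how the data went missing, which is why the queries are still needed.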
On Oct 23, 2015, at 10:22, Mathieu Raillard wrote:
participants (3)
- Mathieu Raillard
- Niels Nes
- Ying Zhang