New subject: [Monetdb-developers] server locks up on reboot attempt anytime after running mclient

11 Jun 2008

I am running a server instance on Amazon EC2 with MonetDB 5 (latest released version) installed.

If I boot up the instance and do NOT start or use MonetDB, then I can issue a reboot command at any time, and the system reboots just fine and comes back up.

However, if I run the following commands:

mkdir -p /mnt/MonetDB5/dbfarm
merovingian
monetdb create demo
monetdb start demo
mclient -lsql --time --database=demo
sql> \q

Then, when I try to reboot, the system hangs and the instance never comes back up. Here is the console output I get (more notes on the issue follow it):

INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
Stopping ConsoleKit: [  OK  ]
Stopping sshd: [  OK  ]
Stopping crond: [  OK  ]
Stopping system message bus: [  OK  ]
Shutting down kernel logger: [  OK  ]
Shutting down system logger: [  OK  ]
Shutting down interface eth0:  [  OK  ]
Shutting down loopback interface:  [  OK  ]
iptables: Flushing firewall rules: [  OK  ]
iptables: Setting chains to policy ACCEPT: filter [  OK  ]
iptables: Unloading modules: [  OK  ]
Starting killall:  [  OK  ]
Sending all processes the TERM signal... ------------[ cut here ]------------
kernel BUG at include/linux/tracehook.h:369!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /class/net/sit0/address
Modules linked in: sit(U) tunnel4(U) fuse(U) ipv6(U) dm_mirror(U) dm_multipath(U) dm_mod(U) pcspkr(U) ext3(U) jbd(U) mbcache(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) xenblk(U) xennet(U)
CPU:    0
EIP:    0061:[<c10254df>]    Not tainted VLI
EFLAGS: 00210087   (2.6.21.7-2.fc8xen #1)
EIP is at release_task+0x38/0x2f1
eax: c12ff000   ebx: c2b9f450   ecx: f5416000   edx: 00000000
esi: c2b9f450   edi: 00000000   ebp: 00000001   esp: ed207e68
ds: 007b   es: 007b   fs: 00d8  gs: 0000  ss: 0069
Process mserver5 (pid: 1124, ti=ed207000 task=c1404910 task.ti=ed207000)
Stack: 00000020 c1404910 00000000 ed7c15b0 c1026918 00000009 00000008 0007b6c0
       c1404d64 c1404d64 c14049c4 00000000 c14f30e4 c188bac0 00000000 ed207fb8
       c10269fb c14f30e4 00000009 b438a364 c102dcb2 00000000 00000000 00000000
Call Trace:
 [<c1026918>] do_exit+0x6d6/0x730
 [<c10269fb>] sys_exit_group+0x0/0xd
 [<c102dcb2>] get_signal_to_deliver+0x3d2/0x414
 [<c1004be7>] do_notify_resume+0x8c/0x6e6
 [<c12038ae>] do_page_fault+0x7a1/0xc24
 [<c10e48a8>] copy_to_user+0x3c/0x50
 [<c107d766>] sys_select+0x161/0x187
 [<c1005765>] work_notifysig+0x13/0x1a
 =======================
Code: 98 04 00 00 00 74 07 89 f0 e8 e6 98 02 00 8b 86 80 01 00 00 90 ff 48 04 b8 00 f0 2f c1 e8 c5 c7 1d 00 83 be 90 00 00 00 20 74 04 <0f> 0b eb fe 83 be 98 04 00 00 00 74 27 8d 86 a8 04 00 00 e8 ee
EIP: [<c10254df>] release_task+0x38/0x2f1 SS:ESP 0069:ed207e68
Fixing recursive fault but reboot is needed!
---------------------------------------------------

Note that if I run the same commands but leave out mclient:

mkdir -p /mnt/MonetDB5/dbfarm
merovingian
monetdb create demo
monetdb start demo
# OMIT THIS TIME mclient -lsql --time --database=demo

then the problem does not occur and the instance reboots fine. However, once I have run mclient, a reboot is guaranteed to lock up with the console output shown above.
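
To narrow this down, it may help to record which MonetDB processes are still alive just before issuing the reboot, once with and once without the mclient step. The following is only a minimal sketch: it assumes pgrep (from procps) is available on the instance, and is_running is a hypothetical helper name, not part of MonetDB. The process names are the ones that appear in the commands and console output above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MonetDB): succeeds if at least one
# process with exactly the given name is currently running.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# merovingian is started by hand above; mserver5 is the process named
# in the kernel BUG output.
for name in merovingian mserver5; do
    if is_running "$name"; then
        echo "$name: running (pids: $(pgrep -x "$name" | tr '\n' ' '))"
    else
        echo "$name: not running"
    fi
done
```

Comparing the output of the two runs would show whether the mclient session changes which processes are alive when the TERM signal is sent. It does not by itself explain or fix the kernel BUG.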

What is really bad about this issue in particular is that on EC2, if an instance will not reboot, it needs to be terminated. And when it is terminated, all data on the instance is completely destroyed! So I do not get a second chance: once the server is locked up like this, it has to be destroyed and a new one built.

This is running on 32-bit Fedora Core 8.

What is causing this and how can I fix it? Thanks.

Rt Ibmer