[Monetdb-developers] server locks up on reboot attempt anytime after running mclient
I am running a server instance under Amazon EC2 with MonetDB 5 (latest released version) installed. If I boot up the instance and do NOT start or use Monetdb, then I can issue a reboot command at any time, and the system reboots just fine and comes back up.

However if I issue the commands:

    mkdir -p /mnt/MonetDB5/dbfarm
    merovingian
    monetdb create demo
    monetdb start demo
    mclient -lsql --time --database=demo
    sql> \q

Then when I go to reboot I get the following output in the console and then things hang, and the instance never reboots. Here is the console output I get (more notes on the issue following this output):

INIT: Switching to runlevel: 6
INIT: Sending processes the TERM signal
Stopping ConsoleKit: [ OK ]
Stopping sshd: [ OK ]
Stopping crond: [ OK ]
Stopping system message bus: [ OK ]
Shutting down kernel logger: [ OK ]
Shutting down system logger: [ OK ]
Shutting down interface eth0: [ OK ]
Shutting down loopback interface: [ OK ]
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
Starting killall: [ OK ]
Sending all processes the TERM signal...
------------[ cut here ]------------
kernel BUG at include/linux/tracehook.h:369!
invalid opcode: 0000 [#1]
SMP
last sysfs file: /class/net/sit0/address
Modules linked in: sit(U) tunnel4(U) fuse(U) ipv6(U) dm_mirror(U) dm_multipath(U) dm_mod(U) pcspkr(U) ext3(U) jbd(U) mbcache(U) uhci_hcd(U) ohci_hcd(U) ehci_hcd(U) xenblk(U) xennet(U)
CPU: 0
EIP: 0061:[<c10254df>] Not tainted VLI
EFLAGS: 00210087 (2.6.21.7-2.fc8xen #1)
EIP is at release_task+0x38/0x2f1
eax: c12ff000 ebx: c2b9f450 ecx: f5416000 edx: 00000000
esi: c2b9f450 edi: 00000000 ebp: 00000001 esp: ed207e68
ds: 007b es: 007b fs: 00d8 gs: 0000 ss: 0069
Process mserver5 (pid: 1124, ti=ed207000 task=c1404910 task.ti=ed207000)
Stack: 00000020 c1404910 00000000 ed7c15b0 c1026918 00000009 00000008 0007b6c0
       c1404d64 c1404d64 c14049c4 00000000 c14f30e4 c188bac0 00000000 ed207fb8
       c10269fb c14f30e4 00000009 b438a364 c102dcb2 00000000 00000000 00000000
Call Trace:
 [<c1026918>] do_exit+0x6d6/0x730
 [<c10269fb>] sys_exit_group+0x0/0xd
 [<c102dcb2>] get_signal_to_deliver+0x3d2/0x414
 [<c1004be7>] do_notify_resume+0x8c/0x6e6
 [<c12038ae>] do_page_fault+0x7a1/0xc24
 [<c10e48a8>] copy_to_user+0x3c/0x50
 [<c107d766>] sys_select+0x161/0x187
 [<c1005765>] work_notifysig+0x13/0x1a
=======================
Code: 98 04 00 00 00 74 07 89 f0 e8 e6 98 02 00 8b 86 80 01 00 00 90 ff 48 04 b8 00 f0 2f c1 e8 c5 c7 1d 00 83 be 90 00 00 00 20 74 04 <0f> 0b eb fe 83 be 98 04 00 00 00 74 27 8d 86 a8 04 00 00 e8 ee
EIP: [<c10254df>] release_task+0x38/0x2f1 SS:ESP 0069:ed207e68
Fixing recursive fault but reboot is needed!
---------------------------------------------------

Note that if I run these same commands

    mkdir -p /mnt/MonetDB5/dbfarm
    merovingian
    monetdb create demo
    monetdb start demo
    # OMIT THIS TIME mclient -lsql --time --database=demo

But if I DO NOT run mclient, then the problem does not occur and it will reboot fine. However once I run mclient then I am guaranteed to lock up on a reboot with the console output as shown above.
What is really bad about this issue in particular is that on EC2, if an instance will not reboot, it needs to be terminated. And when it is terminated, all data on the instance is completely destroyed! So I do not get a second chance: once the server is locked up like this, it has to be destroyed and a new one built.

This is running on 32-bit Fedora Core 8. What is causing this and how can I fix it? Thanks.
On 11-06-2008 14:03:52 -0700, Rt Ibmer wrote:
> Sending all processes the TERM signal...
> ------------[ cut here ]------------
> kernel BUG at include/linux/tracehook.h:369!
This sounds indicative enough to me that it is not our software at fault. However, it would be interesting to know whether the same problem happens when you kill merovingian manually (that is, outside the shutdown sequence).
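One way to run that check by hand is sketched below. It is only an illustration, not a MonetDB tool: the helper names has_kernel_bug and stop_by_name are made up for this example, and it assumes the standard pkill/pgrep utilities are installed on the instance.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of MonetDB.

# Return 0 if the text on stdin contains a "kernel BUG at" line,
# such as the one in the console output quoted above.
has_kernel_bug() {
    grep -q 'kernel BUG at'
}

# Send TERM to every process with exactly this name, then wait up to
# $2 seconds (default 10) for it to exit. Returns 0 once it is gone.
stop_by_name() {
    name=$1
    timeout=${2:-10}
    pkill -TERM -x "$name" || return 0   # nothing running: nothing to do
    while [ "$timeout" -gt 0 ] && pgrep -x "$name" >/dev/null; do
        sleep 1
        timeout=$((timeout - 1))
    done
    ! pgrep -x "$name" >/dev/null
}
```

After sourcing these functions, something like the following could be run as root before any reboot is attempted:

    stop_by_name merovingian 10
    dmesg | has_kernel_bug && echo "oops reproduced outside the shutdown sequence"

If the kernel BUG line shows up here too, the crash is tied to the process receiving the signal and exiting rather than to the reboot scripts themselves.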
> This sounds indicative enough to me that it is not our software at fault. However, it would be interesting to know whether the same problem happens when you kill merovingian manually (that is, outside the shutdown sequence).
It may have indeed been a bug in MonetDB because when I use the stable nightly build the issue is no longer present. HTH.
participants (2)
- Fabian Groffen
- Rt Ibmer