[MonetDB-users] merovingian locking up - port issue?
On 20-09-2008 16:45:48 -0500, Ross Bates wrote:
I'm having an issue with merovingian locking up. This is the typical scenario.
- Start mero and my db with ./merovingian sobi
- Everything works great for anywhere between 50 & 100 queries, then mero stops responding (I'm using python, jdbc & php)
- I can still connect to the db from the command line using ./mclient -d sobi -lsql -p 50001
- both merovingian and mserver5 processes continue to run, but mero will end up dying within 1-2 minutes by itself, leaving the mserver5 process running
- even when mero dies, it appears that all the ports are still in use. This is the result of netstat -a AFTER mero is dead:
    tcp  0  0  *:50000              *:*                  LISTEN
    tcp  0  0  *:50001              *:*                  LISTEN
    tcp  0  0  mayfair.local:50000  mayfair.local:48973  ESTABLISHED
    tcp  0  0  mayfair.local:48973  mayfair.local:50000  ESTABLISHED
    udp  0  0  *:50000              *:*
- when I try to restart mero to reattach it to the database it fails with this error:
    ./merovingian: binding to stream socket port 50000 failed: Address already in use
- stopping the db doesn't work because mero isn't running.
    ./monetdb stop sobi
    warning: MonetDB Database Server is not running stop: cannot perform: MonetDB Database Server (merovingian) is not running
- the only thing left to do is kill the mserver5 process and restart both mero and the db.
Unfortunately the merovingian.log file doesn't tell me anything.
Any other tips for helping troubleshoot this issue? The version is 5.7.0 and was built from the Aug 23rd nightly.
Many Thanks, Ross
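The "Address already in use" error in the report above can be checked for before restarting merovingian. The following is a minimal sketch, not part of MonetDB: the helper name `port_is_free` is hypothetical, and it only tests whether a plain TCP bind would succeed at that moment.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                # Same condition merovingian reports when it fails to bind.
                return False
            raise
        return True
```

Calling `port_is_free(50000)` before starting merovingian would show whether some process still holds the port. It does not say which process that is.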
Hi Ross,
Thanks for your report. I think merovingian in your situation just crashes, for some yet-to-be-determined reason.
I do not recall how you build/install MonetDB, but if you build from source, it would help me if you could build MonetDB/SQL with --enable-debug and attach gdb to the merovingian process *after* starting it. This is necessary since merovingian forks itself into the background.
Just to give me an indication of how to reproduce: can you send the merovingian settings from your monetdb5.conf file, and does the same crash occur if your "queries" are a simple "select 1;"?
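A reproduction attempt along the lines asked for above could start with a loop that simply opens and closes many connections. This is a rough sketch under stated assumptions: `open_close_connections` is a hypothetical helper, and it speaks raw TCP only, not the MonetDB client protocol, so it sends no "select 1;" and may not exercise the same code path as a real client.

```python
import socket


def open_close_connections(host, port, count, timeout=2.0):
    """Open and immediately close `count` TCP connections.

    Returns a list of (index, error message) pairs for attempts that failed.
    """
    failures = []
    for i in range(count):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                pass  # connection is closed when the with-block exits
        except OSError as exc:
            failures.append((i, str(exc)))
    return failures
```

For example, `open_close_connections("localhost", 50000, 200)` returns an empty list as long as every connection attempt is accepted. The index of the first failure gives a rough idea of when the server stopped responding.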
------------------------------------------------------------------------- This SF.Net email is sponsored by the Moblin Your Move Developer's challenge Build the coolest Linux based applications with Moblin SDK & win great prizes Grand prize is a trip for two to an Open Source event anywhere in the world http://moblin-contest.org/redirect.php?banner_id=100&url=/ _______________________________________________ MonetDB-users mailing list MonetDB-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/monetdb-users
Hi Fabian -
I haven't made any changes to the default settings for merovingian; here
they are without comments:
---------------
mero_msglog=${prefix}/var/log/merovingian.log
mero_errlog=${prefix}/var/log/merovingian.log
mero_timeinterval=600
mero_pidfile=${prefix}/var/run/merovingian.pid
#mero_port=50000
mero_exittimeout=7
mero_doproxy=yes
mero_discoveryttl=600
---------------
As for your question about the type of query, unfortunately I can't
reproduce the lockup in a certain number of steps or with a certain query.
The queries are dynamically generated by a script, but they are all simple
like "select col1, col2 from table where col1='foo'".
After I recompile with --enable-debug, how do I use gdb to monitor the
process? If I run the following commands, can I just leave the gdb attach
process running until it crashes?
gdb
attach pid
Thanks,
Ross
On 21-09-2008 16:40:22 -0500, Ross Bates wrote:
I haven't made any changes to the default settings for merovingian; here they are without comments:
Ok, thanks.
As for your question about the type of query, unfortunately I can't reproduce the lockup in a certain number of steps or with a certain query. The queries are dynamically generated by a script, but they are all simple like "select col1, col2 from table where col1='foo'".
After I recompile with --enable-debug, how do I use gdb to monitor the process? If I run the following commands, can I just leave the gdb attach process running until it crashes?
gdb
attach pid
Yep, like that.
Are you using stable or current? I'll try to reproduce it so I can debug and fix it myself.
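Attaching can also be done straight from the gdb command line. The sketch below builds such a command from the pid file named by mero_pidfile in the settings above. It assumes the pid file contains only the process ID as text, and `gdb_attach_command` is a hypothetical helper name. Note that gdb stops a process when it attaches, so the process has to be resumed with "continue" before it can run until the crash.

```python
from pathlib import Path


def gdb_attach_command(pidfile):
    """Build a gdb command line that attaches to the PID stored in `pidfile`."""
    # Assumption: the pid file holds just the PID, e.g. "12345\n".
    pid = int(Path(pidfile).read_text().strip())
    # -p attaches to a running process; -ex runs a gdb command after attaching.
    # "continue" resumes the stopped process so it keeps running under gdb.
    return ["gdb", "-p", str(pid), "-ex", "continue"]
```

The resulting list can be passed to `subprocess.run(...)` from an interactive terminal. When the process crashes, gdb regains control and `bt` prints the backtrace.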
On 30-09-2008 14:54:59 +0200, Fabian Groffen wrote:
I think I at least identified the problem. Thanks to my Solaris OS, I can see how many threads are being used. And with a quick stress-test, my merovingian now already has 6000 threads in use, which feels like something is leaking :)
We hunted for this on Linux, but couldn't find anything, as we only saw a bit of memory leaking. Now on Solaris I can clearly see the threads are leaking.
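On Linux, the number of threads in a process can also be read from the /proc filesystem, which may help when watching for this kind of leak. A minimal sketch, assuming a Linux system with /proc mounted; `thread_count` is a hypothetical helper name:

```python
import os


def thread_count(pid):
    """Count the threads of process `pid` by listing /proc/<pid>/task (Linux only)."""
    # Each thread of the process appears as one subdirectory of /proc/<pid>/task.
    return len(os.listdir(f"/proc/{pid}/task"))
```

Polling `thread_count(pid)` for the merovingian process while clients connect and disconnect would show whether the count keeps growing instead of returning to its starting value.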
In case it suits your environment, you can work around the bug by setting mero_doproxy=no in your monetdb5.conf file.
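The workaround amounts to changing one key=value line in the configuration file. The following is a small sketch of that edit on the file's text, assuming the simple key=value layout shown in the settings quoted earlier in this thread; `set_conf_option` is a hypothetical helper, not a MonetDB tool.

```python
def set_conf_option(conf_text, key, value):
    """Return `conf_text` with `key=value` set, replacing an existing line or appending one."""
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        # Only uncommented "key=" lines are rewritten; "#key=" lines are left alone.
        if line.strip().startswith(f"{key}="):
            lines[i] = f"{key}={value}"
            replaced = True
    if not replaced:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"
```

Applied to the settings above, `set_conf_option(text, "mero_doproxy", "no")` changes the `mero_doproxy=yes` line and leaves every other line untouched.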
That's good news that you were able to identify the potential root cause.
Is it possible that merovingian is trying to reuse ports from the
connection threads that are leaking?
On 30-09-2008 16:31:02 -0500, Ross Bates wrote:
That's good news that you were able to identify the potential root cause. Is it possible that merovingian is trying to reuse ports from the connection threads that are leaking?
Nope. I did some debugging: we close all threads, yet one seems to leak on every connection. I think the next step is to figure out in the debugger what the leftover thread is doing. Perhaps it's waiting for some condition somehow.
I found the problem and just committed the fix to the Stable branch. Due to some PEBKAC we leaked a thread after each proxied connection.
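For readers unfamiliar with this class of bug, here is a generic toy illustration of how one thread per connection can be left behind when nothing tells a worker to exit. It is not merovingian's code and does not claim to reproduce the actual defect. All names (`ConnectionHandler`, `close_leaky`, `close_fixed`, `threads_after`) are hypothetical.

```python
import threading


class ConnectionHandler:
    """Toy per-connection handler that starts one worker thread per connection."""

    def __init__(self):
        self._done = threading.Event()
        # The worker blocks until it is told that the connection is finished.
        self._worker = threading.Thread(target=self._done.wait, daemon=True)
        self._worker.start()

    def close_leaky(self):
        # Never signals the worker, so its thread stays blocked forever.
        pass

    def close_fixed(self):
        # Signals the worker and waits for the thread to exit.
        self._done.set()
        self._worker.join()


def threads_after(connections, close):
    """Handle `connections` toy connections; return the growth in live threads."""
    before = threading.active_count()
    for _ in range(connections):
        close(ConnectionHandler())
    return threading.active_count() - before
```

With this toy, `threads_after(50, ConnectionHandler.close_leaky)` returns 50, while `threads_after(50, ConnectionHandler.close_fixed)` returns 0. A thread count that climbs by one per connection is the same signature described in the stress test earlier in the thread.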
Participants (2):
- Fabian Groffen
- Ross Bates