[MonetDB-users] merovingian : could not retrieve uplog information ... Too many open files
Environment
MSG BUSINESS[7403]: # MonetDB server v5.4.1, based on kernel v1.22.1
MSG BUSINESS[7403]: # Serving database 'BUSINESS'
MSG BUSINESS[7403]: # Compiled for x86_64-unknown-linux-gnu/64bit with 64bit OIDs dynamically linked
During the course of software development using MonetDB for our application database, after running for a week or ten days without incident, the merovingian daemon will suddenly stop accepting connections.
The merovingian log shows
MSG merovingian[19349]: database 'BUSINESS' already running since 2008-04-11 14:42:27, up min/avg/max: 0/0/0, crash average: 0.00 0.00 0.00 (1-0=0)
MSG merovingian[19349]: redirecting client 192.168.1.1:29378 for database 'BUSINESS' to mapi:monetdb://dev04:50001/
ERR merovingian[19349]: client error: could not retrieve uplog information: IOException:sabaoth.getUplogInfo:unable to open file /monetdb/var/MonetDB5/dbfarm/BUSINESS/.uplog: Too many open files
ERR merovingian[19349]: client error: IOException:sabaoth.getStatus:unable to open directory /monetdb/var/MonetDB5/dbfarm: Too many open files
# examine the .uplog file ?
dev04:/monetdb/var/MonetDB5/dbfarm/BUSINESS # cat .uplog
1207290718 1207844298 1207844305 1207846979 1207846986 1208461825 1208461935 1208463257 1208463466
There is lots of space on the disk.
/monetdb status shows the database is up and running
netstat shows the daemon and databases listening on their ports
Although it is not possible to connect to the database via the daemon, we can connect using mclient,
# connect locally using mclient
/mclient -dBUSINESS -umonetdb -Pmonetdb -lsql
sql>
And we can run, via mclient, the same sort of queries that were being run when the daemon stopped accepting connections. It is difficult to determine the exact query that was run as the application is generating lots of different queries frequently.
And we can connect to the database from other clients on the network by connecting directly to the database, bypassing the daemon (which is what we are doing for now)
I should add we are connecting via the JDBC driver, version 1.7, using the driver jar downloaded from the MonetDB site.
Hi Matthew, (you found a bug)
On 17-04-2008 16:41:18 -0400, McKennirey.Matthew wrote:
During the course of software development using MonetDB for our application database, after running for a week or ten days without incident, the merovingian daemon will suddenly stop accepting connections.
The merovingian log shows
MSG merovingian[19349]: database 'BUSINESS' already running since 2008-04-11 14:42:27, up min/avg/max: 0/0/0, crash average: 0.00 0.00 0.00 (1-0=0)
MSG merovingian[19349]: redirecting client 192.168.1.1:29378 for database 'BUSINESS' to mapi:monetdb://dev04:50001/
ERR merovingian[19349]: client error: could not retrieve uplog information: IOException:sabaoth.getUplogInfo:unable to open file /monetdb/var/MonetDB5/dbfarm/BUSINESS/.uplog: Too many open files
ERR merovingian[19349]: client error: IOException:sabaoth.getStatus:unable to open directory /monetdb/var/MonetDB5/dbfarm: Too many open files
This indicates that Merovingian or Sabaoth leaks file descriptors. After your 10 days it has reached the maximum number of open files your OS allows per process. Kudos for me that Merovingian doesn't crash but properly reports this. Bad karma for me that I leak the file descriptors somewhere, hence the "Too many open files".
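One way to watch a leak like this build up is to compare the number of descriptors the daemon holds with its soft limit. The following is only a minimal sketch, assuming a Linux host with /proc mounted; the helper names `open_fd_count` and `soft_nofile_limit` are made up for illustration and are not part of MonetDB.

```python
import os
import sys


def open_fd_count(pid):
    """Count the descriptors a process currently holds (Linux /proc only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_nofile_limit(pid):
    """Return the soft 'Max open files' limit of a process, or None if
    it is unlimited or the entry cannot be found."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: "Max open files" <soft> <hard> <units>
                value = line.split()[3]
                return None if value == "unlimited" else int(value)
    return None


if __name__ == "__main__":
    # Pass the daemon's pid as the first argument; defaults to this process.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"pid {pid}: {open_fd_count(pid)} open, soft limit {soft_nofile_limit(pid)}")
```

Run against the merovingian pid (19349 in the log above) at intervals, a steadily growing count that never drops back would be consistent with a leak. Reading another user's /proc/<pid>/fd typically requires the same user or root.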
There is lots of space on the disk.
It is not about disk space but about the settings of `limit`, i.e. the "descriptors" setting.
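To illustrate that this error depends on the descriptor limit and not on disk space, here is a small sketch that temporarily lowers the soft limit of the current process and opens /dev/null until the OS refuses. It assumes a Unix system with Python's `resource` module; `provoke_emfile` and the value 64 are arbitrary choices for the demonstration.

```python
import errno
import os
import resource


def provoke_emfile(temp_limit=64):
    """Lower the soft RLIMIT_NOFILE, leak descriptors until open() fails,
    then close everything and restore the limit.
    Returns (descriptors opened, errno name of the failure)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may not exceed the hard limit.
    new_soft = temp_limit if hard == resource.RLIM_INFINITY else min(temp_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    leaked = []
    try:
        while True:
            # No disk space is consumed here; only descriptors are used up.
            leaked.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return len(leaked), errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    opened, name = provoke_emfile()
    print(f"open() failed with {name} after {opened} descriptors")
```

The failure is reported as EMFILE, whose message text is "Too many open files", the same text that appears in the merovingian log.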
/monetdb status shows the database is up and running
This is a new process, and hence can open files.
Although it is not possible to connect to the database via the daemon, we can connect using mclient,
Correct, the database itself is still running without problems.
And we can connect to the database from other clients on the network by connecting directly to the database, bypassing the daemon (which is what we are doing for now)
Restart Merovingian, and you should be able to go for roughly a week. In the meantime I'll try to hunt down this bug.
On 18-04-2008 09:37:19 +0200, Fabian Groffen wrote:
Hi Matthew, (you found a bug)
I just attempted to fix this in CVS stable. I'm not sure, but I thought you use "current" at the moment, which means you need to wait for propagation to occur before you can try out my fix.
My wannabe fix plugs a hole in code that's called upon each client connection. I'll stress-test merovingian here using a simple bash script and mclient connecting in a loop, but I'd be pleased if you could test it in your development environment, which is less artificial.
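A connect-in-a-loop stress test of this kind could be sketched roughly as follows. This is not the bash/mclient script mentioned above: it only opens and closes bare TCP connections, which is an assumption about what is enough to reach the per-connection code path. If it is not, each iteration would need to run a real client such as mclient instead. The function name `connect_loop` is hypothetical.

```python
import socket


def connect_loop(host, port, attempts, timeout=2.0):
    """Open and immediately close `attempts` TCP connections.
    Returns how many of them succeeded."""
    ok = 0
    for _ in range(attempts):
        try:
            with socket.create_connection((host, port), timeout=timeout):
                ok += 1
        except OSError:
            # Refused, timed out, or the daemon has stopped accepting.
            pass
    return ok
```

For example, `connect_loop("dev04", 50000, 1000)` would make 1000 attempts against host dev04. The port 50000 is an assumption here; use whatever port merovingian listens on in your setup. Checking the daemon's open descriptor count (for instance via /proc/<pid>/fd on Linux) before and after the run should show whether descriptors are still being leaked.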
Thanks very much. We have installed the 2008-04-21 --nightly=current, and are now running,
MonetDB server v5.5.0, based on kernel v1.23.0
Serving database 'BUSINESS'
Compiled for x86_64-unknown-linux-gnu/64bit with 64bit OIDs dynamically linked
We have also updated to the JDBC 1.7 (Canephora_p2 20080418 based on MCL v1.3) driver, which seems to fix unrelated connection issues.
All is well for now, thanks again.
participants (2)
- Fabian Groffen
- McKennirey.Matthew