Hi Matthew,

(you found a bug)

On 17-04-2008 16:41:18 -0400, McKennirey.Matthew wrote:
During the course of software development using MonetDB for our application database, after running for a week or ten days without incident, the merovingian daemon will suddenly stop accepting connections.
The merovingian log shows
MSG merovingian[19349]: database 'BUSINESS' already running since 2008-04-11 14:42:27, up min/avg/max: 0/0/0, crash average: 0.00 0.00 0.00 (1-0=0)
MSG merovingian[19349]: redirecting client 192.168.1.1:29378 for database 'BUSINESS' to mapi:monetdb://dev04:50001/
ERR merovingian[19349]: client error: could not retrieve uplog information: IOException:sabaoth.getUplogInfo:unable to open file /monetdb/var/MonetDB5/dbfarm/BUSINESS/.uplog: Too many open files
ERR merovingian[19349]: client error: IOException:sabaoth.getStatus:unable to open directory /monetdb/var/MonetDB5/dbfarm: Too many open files
This indicates that Merovingian or Sabaoth leaks file descriptors. After your 10 days it has reached the maximum number of open files your OS allows a single process. Kudos to me that Merovingian doesn't crash but properly reports this. Bad karma for me that I leak the file descriptors somewhere, hence the "Too many open files".
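To illustrate the failure mode (this is not Merovingian's code, just a minimal Python sketch of the same mechanism): a process that keeps opening files without closing them eventually gets EMFILE, "Too many open files", from the OS. The function name `leak_until_limit` and the lowered limit of 64 are made up for the demonstration, and it assumes a Unix-like system.

```python
import errno
import os
import resource


def leak_until_limit(soft_limit=64):
    """Open /dev/null repeatedly without closing until the OS refuses.

    Returns (number of descriptors leaked, errno of the failure).
    """
    old_soft, old_hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may never exceed the hard limit.
    if old_hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, old_hard)
    # Lower the soft limit so the demonstration finishes quickly.
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, old_hard))
    leaked = []
    try:
        while True:
            # The "bug": every call takes a descriptor that is never released.
            leaked.append(os.open(os.devnull, os.O_RDONLY))
    except OSError as exc:
        return len(leaked), exc.errno
    finally:
        # Clean up and restore the original limit.
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, old_hard))


if __name__ == "__main__":
    count, err = leak_until_limit()
    print(count, errno.errorcode[err], os.strerror(err))
```

A real leak is of course much slower than this loop, which is why it takes a week or more before the limit is hit.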
There is lots of space on the disk.
This is unrelated to disk space. It is governed by the settings of `limit`, i.e. the "descriptors" setting.
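If you want to watch how close the daemon is to that limit, something along these lines could help. It is only a sketch, assuming Linux with /proc mounted and enough permissions to read the daemon's /proc entries (same user or root); the helper names `count_open_fds` and `soft_fd_limit` are mine, not part of MonetDB.

```python
import os


def count_open_fds(pid):
    """Count the descriptors a process holds (Linux only: needs /proc)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit(pid):
    """Read the soft 'Max open files' limit of a process from /proc.

    Returns None if the line is missing or the limit is 'unlimited'.
    """
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns: "Max open files", soft limit, hard limit, unit.
                soft = line.split()[3]
                return int(soft) if soft.isdigit() else None
    return None


if __name__ == "__main__":
    # Uses this script's own pid; substitute the pid of the merovingian
    # process (19349 in the log above) to inspect the daemon instead.
    pid = os.getpid()
    print(pid, count_open_fds(pid), soft_fd_limit(pid))
```

Running this once a day against the daemon should show the count creeping towards the limit if the leak is what I think it is.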
/monetdb status shows the database is up and running
This is a new process with its own set of file descriptors, and hence it can still open files.
Although it is not possible to connect to the database via the daemon, we can connect using mclient,
Correct, the database itself is still running without problems.
And we can connect to the database from other clients on the network by connecting directly to the database, bypassing the daemon (which is what we are doing for now)
Restart Merovingian, and you should be able to go for roughly another week. In the meantime I'll try to hunt down this bug.