Fabian and Stefan,

Thank you both for the time you took to reply.

We are trying to create a deployment with as much redundancy (failover) as we can. We assume hardware will fail (drives, hardware network interfaces, memory, etc.) and that we may lose a machine, a switch, etc.

The issue is not the ability of the merovingian daemon to restart mserver5 processes on the same machine; the issue is what happens if we lose the machine or cannot connect to the merovingian daemon. (As an aside, my understanding is that merovingian starts and monitors mserver5 processes on the same machine. I do not see a way to configure merovingian to start mserver5 processes on other machines.)

Our plan is to use multiple instances of MonetDB (running on multiple machines), each serving an architecturally distinct portion of the system's data, such that the failure of one instance would not prevent other parts of the system from functioning. However, we would dearly like to have a failover capability on each instance of MonetDB.

Again, only one instance of MonetDB would be interacting with a specific dbfarm and dbname at a time, but if it (or the machine it is on) failed to respond, we would redirect the work to a 'backup' instance on another machine. The merovingian daemon of the 'backup' would be started, but there would be no activity until needed.

So I guess the question is: when instance 1 of merovingian or an mserver5 process locks a dbfarm and dbname, when does it release the lock? And if it fails (software or hardware failure), I presume the locks still exist, preventing instance 2 from using that dbfarm and dbname? In which case we are out of luck.

On Wednesday 03 September 2008 04:31:06 Stefan Manegold wrote:
On Wed, Sep 03, 2008 at 09:38:46AM +0200, Fabian Groffen wrote:
Hi Matthew,
On 02-09-2008 21:56:40 -0400, McKennirey.Matthew wrote:
While we look forward to the availability of a new real-time replication strategy for MonetDB, we were wondering if it would be plausible to configure two instances of MonetDB, on different machines, to point to the same dbfarm.
I assume you mean not only using the same dbfarm, but also using the same databases. MonetDB locks the database it is using, so unless NFS locking or something is malfunctioning you should see this doesn't work.
At different points in time (i.e., not concurrently) two different instances of MonetDB can (technically) very well share the same dbfarm --- provided the two instances of MonetDB are binary compatible. In fact, multiple instances of MonetDB can even concurrently share the same dbfarm, provided they all use a different database (dbname). MonetDB locks the database such that only a single instance can use a particular database at a time.
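To make the "only a single instance can use a particular database at a time" behaviour concrete, here is a minimal Python sketch of an exclusive advisory file lock. It only illustrates the general idea and makes no claim about how MonetDB implements its database lock; the names `try_lock` and `release` and the lock file path are hypothetical. A lock of this kind is tied to the open file, so it goes away when the holder closes the file or its process ends. Behaviour on network or replicated file systems depends on the setup.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Try to take an exclusive, non-blocking advisory lock on `path`.

    Returns the open file object if the lock was acquired, else None.
    (Hypothetical helper for illustration; not MonetDB code.)
    """
    f = open(path, "a+")
    try:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        # Another holder already has the lock.
        f.close()
        return None
    return f


def release(f):
    """Release the lock by closing the file that holds it."""
    f.close()


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.mkdtemp(), "demo.lock")
    first = try_lock(lock_path)    # first user gets the lock
    second = try_lock(lock_path)   # second user is refused while it is held
    print(first is not None, second is None)
    release(first)
```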
Only one instance would be used at a time; one instance would be the primary instance and the second instance would only be used if the first failed to respond (at which time we would stop sending requests to the primary instance and raise an alert for the system administrator).
Sounds like a "failover".
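The client-side part of such a failover (use the primary, switch to the backup when it does not respond, and alert the administrator) could be sketched roughly as follows. This is an assumption-laden illustration, not a MonetDB feature: `first_reachable`, `choose_instance`, and the `alert` callback are hypothetical names, and a successful TCP connect only shows that something is listening on the port, not that the database is usable or that its lock is free.

```python
import socket


def first_reachable(endpoints, timeout=2.0):
    """Return the first (host, port) that accepts a TCP connection, else None."""
    for host, port in endpoints:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Refused, timed out, or unreachable: try the next endpoint.
            continue
    return None


def choose_instance(primary, backup, alert, timeout=2.0):
    """Prefer the primary; if it does not respond, alert and try the backup."""
    chosen = first_reachable([primary], timeout)
    if chosen is not None:
        return chosen
    alert(f"primary {primary[0]}:{primary[1]} not responding")
    return first_reachable([backup], timeout)
```

A caller would pass its own endpoints and alerting function, for example `choose_instance(("db1", 50000), ("db2", 50000), print)`, where the host names and port are placeholders.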
We can provide for the replication of the database data at the file system layer (ZFS) but are still susceptible to a failure of MonetDB or the machine it is running on.
The danger of the ZFS solution here is that you get a copy, that doesn't include locks. We once had some thoughts of supporting read-only databases (think of a LiveCD), but for your use that sounds not quite like what you need either.
I'm wondering why you actually want to "failover" from another machine. Does it mean that merovingian isn't able to cover up mserver crashes? Does mserver take the entire machine down? Or are there other conditions (network?) that lead to this multi machine failover strategy?
I do share Fabian's concerns.
Stefan