I have been attempting to create a viable periodic backup strategy for MonetDB and have had little success to date.

Our database is way too big to use msqldump, so our only option is the one outlined here: https://www.monetdb.org/Documentation/UserGuide/FastDumpRestore.

I follow those steps and use rsync to do an incremental backup of the dbfarm to another device (from fast SSD to RAID5 array). That works fine, until it doesn’t, in which case the backup gets corrupted somehow and MonetDB won’t start from the backup. That is the state I am currently in.

This is the error that I am getting:

!FATAL: logger_load: BBPrename to sql_snapshots_bid failed

Any help on

1. Fixing the existing backup, and/or
2. Helping me understand a better way to do periodic, incremental backups.

Thanks,
Vince

The information transmitted, including any attachments, is intended only for the individual or entity to which it is addressed, and may contain confidential and/or privileged information. Any review, retransmission, dissemination or other use of, or taking of any action in reliance upon, this information by individuals or entities other than the intended recipient is prohibited, and all liability arising therefrom is disclaimed. If you have received this communication in error, please delete the information from any computer and notify the sender.
Hi Vincent On 24/01/2017 17:51, Vincent Sheffer wrote:
> I have been attempting to create a viable period backup strategy for MonetDB and have had little success to date.
>
> Our database is way too big to use msqldump, so our only option is the one outlined here: https://www.monetdb.org/Documentation/UserGuide/FastDumpRestore.
>
> I follow those steps and use rsync to do an incremental backup of the dbfarm to another device (from fast SSD to RAID5 array). That works fine, until it doesn’t, in which case the back up gets corrupted somehow and MonetDB won’t start from the backup. That is the state I am currently in.
The key requirement is that the database server has indeed been stopped before you start an rsync. Furthermore, rsync is an elaborate program with lots of options. If an rsync fails, for whatever system error, the backup data cannot be trusted. The possible exit values to cope with are:

 1  Syntax or usage error
 2  Protocol incompatibility
 3  Errors selecting input/output files, dirs
 4  Requested action not supported: an attempt was made to manipulate 64-bit files on a platform that cannot support them; or an option was specified that is supported by the client and not by the server.
 5  Error starting client-server protocol
 6  Daemon unable to append to log-file
10  Error in socket I/O
11  Error in file I/O
12  Error in rsync protocol data stream
13  Errors with program diagnostics
14  Error in IPC code
20  Received SIGUSR1 or SIGINT
21  Some error returned by waitpid()
22  Error allocating core memory buffers
23  Partial transfer due to error
24  Partial transfer due to vanished source files
25  The --max-delete limit stopped deletions
30  Timeout in data send/receive
35  Timeout waiting for daemon connection

After any of these exit values, the backup cannot be trusted and rsync should be restarted. Recurring errors can indicate a broken disk.

A DBMS cannot detect that the backup (disk) it is recovering from is corrupted.

regards, Martin
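As a minimal sketch of the advice above, a backup script can refuse to treat a copy as valid unless rsync exited with status 0. The function names and the paths in the usage comment are hypothetical and are not part of any script discussed in this thread.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a backup copy can be trusted, based only on
# rsync's exit status. Function names and paths are hypothetical.

# Succeeds only if rsync reported complete success (exit status 0).
backup_is_trustworthy() {
    local status="$1"
    [ "$status" -eq 0 ]
}

# Runs rsync with the given arguments, reports the outcome and
# returns rsync's own exit status to the caller.
run_backup() {
    local status=0
    rsync "$@" || status=$?
    if backup_is_trustworthy "$status"; then
        echo "rsync finished cleanly; backup can be used"
    else
        echo "rsync exited with status $status; do not trust this backup, rerun rsync" >&2
    fi
    return "$status"
}

# Example (hypothetical paths):
# run_backup -a --delete /ssd/dbfarm/ /raid/dbfarm-backup/
```

Returning rsync's status unchanged lets a calling script or cron job decide whether to retry or raise an alert.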
_______________________________________________
users-list mailing list
users-list@monetdb.org
https://www.monetdb.org/mailman/listinfo/users-list
Martin,
Thanks for the details, particularly the exit status list for rsync.
More questions:
1. When you say “restart rsync” that means start a new, non-incremental, rsync, right? Or do you think it should be possible to attempt another incremental rsync?
2. Any idea if we can recover from the error message I sent over? !FATAL: logger_load: BBPrename to sql_snapshots_bid failed
3. Is stopping and locking MonetDB sufficient to ensure a consistent state? Does stopping allow running queries to complete, or at least modifications complete? Or are the sys.sessions query and sys.shutdown also required?
Thanks,
Vince
On 24/01/2017 21:41, Vincent Sheffer wrote:
> Martin,
>
> Thanks for the details, particularly the exist status list for rsync.
You should stop MonetDB using the monetdb command and also lock it. Otherwise, users can again gain access.

monetdb stop <databasename>
monetdb lock <databasename>
rsync
monetdb release <databasename>
monetdb start <databasename>

All transactions are properly finished.
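The sequence above can be sketched as a small script. This is only an illustration: the database name, source and destination paths, the `-a --delete` options and the `DRY_RUN` switch are assumptions chosen for the example, and the error handling is one possible choice rather than a recommended procedure.

```shell
#!/usr/bin/env bash
# Sketch of the stop / lock / rsync / release / start sequence.
# The database name and paths are hypothetical placeholders.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

offline_backup() {
    local db="$1" src="$2" dest="$3" status=0

    run monetdb stop "$db" || return 1
    # If locking fails, bring the database back up and give up.
    run monetdb lock "$db" || { run monetdb start "$db"; return 1; }

    # Remember rsync's exit status: a non-zero value means the copy
    # cannot be trusted, but the database is still brought back up.
    run rsync -a --delete "$src" "$dest" || status=$?

    run monetdb release "$db"
    run monetdb start "$db"
    return "$status"
}

# Example (hypothetical names):
# DRY_RUN=1 offline_backup mydb /ssd/dbfarm/ /raid/dbfarm-backup/
```

Running it once with `DRY_RUN=1` shows the exact commands before anything touches the database.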
> More questions:
> 1. When you say “restart rsync” that means start a new, non-incremental, rsync, right? Or do you think it should it be possible to attempt another incremental rsync?
> 2. Any idea if we can recover from the error message I sent over? !FATAL: logger_load: BBPrename to sql_snapshots_bid failed

If it reads a broken backup, then the system likely encounters an unexpected situation somewhere and stops.

> 3. Is stopping and locking MonetDB sufficient to ensure a consistent state? Does stopping allow running queries to complete, or at least modifications complete? Or are the sys.sessions query and sys.shutdown also required?

Queries will complete or receive a soft termination signal. Never simply stop the server using e.g. a kill command!

regards, Martin
Martin,
The sequence of actions you specify is exactly what the backup script does.
The question you did not answer is really the most important one, so I will restate it:
If there is a single glitch in any given rsync, for whatever reason, can we expect another incremental rsync to repair the damage? I ask, because if the answer is no, then MonetDB does not have anything close to a viable incremental backup strategy. There is no way we can expect no problem to ever occur. And we can’t have our system offline for hours while we do a complete copy of TBs worth of data every time a glitch does occur.
Without a solution to this problem MonetDB is just not viable for us.
Thanks,
Vince
On 25/01/17 02:30, Vincent Sheffer wrote:
> Martin,
>
> The sequence of actions you specify is exactly what the backup script does.
>
> The question you did not answer is really the most important one, so I will restate it:
>
> If there is a single glitch in any given rsync, for whatever reason, can we expect another incremental rsync to repair the damage? I ask, because if the answer is no, then MonetDB does not have anything close to a viable incremental backup strategy. There is no way we can expect no problem to ever occur. And we can’t have our system offline for hours while we do a complete copy of TBs worth of data every time a glitch does occur.
Depending on the glitch in rsync, you should be able to rerun it and rsync will fix the backup. Do look at the rsync options, though. In particular, there are options that tell rsync to not just look at the timestamp and size of a file, but also at the contents. You may need to enable that option in the second run. It will slow down the rsync process considerably, though. If the failure is because of a full disk, you first need to make space (obviously). And if the failure is due to a broken disk, you need to replace the disk.
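One way to act on this advice is sketched below: run the normal incremental pass first and, only if it fails, rerun with rsync's `--checksum` option so that file contents are compared instead of just size and timestamp. The function name and paths are hypothetical, and retrying exactly once is an arbitrary choice for the example. A full or broken disk still has to be dealt with by hand before any retry can succeed.

```shell
#!/usr/bin/env bash
# Sketch: retry a failed rsync once, comparing file contents on the
# second pass. Function name and paths are hypothetical.

sync_with_retry() {
    local src="$1" dest="$2" status=0

    # Normal incremental pass: files are compared by size and mtime.
    rsync -a --delete "$src" "$dest" || status=$?

    if [ "$status" -ne 0 ]; then
        echo "rsync failed with status $status; retrying with --checksum" >&2
        status=0
        # Slower pass: --checksum makes rsync compare file contents.
        rsync -a --delete --checksum "$src" "$dest" || status=$?
    fi
    return "$status"
}

# Example (hypothetical paths):
# sync_with_retry /ssd/dbfarm/ /raid/dbfarm-backup/
```

If the second pass also fails, the function returns its exit status and the backup should still be treated as untrusted.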
-- Sjoerd Mullender
Rerunning rsync did, indeed, resolve the issue. That is great news. I asked my question because I had already re-run it and it failed, but then we did some additional maintenance on the DB and ran it again. The 3rd time worked.
But there are some possibly better options mentioned in this thread that we will be exploring, specifically LVM snapshots. Our configuration is complex, but we may be able to make snapshots work.
At some point we will also explore BTRFS, but we don’t deem it quite ready for our needs.
For what it is worth, our rsync command line options are: -a --delete
Thanks,
Vince
On Tue, Jan 24, 2017 at 1:32 PM, Martin Kersten wrote:
> Never simply stop the server using e.g. a kill command!

Is MonetDB intended to survive a power failure, system crash, OOM condition, or other event which causes all database processes to terminate? (The existence of the "wal" implies the answer is yes)

If so, then you could use a low-level atomic snapshot mechanism (LVM, zfs, btrfs support this) and then use rsync to copy the snapshot. After restoring the backup it will appear as if MonetDB simply crashed and should be able to recover from that.

rsync against a running database is highly unsafe because it copies different files at different times, but an OS-level snapshot will see a consistent version of all files in the dbfarm.

-s
On 25/01/17 03:34, Stefan O'Rear wrote:
> On Tue, Jan 24, 2017 at 1:32 PM, Martin Kersten wrote:
>> Never simply stop the server using e.g. a kill command!
>
> Is MonetDB intended to survive a power failure, system crash, OOM condition, or other event which causes all database processes to terminate? (The existence of the "wal" implies the answer is yes)
The answer to this is indeed yes. However, disks sometimes lie to the operating system about the status of writes, so the OS might think a write to disk was successful when in fact the data is still in some cache internal to the disk. This falls outside the scope of MonetDB. And of course, bugs are always possible (and have been found and fixed in the past). But the intention is clear.
> If so, then you could use a low-level atomic snapshot mechanism (LVM, zfs, btrfs support this) and then use rsync to copy the snapshot. After restoring the backup it will appear as if MonetDB simply crashed and should be able to recover from that.
>
> rsync against a running database is highly unsafe because it copies different files at different times, but an OS-level snapshot will see a consistent version of all files in the dbfarm.
As long as the snapshot is indeed an atomic operation with respect to changes made to the file system by other processes (i.e. MonetDB), this is indeed a viable strategy.
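A rough sketch of the snapshot approach using LVM follows. Everything specific here is an assumption made for illustration: the volume group and logical volume names, the `_bak` snapshot suffix, the 10G snapshot size, the mount point and the destination path. It also assumes the dbfarm occupies the whole logical volume and that the commands run as root. Some filesystems need extra mount options for a snapshot (XFS, for example, typically needs `nouuid`), so treat this as a starting point rather than a tested procedure.

```shell
#!/usr/bin/env bash
# Sketch: copy the dbfarm from an LVM snapshot instead of the live
# volume. All names, sizes and paths are hypothetical placeholders.
# Set DRY_RUN=1 to print the commands instead of running them.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

snapshot_backup() {
    local vg="$1" lv="$2" mnt="$3" dest="$4" status=0
    local snap="${lv}_bak"

    # Point-in-time snapshot of the volume that holds the dbfarm.
    # The size must cover all writes made while the copy is running.
    run lvcreate --snapshot --size 10G --name "$snap" "/dev/$vg/$lv" || return 1

    # Mount the frozen view read-only and copy from it.
    if run mount -o ro "/dev/$vg/$snap" "$mnt"; then
        run rsync -a --delete "$mnt/" "$dest" || status=$?
        run umount "$mnt"
    else
        status=1
    fi

    # Always drop the snapshot so it cannot fill up later.
    run lvremove -f "/dev/$vg/$snap"
    return "$status"
}

# Example (hypothetical names):
# DRY_RUN=1 snapshot_backup vg0 dbfarm /mnt/dbfarm-snap /raid/dbfarm-backup/
```

Because the copy is taken from the snapshot, the database can keep running during the rsync. A restore from such a copy would look to MonetDB like a restart after a crash, as described above.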
-- Sjoerd Mullender
participants (4)
- Martin Kersten
- Sjoerd Mullender
- Stefan O'Rear
- Vincent Sheffer