The quick summary of this issue is that the backup_label file is an integral part of your database cluster binary backup, and removing it to allow the recovery to proceed without error is very likely to corrupt your database.  Don't do that.

Note that this post does not attempt to provide complete instructions
for how to restore from a binary backup -- the documentation has all
that, and it is of no benefit to duplicate it here; this is to warn
people about a common error in the process that can corrupt databases
when people try to take short-cuts rather than following the steps
described in the documentation.

How to Lose Data

The Proximate Cause

If you are not careful to follow the documentation's instructions for
archiving, binary backup, and PITR restore, the attempt to start the
restored database may fail, and you may see this in the log:

FATAL:  could not locate required checkpoint record
HINT:  If you are not restoring from a backup, try removing the file "$PGDATA/backup_label".

... where $PGDATA is the path to the data directory.  It is critically
important to note that the hint says to try removing the file "If you
are not restoring from a backup".  If you are
restoring from a backup, removing the file will prevent recovery from
knowing what set of WAL records needs to be applied to the copy to put it
into a coherent state; it will assume that it is just recovering from a
crash "in place" and will be happy to apply WAL forward from the
completion of the last checkpoint.  If that last checkpoint
happened after you started the backup process, you will not replay all
the WAL needed to achieve a coherent state, and you are very likely to
have corruption in the restored database.  This corruption could result
in anything from the database failing to start to errors about bad pages
to silently returning incorrect results from queries when a particular
index is used.  These problems may appear immediately or lie dormant for
months before causing visible problems.
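
To make the mechanism concrete, here is a deliberately simplified toy model in Python.  It is not how PostgreSQL represents pages or WAL; the page names, values, and LSN numbers are all invented for illustration, and each "WAL record" simply overwrites a whole page.  It only shows why starting replay at a checkpoint taken after the backup began can leave a stale page in the restored copy.

```python
# Toy model only: pages are dict entries, a WAL record is (lsn, page, value).
# All names and numbers here are hypothetical, chosen for illustration.

def replay(pages, wal, start_lsn):
    """Apply every WAL record with lsn >= start_lsn to a copy of pages."""
    result = dict(pages)
    for lsn, page, value in wal:
        if lsn >= start_lsn:
            result[page] = value
    return result

# WAL written while the file copy was running.
wal = [(1, "A", "a1"), (2, "B", "b1"), (3, "C", "c1"), (4, "B", "b2")]

BACKUP_START_LSN = 1      # the starting point that backup_label would record
LATER_CHECKPOINT_LSN = 3  # a checkpoint that completed during the copy

# The copy grabbed each page at a different moment:
# A before record 1 reached disk, B after record 2, C before record 3.
copied = {"A": "a0", "B": "b1", "C": "c0"}

expected = {"A": "a1", "B": "b2", "C": "c1"}

with_label = replay(copied, wal, BACKUP_START_LSN)
without_label = replay(copied, wal, LATER_CHECKPOINT_LSN)

print("with backup_label:   ", with_label)
print("without backup_label:", without_label)
```

In this sketch the replay that starts from the later checkpoint never sees the record for page A, so the restored copy keeps the stale value even though the server starts without complaint.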

Note that you might sometimes get lucky and not experience corruption.  That
doesn't mean that deleting the file when restoring from a backup is any
safer than stepping out onto a highway without checking for
oncoming traffic -- failure to get clobbered one time provides no
guarantee that you will not get clobbered if you try it again.

Secondary Conditions

Now, if you had followed all the other instructions from the documentation
for how to restore, making the above mistake would not corrupt your
database.  It can only do so as the last step in a chain of mistakes. 
Note that for restoring a backup you are supposed to make sure that the postmaster.pid file and the files in the pg_xlog
subdirectory have been deleted.  Failure to do so can cause corruption
if the database manages to recover in spite of the transgressions.  But
if you have deleted (or excluded from backup) the files in the pg_xlog directory, deleting the backup_label file is likely to result in another failure to start, with this in the log:

LOG:  invalid primary checkpoint record
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record

What the hint from the first error above doesn't say is that if you are restoring from a backup, you should check that you don't have any files in pg_xlog from the time of the backup, you should check that you do not have a postmaster.pid file, and you should make sure you have a recovery.conf file with appropriate contents (including a restore_command entry that will copy from your archive location).
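
The checks listed above can be sketched as a small pre-flight script to run before starting a restored cluster.  This is only a rough illustration: the function name preflight_problems is made up here, the file names are the ones mentioned in this post, and the script is no substitute for following the documented restore steps.  In particular, it flags any file in pg_xlog for review, which is stricter than the rule above (only files from the time of the backup are the problem).

```python
import os

def preflight_problems(pgdata):
    """Return a list of problems found in a restored data directory.

    Hypothetical helper for illustration; it checks only the items
    named in the text above and nothing else.
    """
    problems = []

    if not os.path.isfile(os.path.join(pgdata, "backup_label")):
        problems.append("backup_label is missing: do not start, the backup is incomplete")

    if os.path.exists(os.path.join(pgdata, "postmaster.pid")):
        problems.append("postmaster.pid is present: it should have been removed")

    xlog_dir = os.path.join(pgdata, "pg_xlog")
    if os.path.isdir(xlog_dir):
        leftovers = [name for name in os.listdir(xlog_dir)
                     if os.path.isfile(os.path.join(xlog_dir, name))]
        if leftovers:
            problems.append("pg_xlog contains %d file(s): review them" % len(leftovers))

    conf_path = os.path.join(pgdata, "recovery.conf")
    if not os.path.isfile(conf_path):
        problems.append("recovery.conf is missing")
    else:
        with open(conf_path) as fh:
            # A commented-out line starts with '#', so it does not count.
            has_restore = any(line.strip().startswith("restore_command") for line in fh)
        if not has_restore:
            problems.append("recovery.conf has no restore_command entry")

    return problems

if __name__ == "__main__":
    import sys
    for problem in preflight_problems(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("WARNING:", problem)
```

An empty list means only that these particular checks passed; it does not prove the backup itself was taken correctly.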

Why Does This Happen?

The Recovery Process

Restoring from a binary backup makes use of the same recovery process
that prevents data loss on a crash of the server.  As pages for
relations (tables, indexes, etc.) and other internal structures are
modified, these changes are made in RAM buffers which are not written
out to the OS until they have been journalled to the Write Ahead Log
(WAL) files and flushed to persistent storage (e.g., disk). 
Periodically there is a checkpoint, which writes all of the modified
pages out to the OS and tells the OS to flush them to permanent
storage.  So, if there is a crash, the recovery process can look to the
last checkpoint and apply all WAL from that point forward to reach a
consistent state.  WAL replay will create, extend, truncate, or remove
tables as needed, modify data within files, and will tolerate the case
that these changes were already flushed to the main files or have not
yet made it to persistent storage.  To handle possible race conditions
around the checkpoint, the system tracks the last two checkpoints, and
if it can't use one of them it will go to the other.

When you run pg_start_backup()
it waits for a distributed (or "paced") checkpoint in process to
complete, or (if requested to do so with the "fast" parameter) forces an
immediate checkpoint at maximum speed.  You can then copy the files in
any order while they are being modified as long as the copy is completed
before pg_stop_backup()
is called.  Even though there is no consistency among the files (or
even within a single file), WAL replay (if it starts from the point of
the checkpoint related to the call to pg_start_backup()) will bring things to a coherent state just as it would in crash recovery.

The backup_label File

How does the recovery process know where in the WAL stream it has to
start replay for it to be possible to reach a consistent state?  For
crash recovery it's simple: it goes to the last checkpoint that is in
the WAL based on data saved in the global/pg_control file.  For
restoring a backup, the starting point in the WAL stream must
be recorded somewhere for the recovery process to find and use.  That
is the purpose of the backup_label
file.  The presence of the file indicates to the recovery process that
it is restoring from a backup, and tells it what WAL is needed to reach a
consistent state.  It also contains information that may be of interest
to a DBA, and is in a human-readable format; but that doesn't change
the fact that it is an integral part of a backup, and the backup is not
complete or guaranteed to be usable if it is removed.
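
Because the file is human-readable, it is easy to look at what it records.  The following Python sketch parses a sample backup_label; the sample contents are invented for illustration (the field names follow what releases of this era write, but check a real file from your own backup), and parse_backup_label is a hypothetical helper, not part of PostgreSQL.

```python
import re

# Invented sample contents, for illustration only.
SAMPLE_BACKUP_LABEL = """\
START WAL LOCATION: 0/2000028 (file 000000010000000000000002)
CHECKPOINT LOCATION: 0/2000060
BACKUP METHOD: pg_start_backup
BACKUP FROM: master
START TIME: 2015-07-01 12:00:00 CDT
LABEL: nightly
"""

def parse_backup_label(text):
    """Split 'KEY: value' lines into a dict and pull out the WAL start point."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:
            fields[key] = value
    match = re.match(r"([0-9A-F]+/[0-9A-F]+) \(file ([0-9A-F]{24})\)",
                     fields.get("START WAL LOCATION", ""))
    if match:
        fields["start_lsn"], fields["start_wal_file"] = match.groups()
    return fields

info = parse_backup_label(SAMPLE_BACKUP_LABEL)
print("replay must begin at", info["start_lsn"], "in WAL file", info["start_wal_file"])
```

The point of the sketch is only that the starting location for replay lives in this file and nowhere else in the backup; delete the file and that information is gone.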

Recovery

If you delete the file and cannot prove that there were no checkpoints after pg_start_backup()
was run and before the backup copy was completed, you should assume
that the database has hidden corruption.  If you can restore from a
backup correctly, that is likely to be the best course; if not, you
should probably use pg_dump and/or pg_dumpall to get a logical dump, and restore it to a fresh cluster (i.e., use initdb to get a cluster free from corruption to restore into).
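
As a rough outline of that fallback, the Python sketch below only builds and prints the command lines; it does not run anything.  The ports, paths, and the helper name logical_rescue_commands are assumptions made for illustration, and it assumes the suspect cluster is still able to start and accept connections.  Adapt every value to your own environment and consult the documentation for each tool before running it.

```python
import shlex

def logical_rescue_commands(old_port, new_port, new_pgdata, dump_file):
    """Return the command lines for dump, initdb, start, and reload.

    Hypothetical helper: it only assembles argument lists.
    """
    return [
        # Logical dump of every database in the suspect cluster.
        ["pg_dumpall", "-p", str(old_port), "-f", dump_file],
        # Fresh cluster, free of any inherited corruption.
        ["initdb", "-D", new_pgdata],
        # Start the new cluster on its own port and wait until it is up.
        ["pg_ctl", "-D", new_pgdata, "-o", "-p {}".format(new_port), "-w", "start"],
        # Reload the dump into the new cluster.
        ["psql", "-p", str(new_port), "-f", dump_file, "postgres"],
    ]

# Example values are placeholders, not recommendations.
commands = logical_rescue_commands(5432, 5433, "/path/to/new_pgdata", "/path/to/all.sql")
for argv in commands:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

A dump taken from a corrupted cluster can itself fail or contain bad data, so review the dump and reload output for errors rather than assuming the result is clean.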

Avoidance

If you read the documentation for restoring a binary backup, and follow
the steps provided, you will never see this error during a restore and
will not suffer the corruption problems.

Reference:

http://tbeitr.blogspot.com/2015/07/deleting-backuplabel-on-restore-will.html

Notes:

1. The backup_label file created during a physical backup records the checkpoint from which recovery must start.  If the file is deleted, the backup cannot be relied upon unless you can prove that no checkpoint occurred while the backup was being taken.

2. Crash recovery, by contrast, reads its starting point from the global/pg_control file.
