PostgreSQL基于时间点故障恢复PITR（ point-in-time recovery ）

PostgreSQL在使用过程中经常会发生一些失误的操作，但往往是可以弥补的。但是如果真遇到了无法挽回的误操作，只能寄希望于有备份了。

接下来的故障恢复也是基于有备份的情况，没有备份的情况，目前还没有想到怎么做。

1.首先在数据库中配置日志归档

1）创建归档目录

mkdir -p /var/lib/pgsql/pg10/archive/

2)修改postgresql.conf文件

wal_level=replica

archive_mode = on

archive_command='test ! -f /var/lib/pgsql/9.10/archive/%f && cp %p /var/lib/pgsql/9.10/archive/%f'

3）重启数据库

pg_ctl restart

2.对数据库进行全量备份，这里只是为了测试，就简单的对目录进行拷贝即可

cp -r $PGDATA ~/pg10/full_back

3.对数据库进行操作并记录对应的日志号

select txid_current();

 txid_current

--------------

          557

(1 row)

              now

-------------------------------

 2018-09-03 18:07:14.288787+08

(1 row)

delete first 100 tuple of run_command

DELETE 99

 count

-------

 99901

(1 row)

select txid_current();

 txid_current

--------------

          559

(1 row)

              now

-------------------------------

 2018-09-03 18:07:14.500745+08

(1 row)

delete last 100 tuple of run_command

DELETE 100

 count

-------

 99801

(1 row)

select txid_current();

 txid_current

--------------

          561

(1 row)

              now

-------------------------------

 2018-09-03 18:07:14.571154+08

(1 row)

checkpoint

CHECKPOINT

pg_switch_wal

 pg_switch_wal

---------------

 0/3005FA0

(1 row)

checkpoint

CHECKPOINT

pg_switch_wal

 pg_switch_wal

---------------

 0/40000E8

(1 row)

4.设置recovery.conf文件

restore_command = 'cp /var/lib/pgsql/pg10/archive/%f %p'

recovery_target_xid = '557'

recovery_target_inclusive = false

recovery_target_timeline = 'latest'

5.以为恢复成功了，结果发现系统只读，不能写，paused! 后续补充····，日志：

-bash-4.1$ cat log/postgresql-2018-09-03_181007.log

2018-09-03 18:10:07.160 CST [850] LOG:  database system was interrupted; last known up at 2018-09-03 18:07:12 CST

2018-09-03 18:10:07.160 CST [850] LOG:  creating missing WAL directory "pg_wal/archive_status"

cp: cannot stat `/var/lib/pgsql/pg10/archive/00000002.history': No such file or directory

2018-09-03 18:10:07.431 CST [850] LOG:  starting point-in-time recovery to 2018-09-03 18:07:14.500745+08

2018-09-03 18:10:07.448 CST [850] LOG:  restored log file "000000010000000000000002" from archive

2018-09-03 18:10:07.596 CST [850] LOG:  redo starts at 0/2000028

2018-09-03 18:10:07.613 CST [850] LOG:  consistent recovery state reached at 0/2003C30

2018-09-03 18:10:07.613 CST [848] LOG:  database system is ready to accept read only connections

看其他人使用过程中遇到文件不存在时，会自动创建一个新的时间线，然后恢复完成，而他们都是用的10以前版本，可能因此造成的吧。

6.经过多次分析，在data目录的pg_wal中也没有发现 “00000002.history”文件，于是尝试重新回放日志，终于成功：

postgres=# select pg_wal_replay_resume();

 pg_wal_replay_resume

----------------------

(1 row)

postgres=# select pg_wal_replay_resume();

ERROR:  recovery is not in progress

HINT:  Recovery control functions can only be executed during recovery

postgres=# select count(*) from run_command ;

 count

-------

 99901

(1 row)

postgres=# insert into run_command values (1, 'test new');

INSERT 0 1

postgres=# \q

执行pg_wal_replay_resume()的日志：

-bash-4.1$ cat log/postgresql-2018-09-03_181007.log

2018-09-03 18:10:07.160 CST [850] LOG:  database system was interrupted; last known up at 2018-09-03 18:07:12 CST

2018-09-03 18:10:07.160 CST [850] LOG:  creating missing WAL directory "pg_wal/archive_status"

cp: cannot stat `/var/lib/pgsql/pg10/archive/00000002.history': No such file or directory

2018-09-03 18:10:07.431 CST [850] LOG:  starting point-in-time recovery to 2018-09-03 18:07:14.500745+08

2018-09-03 18:10:07.448 CST [850] LOG:  restored log file "000000010000000000000002" from archive

2018-09-03 18:10:07.596 CST [850] LOG:  redo starts at 0/2000028

2018-09-03 18:10:07.613 CST [850] LOG:  consistent recovery state reached at 0/2003C30

2018-09-03 18:10:07.613 CST [848] LOG:  database system is ready to accept read only connections

2018-09-03 18:10:07.646 CST [850] LOG:  restored log file "000000010000000000000003" from archive

2018-09-03 18:10:07.779 CST [866] LOG:  duration: 28.273 ms  statement: select count(*) from run_command

2018-09-03 18:10:07.797 CST [868] ERROR:  cannot execute INSERT in a read-only transaction

2018-09-03 18:10:07.797 CST [868] STATEMENT:  insert into run_command values(1, 'test new')

2018-09-03 18:10:07.804 CST [850] LOG:  recovery stopping before commit of transaction 560, time 2018-09-03 18:07:14.52735+08

2018-09-03 18:10:07.804 CST [850] LOG:  recovery has paused

2018-09-03 18:10:07.804 CST [850] HINT:  Execute pg_wal_replay_resume() to continue.

2018-09-03 18:10:21.263 CST [870] LOG:  duration: 0.697 ms  statement: select pg_wal_replay_resume();

2018-09-03 18:10:21.818 CST [850] LOG:  redo done at 0/3005E90

2018-09-03 18:10:21.818 CST [850] LOG:  last completed transaction was at log time 2018-09-03 18:07:14.496615+08

cp: cannot stat `/var/lib/pgsql/pg10/archive/00000002.history': No such file or directory

2018-09-03 18:10:21.886 CST [850] LOG:  selected new timeline ID: 2

cp: cannot stat `/var/lib/pgsql/pg10/archive/00000001.history': No such file or directory

2018-09-03 18:10:22.145 CST [850] LOG:  archive recovery complete

2018-09-03 18:10:22.476 CST [848] LOG:  database system is ready to accept connections

2018-09-03 18:10:22.775 CST [870] ERROR:  recovery is not in progress

2018-09-03 18:10:22.775 CST [870] HINT:  Recovery control functions can only be executed during recovery.

2018-09-03 18:10:22.775 CST [870] STATEMENT:  select pg_wal_replay_resume();

7.然后又尝试了使用时间和恢复点来回放，都没问题。

8.附上recovery.conf文件的配置：

在恢复过程中，用户可以通过使用recovery.conf文件来指定恢复的各个参数，如下：

归档恢复设置

restore_command：用于获取一个已归档段的XLOG日志文件的命令

archive_cleanup_command：清除不在需要的XLOG日志文件的命令

recovery_end_command：归档恢复结束后执行的命令

恢复目标设置（默认情况下，数据库将会一直恢复到 WAL 日志的末尾）

recovery_target = ’immediate’：在从一个在线备 份中恢复时，这意味着备份结束的那个点

recovery_target_name (string)：这个参数指定(pg_create_restore_point()所创建)的已命名的恢复点,将恢复到该恢复点

recovery_target_time (timestamp)：这个参数指定恢复到的时间戳

recovery_target_xid (string)：这个参数指定恢复到的事务 ID

recovery_target_inclusive (boolean)：指定是否在指定的恢复目标之后停止(true)，或者在恢复目标之前停止 (false)；适用于recovery_target_time或者recovery_target_xid被指定的情况；这个设置分别控制事务是否有准确的目标提交时间或 ID 是否将被包括在该恢复中；默认值为 true

recovery_target_timeline (string)：指定恢复到一个特定的时间线

recovery_target_action (enum)：指定在达到恢复目标时服务器应该立刻采取的动作，包括pause（暂停）、promote（接受连接）、shutdown（停止服务器），其中pause为默认动作

备库参数设置

standby_mode(boolean)：为on表示作为一个备库，否则不为备库

primary_conninfo (string)：指定备库连接主库的连接字符串

primary_slot_name (string)：通过流复制指定主库的一个复制槽来复制主库数据，如果没有设置primary_conninfo，则此参数无效

trigger_file (string)：指定一个触发器文件，该文件存在可以结束备库的恢复，即升级备库为一个独立的主库

recovery_min_apply_delay (integer)：这个参数允许将恢复延迟一段固定的时间，如果没有指定单位则以毫秒为单位。

如果recovery.conf中同时指定了recoveryTargetXid、recoveryTargetName、recoveryTargetTime时，PostgreSQL会按照RECOVERY_TARGET_XID> RECOVERY_TARGET_NAME > RECOVERY_TARGET_TIME的优先级来获取最终的目标恢复位点。

如果在recovery.conf指定recovery_targetTimeLine为latest，则可以基于当前TimeLineID为起点寻找最新时间线：

寻找当前TimeLineID的时间线历史文件“XXX.history”，如果存在则继续寻找，否则错误退出

TimeLineID是线性增长的，将当前TimeLineID自增1寻找是否存在时间线历史文件，直到不存在对应的时间线历史文件为止，即可找到最新的时间线。

后续准备找找如何在没有备份的情况下，恢复删除数据。。。。。。

PostgreSQL基于时间点故障恢复PITR（ point-in-time recovery ）的更多相关文章

7.5 Point-in-Time (Incremental) Recovery Using the Binary Log 使用binay log 基于时间点恢复
7.5 Point-in-Time (Incremental) Recovery Using the Binary Log 使用binay log 基于时间点恢复 7.5.1 Point-in-Tim ...
mysql基于“时间”的盲注
无需页面报错,根据页面响应时间做判断! mysql基于时间的盲注 =================================================================== ...
表空间基于时间点的恢复(TSPITR)
环境:RHEL 6.4 + Oracle 11.2.0.4 准备模拟环境 1. 验证表空间的依赖性 2. 确定执行TSPITR后会丢失的对象 3. 自动执行TSPITR Reference 准备模拟环 ...
JavaScript基于时间的动画算法
转自:https://segmentfault.com/a/1190000002416071 前言前段时间无聊或有聊地做了几个移动端的HTML5游戏.放在不同的移动端平台上进行测试后有了诡异的发现, ...
7.5.1 Point-in-Time Recovery Using Event Times 使用Event Times 基于时间点恢复
7.5.1 Point-in-Time Recovery Using Event Times 使用Event Times 基于时间点恢复表明开始和结束时间用于恢复, 指定 --start-datet ...
ORACLE调度之基于时间的调度（一）【weber出品】
一.调度的概述这里我看到一篇对调度的概述觉得描述的比我好,但仅限于概述部分,其他部分我觉得我讲的比他好,于是发生以下事情: ************************华丽的转载******** ...
pfSense配置基于时间的防火墙规则
基于时间的规则允许防火墙规则在指定的日期和/或时间范围内激活.基于时间的规则与任何其他规则的功能相同,只是它们在预定时间之外的规则集中实际上不存在. 基于时间的规则逻辑处理基于时间的规则时,调度计划确 ...
Python：SQLMap源码精读—基于时间的盲注（time-based blind）
建议阅读 Time-Based Blind SQL Injection Attacks 基于时间的盲注(time-based blind) 测试应用是否存在SQL注入漏洞时,经常发现某一潜在的漏洞难以 ...
[Swift]LeetCode981. 基于时间的键值存储 | Time Based Key-Value Store
Create a timebased key-value store class TimeMap, that supports two operations. 1. set(string key, s ...

随机推荐

深入 CommonJs 与 ES6 Module
目前主流的模块规范 UMD CommonJs es6 module umd 模块(通用模块) (function (global, factory) { typeof exports === 'obj ...
linux中断的下半部机制
一.中断处理为什么要下半部?Linux在中断处理中间中断处理分了上半部和下半部,目的就是提高系统的响应能力和并发能力.通俗一点来讲:当一个中断产生,调用该中断对应的处理程序(上半部)然后告诉系统,对应 ...
Hadoop资源调度器
hadoop调度器的作用是将系统中空闲的资源按一定策略分配给作业.调度器是一个可插拔的模块,用户可以根据自己的实际应用要求设计调度器.Hadoop中常见的调度器有三种,分别为: 1.基于队列的FIFO ...
bzoj 3657 斐波那契数列(fib.cpp/pas/c/in/out)
空间 512M 时限2s [题目描述] 有n个大于1的正整数a1,a2,…,an,我们知道斐波那契数列的递推式是f(i)=f(i-1)+f(i-2),现在我们修改这个递推式变为f(i)=f(i-1) ...
js初步用法
js js引入方式: 1.方式一通过script标签引入 2.方式二通过script标签引入 ,src属性引入一个外部的js文件注意: 如果你使用了script标签的src属性那么 ...
vue.js的一些小语法v-bind,v-if,v-show,v-else
知识点: v-bind 动态绑定标签属性 v-bind 可简写为 : 使用v-bind 绑定class和内联样式使用v-if,v-show,v-else进行条件渲染 <template> ...
Hadoop 常用指令
1. 察看hdfs文件系统运行情况 bin/hdfs dfsadmin -report 2. 为了方便执行 HDFS 的操作指令,我们可以将需要的 Hadoop 路径写入环境变量中,便于直接执行命令. ...
nginx for windows 中虚拟主机路径设置问题
由于Windows版本的Nginx其实是在Cygwin环境下编译的,所以Nginx使用的是Cygwin的路径格式,所以在Nginx的配置文件nginx.conf中,路径既不能使用*nix的格式,也不能 ...
WinCE数据通讯之Web Service分包传输篇
前面写过<WinCE数据通讯之Web Service篇>那篇对于数据量不是很大的情况下单包传输是可以了,但是对于大数据量的情况下WinCE终端的内存往往会在解包或者接受数据时产生内存溢出. ...
LeetCode第[88]题(Java)：Merge Sorted Array（合并已排序数组）
题目:合并已排序数组难度:Easy 题目内容: Given two sorted integer arrays nums1 and nums2, merge nums2 into nums1 as ...

PostgreSQL基于时间点故障恢复PITR（ point-in-time recovery ）

PostgreSQL基于时间点故障恢复PITR（ point-in-time recovery ）的更多相关文章

随机推荐

热门专题