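KingbaseES R3 Cluster: Primary Archive Failure Case
Case description:
This case walks through handling a failure of the archiver process to archive WAL logs in a KingbaseES R3 cluster. It may serve as a reference for front-line production environments.
Database version: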
TEST=# select version();
VERSION
---------------------------------------------------------------------------------------------------------------
Kingbase V008R003C002B0270 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)
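Cluster architecture: one primary (node1) streaming WAL to a standby at 192.168.7.243, as the wal sender process and the kb2-host setting in this case show.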

I. Symptoms (archiving fails on the primary)
1. Database server processes on the primary
[kingbase@node1 bin]$ ps -ef |grep kingbase
kingbase 8180 1 0 13:44 ? 00:00:00 /home/kingbase/cluster/kha/db/bin/kingbase -D /home/kingbase/cluster/kha/db/data
kingbase 8181 8180 0 13:44 ? 00:00:00 kingbase: logger process
kingbase 8183 8180 0 13:44 ? 00:00:00 kingbase: checkpointer process
kingbase 8184 8180 0 13:44 ? 00:00:00 kingbase: writer process
kingbase 8185 8180 0 13:44 ? 00:00:00 kingbase: wal writer process
kingbase 8186 8180 0 13:44 ? 00:00:00 kingbase: autovacuum launcher process
kingbase 8187 8180 0 13:44 ? 00:00:00 kingbase: archiver process failed on 000000020000000000000010
kingbase 8188 8180 0 13:44 ? 00:00:00 kingbase: stats collector process
kingbase 8189 8180 0 13:44 ? 00:00:00 kingbase: bgworker: syslogical supervisor
kingbase 8253 8180 0 13:44 ? 00:00:00 kingbase: wal sender process SYSTEM 192.168.7.243(51049) streaming 0/160001B0
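
The archiver's process title carries the failure state, so the check can be scripted. The following is a minimal sketch, not part of the original case: `archiver_state` is a hypothetical helper that reads `ps -ef` output on stdin and assumes the process titles look like the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and report the archiver state.
# Assumes process titles of the form "kingbase: archiver process [failed on <segment>]".
archiver_state() {
    awk '
        /kingbase: archiver process/ {
            found = 1
            if (match($0, /failed on [0-9A-F]+/)) {
                # "failed on " is 10 characters long; the rest is the WAL segment name
                print "FAILED " substr($0, RSTART + 10, RLENGTH - 10)
            } else {
                print "OK"
            }
        }
        END { if (!found) print "NOT_RUNNING" }
    '
}
```

A typical call would be `ps -ef | archiver_state`.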

2. Check the sys_log log
2021-03-01 13:49:30.693 CST,,,8187,,603c7f31.1ffb,23,,2021-03-01 13:44:17 CST,,0,LOG,00000,"archive command failed with exit code 45","The failed archive command was: /home/kingbase/cluster/kha/db/bin/sys_rman_v6 --config /home/kingbase/kbbr3_repo/sys_rman_v6.conf --stanza=kingbase archive-push sys_xlog/000000020000000000000010",,,,,,,,""
2021-03-01 13:49:30.694 CST,,,8187,,603c7f31.1ffb,24,,2021-03-01 13:44:17 CST,,0,WARNING,01000,"archiving transaction log file ""000000020000000000000010"" failed too many times, will try again later",,,,,,,,,""
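
To pull the failure timestamps and exit codes out of a long log, a small filter can help. This is a sketch under the assumption that the log is in the CSV format shown above. `archive_failures` and the `SYS_LOG_FILE` variable are hypothetical names introduced for illustration.

```shell
#!/bin/sh
# Hypothetical helper: read sys_log CSV lines on stdin and print
# "<timestamp> exit=<code>" for every failed archive command.
archive_failures() {
    sed -n 's/^\([0-9-]* [0-9:.]* [A-Z]*\),.*"archive command failed with exit code \([0-9]*\)".*/\1 exit=\2/p'
}
```

Usage, with `SYS_LOG_FILE` pointing at the current log file: `archive_failures < "$SYS_LOG_FILE"`.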
II. Troubleshooting steps
1. Check the WAL archive configuration
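
The original output for this step is not available. One way to inspect the settings is to read them from the server configuration file. The sketch below assumes that file is `kingbase.conf` in the data directory, which the original does not state, and `show_archive_settings` is a hypothetical helper name. Running `show archive_command;` in ksql should give the same information for the running instance.

```shell
#!/bin/sh
# Hypothetical helper: print the active (uncommented) archive settings
# from a server configuration file.
# $1: path to the configuration file (assumed here to be kingbase.conf)
show_archive_settings() {
    grep -E '^[[:space:]]*(archive_mode|archive_command)[[:space:]]*=' "$1"
}
```

Usage, with the assumed path: `show_archive_settings /home/kingbase/cluster/kha/db/data/kingbase.conf`.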

2. Check the archive configuration file and directory
[kingbase@node1 sys_log]$ ls -lh /home/kingbase/kbbr3_repo/sys_rman_v6.conf
-rw-rw-r-- 1 kingbase kingbase 589 Mar 1 12:26 /home/kingbase/kbbr3_repo/sys_rman_v6.conf
[kingbase@node1 sys_log]$ cat /home/kingbase/kbbr3_repo/sys_rman_v6.conf
# Genarate by script at 20210301122559, should not change manually
[kingbase]
kb1-path=/home/kingbase/cluster/kha/db/data
kb1-port=54321
kb1-user=SUPERMANAGER_V8ADMIN
kb1-pass=S0lOR0JBU0VBRE1JTg==
kb2-path=/home/kingbase/cluster/kha/db/data
kb2-port=54321
kb2-user=SUPERMANAGER_V8ADMIN
kb2-pass=S0lOR0JBU0VBRE1JTg==
kb2-host=192.168.7.243
kb2-host-user=kingbase
[global]
repo1-path=/home/kingbase/kbbr3_repo
repo1-retention-full=5
log-path=/tmp/
log-level-file=info
log-level-console=info
log-subprocess=y
process-max=4
#### default gz, support: gz none
compress-type=gz
compress-level=3
3. Run the archive command manually
[kingbase@node1 bin]$ /home/kingbase/cluster/kha/db/bin/sys_rman_v6 --config /home/kingbase/kbbr3_repo/sys_rman_v6.conf --stanza=kingbase archive-push /home/kingbase/cluster/kha/db/data/sys_xlog/000000020000000000000010
2021-03-01 14:43:20.928 P00 INFO: archive-push command begin 2.27: [/home/kingbase/cluster/kha/db/data/sys_xlog/000000020000000000000010] --compress-level=3 --compress-type=gz --config=/home/kingbase/kbbr3_repo/sys_rman_v6.conf --log-level-console=info --log-level-file=info --log-path=/tmp --log-subprocess --kb2-host=192.168.7.243 --kb1-path=/home/kingbase/cluster/kha/db/data --kb2-path=/home/kingbase/cluster/kha/db/data --process-max=4 --repo1-path=/home/kingbase/kbbr3_repo --stanza=kingbase
2021-03-01 14:43:21.203 P00 INFO: pushed WAL file '000000020000000000000010' to the archive
2021-03-01 14:43:21.204 P00 INFO: archive-push command end: completed successfully (276ms)
# List the archived WAL files
[kingbase@node1 data]$ ls -lh /home/kingbase/kbbr3_repo/archive/kingbase/9.6-1/0000000200000000
total 208K
-rw-r----- 1 kingbase kingbase 91K Mar 1 12:20 00000002000000000000000E-583ac46b5270f365463cb0bfb3b96185af6492dd.gz
-rw-r----- 1 kingbase kingbase 303 Mar 1 12:20 00000002000000000000000F.00000028.backup
-rw-r----- 1 kingbase kingbase 83K Mar 1 12:20 00000002000000000000000F-1c450ca422ee6312e8a69dcb7d8c446a99425995.gz
-rw-r----- 1 kingbase kingbase 28K Mar 1 12:21 000000020000000000000010-044ab3927144a6510a42ce9c2bc331cf209aff56.gz

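=== The output above shows that manual archiving succeeded, which indicates that the archive configuration file and the directory permissions are fine. ===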
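
One difference is worth noting: the failed command in sys_log was given the relative path `sys_xlog/000000020000000000000010`, while the manual run used an absolute path. In PostgreSQL-derived servers the archiver normally runs the command from the data directory, so a closer reproduction is to run it from there and look at the exit code. The following is a sketch only, and `run_from_dir` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: run a command from a given directory and report its exit code.
# $1: directory to run in; remaining arguments: the command and its arguments
run_from_dir() {
    dir=$1
    shift
    # The subshell keeps the caller's working directory unchanged
    ( cd "$dir" && "$@" )
    status=$?
    echo "exit code: $status"
    return "$status"
}

# Example, using the paths from this case:
# run_from_dir /home/kingbase/cluster/kha/db/data \
#     /home/kingbase/cluster/kha/db/bin/sys_rman_v6 \
#     --config /home/kingbase/kbbr3_repo/sys_rman_v6.conf \
#     --stanza=kingbase archive-push sys_xlog/000000020000000000000010
```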
4. Restart the cluster and test
[kingbase@node1 bin]$ ps -ef |grep kingbase
kingbase 8180 1 0 13:44 ? 00:00:00 /home/kingbase/cluster/kha/db/bin/kingbase -D /home/kingbase/cluster/kha/db/data
kingbase 8181 8180 0 13:44 ? 00:00:00 kingbase: logger process
kingbase 8183 8180 0 13:44 ? 00:00:00 kingbase: checkpointer process
kingbase 8184 8180 0 13:44 ? 00:00:00 kingbase: writer process
kingbase 8185 8180 0 13:44 ? 00:00:00 kingbase: wal writer process
kingbase 8186 8180 0 13:44 ? 00:00:00 kingbase: autovacuum launcher process
kingbase 8187 8180 0 13:44 ? 00:00:00 kingbase: archiver process failed on 000000020000000000000010
kingbase 8188 8180 0 13:44 ? 00:00:00 kingbase: stats collector process
kingbase 8189 8180 0 13:44 ? 00:00:00 kingbase: bgworker: syslogical supervisor
kingbase 8253 8180 0 13:44 ? 00:00:00 kingbase: wal sender process SYSTEM 192.168.7.243(51049) streaming 0/160001B0
=== The output above shows that archiving still fails. ===
5. Change the archive_command setting to '/bin/true' (skip archiving)
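
The original output for this step is not available. As a rough sketch, the change could be made by rewriting the `archive_command` line in the server configuration file while keeping a backup copy for the restore in step 8. The file name `kingbase.conf` is an assumption, and `set_archive_command` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: replace the active archive_command line in a config file.
# $1: configuration file (assumed to be kingbase.conf); $2: new command string
# A copy of the original file is kept as "<file>.bak".
set_archive_command() {
    conf=$1
    new_cmd=$2
    cp -p "$conf" "$conf.bak" || return 1
    # Pass the value through the environment so awk does not interpret
    # backslashes or other special characters in it. \047 is a single quote.
    NEW_CMD="$new_cmd" awk '
        /^[ \t]*archive_command[ \t]*=/ {
            print "archive_command = \047" ENVIRON["NEW_CMD"] "\047"
            next
        }
        { print }
    ' "$conf.bak" > "$conf"
}

# Example (assumed path):
# set_archive_command /home/kingbase/cluster/kha/db/data/kingbase.conf /bin/true
```

Copying the `.bak` file back restores the original setting. The new value must not contain single quotes, since it is written inside a single-quoted string.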

6. Restart the cluster and test
[kingbase@node1 bin]$ ps -ef |grep kingbase
kingbase 21906 5688 0 14:09 pts/0 00:00:00 ./ksql -U SYSTEM -W ******** TEST
kingbase 23167 1 0 14:11 ? 00:00:00 /home/kingbase/cluster/kha/db/bin/kingbase -D /home/kingbase/cluster/kha/db/data
kingbase 23168 23167 0 14:11 ? 00:00:00 kingbase: logger process
kingbase 23170 23167 0 14:11 ? 00:00:00 kingbase: checkpointer process
kingbase 23171 23167 0 14:11 ? 00:00:00 kingbase: writer process
kingbase 23172 23167 0 14:11 ? 00:00:00 kingbase: wal writer process
kingbase 23173 23167 0 14:11 ? 00:00:00 kingbase: autovacuum launcher process
kingbase 23174 23167 0 14:11 ? 00:00:00 kingbase: archiver process
kingbase 23175 23167 0 14:11 ? 00:00:00 kingbase: stats collector process
kingbase 23176 23167 0 14:11 ? 00:00:00 kingbase: bgworker: syslogical supervisor
kingbase 23194 23167 0 14:11 ? 00:00:00 kingbase: wal sender process SYSTEM 192.168.7.243(54037) streaming 0/180000D0
=== The output above shows that the archiver no longer reports a failure. ===
7. Manually switch WAL segments on the primary
TEST=# select sys_switch_xlog();
SYS_SWITCH_XLOG
-----------------
0/180000E8
(1 row)
TEST=# select sys_switch_xlog();
SYS_SWITCH_XLOG
-----------------
0/19000078
(1 row)
TEST=# select sys_switch_xlog();
SYS_SWITCH_XLOG
-----------------
0/1A000000
(1 row)
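
To see which WAL segment each switch location falls in, the location can be converted to a segment file name. The sketch below assumes the default 16 MB segment size and timeline 2 (the timeline in the file names of this case). Like PostgreSQL's `pg_xlogfile_name`, it maps a location exactly on a segment boundary to the preceding segment. `lsn_to_wal_name` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: convert a WAL location "X/Y" (hex) to a WAL segment file name.
# $1: timeline ID (decimal); $2: WAL location, e.g. 0/180000E8
# Assumes 16 MB segments, i.e. 256 segments per 4 GB log ID.
# Not valid for the location 0/0.
lsn_to_wal_name() {
    tli=$1
    hi=${2%/*}
    lo=${2#*/}
    seg_size=$((16 * 1024 * 1024))
    pos=$(( (0x$hi << 32) + 0x$lo ))
    # A location on a boundary belongs to the previous segment
    segno=$(( (pos - 1) / seg_size ))
    printf '%08X%08X%08X\n' "$tli" "$((segno / 256))" "$((segno % 256))"
}
```

For example, `lsn_to_wal_name 2 0/180000E8` prints `000000020000000000000018`.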
8. Restore the original archive_command setting

9. Restart the cluster and test
[kingbase@node1 bin]$ ps -ef |grep kingbase
kingbase 25979 1 0 14:15 ? 00:00:00 /home/kingbase/cluster/kha/db/bin/kingbase -D /home/kingbase/cluster/kha/db/data
kingbase 25980 25979 0 14:15 ? 00:00:00 kingbase: logger process
kingbase 25983 25979 0 14:15 ? 00:00:00 kingbase: checkpointer process
kingbase 25984 25979 0 14:15 ? 00:00:00 kingbase: writer process
kingbase 25985 25979 0 14:15 ? 00:00:00 kingbase: wal writer process
kingbase 25986 25979 0 14:15 ? 00:00:00 kingbase: autovacuum launcher process
kingbase 25987 25979 0 14:15 ? 00:00:00 kingbase: archiver process
kingbase 25988 25979 0 14:15 ? 00:00:00 kingbase: stats collector process
kingbase 25989 25979 0 14:15 ? 00:00:00 kingbase: bgworker: syslogical supervisor
kingbase 26006 25979 0 14:15 ? 00:00:00 kingbase: wal sender process SYSTEM 192.168.7.243(54457) streaming 0/1B0000D0
=== The output above shows that the archiver no longer reports a failure. ===
10. Test WAL archiving
1) Switch WAL segments (on the primary)
TEST=# select sys_switch_xlog();
SYS_SWITCH_XLOG
-----------------
0/1C000078
(1 row)
TEST=# select sys_switch_xlog();
SYS_SWITCH_XLOG
-----------------
0/1E000000
(1 row)
2) Check the archive
[kingbase@node1 0000000200000000]$ pwd
/home/kingbase/kbbr3_repo/archive/kingbase/9.6-1/0000000200000000
[kingbase@node1 0000000200000000]$ ls -lh
total 460K
-rw-r----- 1 kingbase kingbase 91K Mar 1 12:20 00000002000000000000000E-583ac46b5270f365463cb0bfb3b96185af6492dd.gz
-rw-r----- 1 kingbase kingbase 303 Mar 1 12:20 00000002000000000000000F.00000028.backup
-rw-r----- 1 kingbase kingbase 83K Mar 1 12:20 00000002000000000000000F-1c450ca422ee6312e8a69dcb7d8c446a99425995.gz
-rw-r----- 1 kingbase kingbase 28K Mar 1 12:21 000000020000000000000010-044ab3927144a6510a42ce9c2bc331cf209aff56.gz
-rw-r----- 1 kingbase kingbase 83K Mar 1 14:22 00000002000000000000001B-f64eadf9b3ecb50ce6925cdc8c196bf33af4cc8c.gz
-rw-r----- 1 kingbase kingbase 84K Mar 1 14:22 00000002000000000000001C-300ed282be050b1194cc15b974b5b11b120d2076.gz
-rw-r----- 1 kingbase kingbase 84K Mar 1 14:23 00000002000000000000001D-249ab2fafe02a51ca06846098d4bad2f786d1422.gz
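=== The archive listing shows that WAL segments are archived again when a WAL switch occurs. However, because archive_command was set to '/bin/true' earlier, the WAL segments between 000000020000000000000010 and 00000002000000000000001B were never archived. ===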
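
The gap can also be listed mechanically. The sketch below is an illustration, not part of the original procedure: `missing_segments` is a hypothetical helper that reads archive file names on stdin and assumes they all come from one archive subdirectory, so they share the same timeline and log ID and only the last 8 hex digits vary.

```shell
#!/bin/sh
# Hypothetical helper: read archive file names on stdin and print the WAL segment
# names missing between the lowest and highest archived segment.
# Only names of the form <24 hex digits>-<checksum>... are considered,
# so .backup marker files are ignored.
missing_segments() {
    names=$(sed -n 's/^\([0-9A-F]\{24\}\)-.*/\1/p' | LC_ALL=C sort -u)
    [ -n "$names" ] || return 0
    first=$(printf '%s\n' "$names" | head -n 1)
    last=$(printf '%s\n' "$names" | tail -n 1)
    prefix=$(printf '%s' "$first" | cut -c1-16)
    first_hex=$(printf '%s' "$first" | cut -c17-24)
    last_hex=$(printf '%s' "$last" | cut -c17-24)
    i=$((0x$first_hex))
    end=$((0x$last_hex))
    while [ "$i" -le "$end" ]; do
        name=$(printf '%s%08X' "$prefix" "$i")
        printf '%s\n' "$names" | grep -qx "$name" || echo "$name"
        i=$((i + 1))
    done
}
```

Run as `ls /home/kingbase/kbbr3_repo/archive/kingbase/9.6-1/0000000200000000 | missing_segments`. For the listing above it prints the ten segment names from 000000020000000000000011 through 00000002000000000000001A.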
11. Check the database server processes
[kingbase@node1 0000000200000000]$ ps -ef |grep kingbase
.......
kingbase 25979 1 0 14:15 ? 00:00:00 /home/kingbase/cluster/kha/db/bin/kingbase -D /home/kingbase/cluster/kha/db/data
kingbase 25980 25979 0 14:15 ? 00:00:00 kingbase: logger process
kingbase 25983 25979 0 14:15 ? 00:00:00 kingbase: checkpointer process
kingbase 25984 25979 0 14:15 ? 00:00:00 kingbase: writer process
kingbase 25985 25979 0 14:15 ? 00:00:00 kingbase: wal writer process
kingbase 25986 25979 0 14:15 ? 00:00:00 kingbase: autovacuum launcher process
kingbase 25987 25979 0 14:15 ? 00:00:00 kingbase: archiver process last was 00000002000000000000001D
kingbase 25988 25979 0 14:15 ? 00:00:00 kingbase: stats collector process
kingbase 25989 25979 0 14:15 ? 00:00:00 kingbase: bgworker: syslogical supervisor
kingbase 26006 25979 0 14:15 ? 00:00:00 kingbase: wal sender process SYSTEM 192.168.7.243(54457) streaming 0/1E000060
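=== The output above shows that the archiver process is working normally. ===
III. Summary
This case resolved an archive failure in a KingbaseES R3 cluster. Similar archive failures have also been seen in native PostgreSQL, and handling them is fairly tedious. Once archiving on the primary is back to normal, take a full physical backup of the database: part of the WAL archive is missing, so backups taken before the gap cannot be rolled forward past it.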