KingbaseES R6 cluster: test cases for the repmgr.conf parameter 'recovery' (part 2)

Case 2: testing 'recovery = automatic'

1. Check the status of the cluster nodes:

[kingbase@node1 bin]$ ./repmgr cluster show

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+---------+---------+-----------+----------+----------+----------+----------+---------------------------

 1  | node243 | primary | * running |          | default  | 100      | 3        | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node248 | standby |   running | node243  | default  | 100      | 3        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
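When the same check is repeated before and after a failover test, it can help to read the `repmgr cluster show` table programmatically. The following is a minimal sketch, assuming the pipe-separated layout shown above; the function name `parse_cluster_show` and the shortened connection strings in the sample are illustrative and not part of repmgr.

```python
def parse_cluster_show(text):
    """Parse the pipe-separated table printed by `repmgr cluster show`.

    Returns one dict per node, keyed by the column names in the header row.
    Assumes no cell contains a '|' character.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "----+----" separator line.
        if not line or set(line) <= set("-+"):
            continue
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells
            continue
        if len(cells) != len(header):
            continue  # not a table row (for example a trailing warning)
        rows.append(dict(zip(header, cells)))
    return rows


# Sample taken from the output above, with the connection strings shortened.
SAMPLE = """\
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+---------------------------
 1  | node243 | primary | * running |          | default  | 100      | 3        | host=192.168.7.243 user=esrep dbname=esrep port=54321
 2  | node248 | standby |   running | node243  | default  | 100      | 3        | host=192.168.7.248 user=esrep dbname=esrep port=54321
"""

nodes = parse_cluster_show(SAMPLE)
for node in nodes:
    print(node["Name"], node["Role"], node["Status"], node["Upstream"] or "-")
```

The leading `*` in the Status column marks the current primary, so it is kept in the parsed value rather than stripped.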

2. Configure the recovery parameter

[kingbase@node3 bin]$ cat ../etc/repmgr.conf |egrep -i 'recovery|failover'

failover='automatic'

recovery='automatic'
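The same two settings can be read from a script, for example to confirm that every node carries identical values before a test. This is a rough sketch, assuming the simple `key='value'` line format shown above; `read_settings` is a hypothetical helper, and the comment handling is deliberately naive (it would mis-handle a value containing `#`).

```python
def read_settings(conf_text, keys=("failover", "recovery")):
    """Return the selected key/value pairs from repmgr.conf-style text."""
    settings = {}
    for raw in conf_text.splitlines():
        # Naive comment stripping: everything after '#' is dropped.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        if key in keys:
            # Remove surrounding single or double quotes, if any.
            settings[key] = value.strip().strip("'\"")
    return settings


SAMPLE_CONF = """\
failover='automatic'

recovery='automatic'
"""

settings = read_settings(SAMPLE_CONF)
print(settings)
```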

3. Reboot the primary node to run the test

[root@node3 ~]# reboot

4. Check the hamgr log on the standby

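=As shown below, the log indicates that after the primary node went down the cluster performed a primary/standby switchover, and that once the original primary's system was back to normal, it was automatically rejoined to the cluster as the new standby.=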

[2022-03-01 14:38:09] [NOTICE] starting monitoring of node "node248" (ID: 2)

[2022-03-01 14:38:09] [INFO] "connection_check_type" set to "ping"

[2022-03-01 14:38:10] [INFO] monitoring connection to upstream node "node243" (ID: 1)

[2022-03-01 14:38:10] [NOTICE] try to change wal catched_up state to 1

[2022-03-01 14:38:10] [INFO] primary flush lsn is 0/17000578, local flush lsn is 0/170004C0

[2022-03-01 14:38:10] [NOTICE] try to change streaming_sync state to TRUE

[2022-03-01 14:43:11] [INFO] node "node248" (ID: 2) monitoring upstream node "node243" (ID: 1) in normal state

[2022-03-01 14:46:42] [WARNING] unable to ping "host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"

[2022-03-01 14:46:42] [DETAIL] PQping() returned "PQPING_REJECT"

[2022-03-01 14:46:42] [WARNING] unable to connect to upstream node "node243" (ID: 1)

[2022-03-01 14:46:42] [INFO] sleeping 6 seconds until next reconnection attempt

[2022-03-01 14:46:48] [INFO] checking state of node 1, 1 of 10 attempts

[2022-03-01 14:46:58] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=192.168.7.243 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"

[2022-03-01 14:46:58] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"

[2022-03-01 14:46:58] [INFO] sleeping 6 seconds until next reconnection attempt

......

[2022-03-01 14:48:59] [INFO] checking state of node 1, 10 of 10 attempts

[2022-03-01 14:48:59] [WARNING] unable to ping "user=esrep connect_timeout=10 dbname=esrep host=192.168.7.243 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"

[2022-03-01 14:48:59] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"

[2022-03-01 14:48:59] [WARNING] unable to reconnect to node 1 after 10 attempts

[2022-03-01 14:48:59] [NOTICE] setting "wal_retrieve_retry_interval" to 86405000 milliseconds

[2022-03-01 14:49:00] [WARNING] wal receiver not running

[2022-03-01 14:49:00] [NOTICE] WAL receiver disconnected on all sibling nodes

[2022-03-01 14:49:00] [INFO] WAL receiver disconnected on all 0 sibling nodes

[2022-03-01 14:49:00] [INFO] 0 active sibling nodes registered

[2022-03-01 14:49:00] [INFO] primary and this node have the same location ("default")

[2022-03-01 14:49:00] [INFO] no other sibling nodes - we win by default

[2022-03-01 14:49:00] [NOTICE] setting "wal_retrieve_retry_interval" to 5000 ms

[2022-03-01 14:49:00] [NOTICE] this node is the only available candidate and will now promote itself

[2022-03-01 14:49:00] [INFO] try to ping the trusted_servers "192.168.7.1" before execute promote_command

[2022-03-01 14:49:02] [NOTICE] PING 192.168.7.1 (192.168.7.1) 56(84) bytes of data.

--- 192.168.7.1 ping statistics ---

2 packets transmitted, 2 received, 0% packet loss, time 1002ms

rtt min/avg/max/mdev = 2.345/22.599/42.853/20.254 ms

[2022-03-01 14:49:02] [NOTICE] successfully ping one or more of the trusted_servers "192.168.7.1"

[2022-03-01 14:49:04] [NOTICE] PING 192.168.7.241 (192.168.7.241) 56(84) bytes of data.

--- 192.168.7.241 ping statistics ---

3 packets transmitted, 0 received, 100% packet loss, time 1999ms

[2022-03-01 14:49:04] [WARNING] ping host"192.168.7.241" failed

[2022-03-01 14:49:04] [DETAIL] average RTT value is not greater than zero

[2022-03-01 14:49:04] [INFO] loadvip result: 1, arping result: 1

[2022-03-01 14:49:04] [NOTICE] new primary node (ID: 2) acquire the virtual ip 192.168.7.241/24 success

[2022-03-01 14:49:04] [INFO] promote_command is:

  "/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/repmgr  standby promote -f /home/kingbase/cluster/R6C5/R6C5R/kingbase/etc/repmgr.conf"

NOTICE: promoting standby to primary

DETAIL: promoting server "node248" (ID: 2) using sys_promote()

NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete

INFO: SET synchronous TO "async" on primary host

[2022-03-01 14:49:07] [NOTICE] try to stop old primary db (host: "192.168.7.243")

NOTICE: STANDBY PROMOTE successful

DETAIL: server "node248" (ID: 2) was successfully promoted to primary

[2022-03-01 14:49:11] [INFO] switching to primary monitoring mode

[2022-03-01 14:49:11] [NOTICE] monitoring cluster primary "node248" (ID: 2)

[2022-03-01 14:49:11] [INFO] create a thread 0x7f1b4b125700 to check the cluster status

[2022-03-01 14:49:11] [INFO] child node: 1; attached: no

[2022-03-01 14:49:11] [INFO] check node status again, try 1 / 10 times

[2022-03-01 14:49:12] [INFO] node (ID: 1): no server running

.......

[2022-03-01 14:49:29] [INFO] check node status again, try 10 / 10 times

[2022-03-01 14:49:31] [INFO] child node: 1; attached: no

[2022-03-01 14:49:31] [INFO] found node down, recovery will be triggered after recovery delay time 20s

[2022-03-01 14:49:33] [INFO] child node: 1; attached: no

......

[2022-03-01 14:49:52] [INFO] child node: 1; attached: no

[2022-03-01 14:49:52] [INFO] recovery delay time reached. can do recovery now.

[2022-03-01 14:49:52] [INFO] [thread pid:11778] do_nodes_recovery thread begin. The pthread_t tid is 0x7f1b4b125700

[2022-03-01 14:49:52] [NOTICE] [thread pid:11778] node (ID: 1; host: "192.168.7.243") is not attached, ready to auto-recovery

[2022-03-01 14:49:52] [NOTICE] [thread pid:11778] Now, the primary host ip: 192.168.7.248

[2022-03-01 14:49:52] [INFO] [thread pid:11778] ES connection to host "192.168.7.243" succeeded, ready to do auto-recovery

[2022-03-01 14:49:53] [INFO] unlink file /tmp/.s.KINGBASE.54321.lock

[2022-03-01 14:49:53] [NOTICE] executing repmgr command "/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/repmgr --dbname="host=192.168.7.248 dbname=esrep user=esrep port=54321" node rejoin --force-rewind"

NOTICE: sys_rewind execution required for this node to attach to rejoin target node 2

DETAIL: rejoin target server's timeline 8 forked off current database system timeline 7 before current recovery point 0/18000028

NOTICE: executing sys_rewind

DETAIL: sys_rewind command is "/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/sys_rewind -D '/home/kingbase/cluster/R6C5/R6C5R/kingbase/data' --source-server='host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3'"

sys_rewind: servers diverged at WAL location 0/17000680 on timeline 7

sys_rewind: rewinding from last common checkpoint at 0/160007C8 on timeline 7

sys_rewind: find last common checkpoint start time from 2022-03-01 14:49:53.170681 CST to 2022-03-01 14:49:53.296332 CST, in "0.125651" seconds.

sys_rewind: update the control file: minRecoveryPoint is '0/1700DE58', minRecoveryPointTLI is '8', and database state is 'in archive recovery'

sys_rewind: we will remove the dir '/home/kingbase/cluster/R6C5/R6C5R/kingbase/data/sys_replslot/repmgr_slot_2.rewind' and all the file/dir in it.

sys_rewind: rewind start wal location 0/16000798 (file 000000070000000000000016), end wal location 0/1700DE58 (file 000000080000000000000017). time from 2022-03-01 14:49:53.170681 CST to 2022-03-01 14:50:06.920859 CST, in "13.750178" seconds.

sys_rewind: Done!

NOTICE: 0 files copied to /home/kingbase/cluster/R6C5/R6C5R/kingbase/data

NOTICE: setting node 1's upstream to node 2

WARNING: unable to ping "host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3"

DETAIL: PQping() returned "PQPING_NO_RESPONSE"

NOTICE: begin to start server at 2022-03-01 14:50:07.530887

NOTICE: starting server using "/home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/sys_ctl  -w -t 90 -D '/home/kingbase/cluster/R6C5/R6C5R/kingbase/data' -l /home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/logfile start"

NOTICE: start server finish at 2022-03-01 14:50:08.952996

NOTICE: NODE REJOIN successful

DETAIL: node 1 is now attached to node 2

[2022-03-01 14:50:09] [NOTICE] kbha: node (ID: 1) rejoin success.

[2022-03-01 14:50:10] [NOTICE] [thread pid:11778] node "node243" (ID: 1) auto-recovery success

[2022-03-01 14:50:10] [INFO] [thread pid:11778] do_nodes_recovery thread ends. The pthread_t tid is 0x7f1b4b125700

[2022-03-01 14:50:10] [INFO] SET synchronous TO "sync" on primary host

[2022-03-01 14:50:10] [INFO] thread tid:0x7f1b4b125700 is not running

[2022-03-01 14:50:10] [INFO] the recovery thread was exited, reset tid

[2022-03-01 14:50:10] [NOTICE] Some nodes reconnect, all standby nodes are OK now

[2022-03-01 14:50:12] [NOTICE] new standby "node243" (ID: 1) has connected
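To see how long each phase took, the timestamps of a few key messages in the log above can be compared. The sketch below assumes the `[timestamp] [LEVEL] message` line format shown in the log; the milestone names and the matching substrings are choices made for this illustration, and the sample abbreviates the connection string in the first line.

```python
import re
from datetime import datetime

# Matches lines such as: [2022-03-01 14:49:00] [NOTICE] some message
LINE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] \[(\w+)\] (.*)$")

# (milestone name, substring to look for); both are illustrative choices.
MILESTONES = [
    ("upstream_unreachable", "unable to ping"),
    ("promotion_decided", "will now promote itself"),
    ("old_primary_rejoined", "auto-recovery success"),
]


def find_milestones(log_text):
    """Return the timestamp of the first occurrence of each milestone."""
    found = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # lines without a timestamp (repmgr command output)
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        for name, needle in MILESTONES:
            if name not in found and needle in match.group(3):
                found[name] = stamp
    return found


# Three lines copied from the log above (connection string abbreviated).
SAMPLE_LOG = """\
[2022-03-01 14:46:42] [WARNING] unable to ping "host=192.168.7.243 ..."
[2022-03-01 14:49:00] [NOTICE] this node is the only available candidate and will now promote itself
[2022-03-01 14:50:10] [NOTICE] [thread pid:11778] node "node243" (ID: 1) auto-recovery success
"""

marks = find_milestones(SAMPLE_LOG)
detect_to_promote = (marks["promotion_decided"] - marks["upstream_unreachable"]).total_seconds()
promote_to_rejoin = (marks["old_primary_rejoined"] - marks["promotion_decided"]).total_seconds()
print(detect_to_promote, promote_to_rejoin)
```

For this run, the standby decided to promote itself 138 seconds after the first failed ping, and the old primary was rejoined 70 seconds after that.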

5. Check the database processes and the cluster status on the standby

[kingbase@node3 bin]$ ps -ef |grep kingbase

kingbase  2654     1  0 14:49 ?        00:00:00 /home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/kbha -A daemon -f /home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/../etc/repmgr.conf

kingbase  3462     1  0 14:50 ?        00:00:00 /home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/kingbase -D /home/kingbase/cluster/R6C5/R6C5R/kingbase/data

kingbase  3463  3462  0 14:50 ?        00:00:00 kingbase: logger

kingbase  3464  3462  0 14:50 ?        00:00:00 kingbase: startup   recovering 000000080000000000000017

kingbase  3465  3462  0 14:50 ?        00:00:00 kingbase: checkpointer

kingbase  3466  3462  0 14:50 ?        00:00:00 kingbase: background writer

kingbase  3467  3462  0 14:50 ?        00:00:00 kingbase: stats collector

kingbase  3468  3462  0 14:50 ?        00:00:00 kingbase: walreceiver   streaming 0/1700F160

kingbase  3471  3462  0 14:50 ?        00:00:00 kingbase: esrep esrep 192.168.7.243(57348) idle

kingbase  3522     1  0 14:50 ?        00:00:00 /home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/repmgrd -d -v -f /home/kingbase/cluster/R6C5/R6C5R/kingbase/bin/../etc/repmgr.conf

kingbase  3523  3462  0 14:50 ?        00:00:00 kingbase: esrep esrep 192.168.7.243(57351) idle

[kingbase@node3 bin]$ ./repmgr cluster show

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string

----+---------+---------+-----------+----------+----------+----------+----------+--------------------------

 1  | node243 | standby |   running | node248  | default  | 100      | 7        | host=192.168.7.243 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

 2  | node248 | primary | * running |          | default  | 100      | 8        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

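=== The information above shows that, once the original primary node's system had returned to normal, the cluster automatically rejoined it as the new standby. ===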
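The conclusion above can also be expressed as a small consistency check: exactly one primary, every node running, and every standby attached to that primary. The sketch below is only an illustration; `check_cluster` and the lowercase dictionary keys are hypothetical, and the node records are typed in by hand from the final `repmgr cluster show` output.

```python
def check_cluster(nodes):
    """Return a list of problems; an empty list means the cluster looks healthy."""
    problems = []
    primaries = [n for n in nodes if n["role"] == "primary"]
    if len(primaries) != 1:
        problems.append(f"expected exactly 1 primary, found {len(primaries)}")
        return problems
    primary = primaries[0]
    for node in nodes:
        if node["status"] != "running":
            problems.append(f"{node['name']} is {node['status']}")
        if node["role"] == "standby" and node["upstream"] != primary["name"]:
            problems.append(f"{node['name']} is not attached to {primary['name']}")
    return problems


# State after the automatic recovery, copied by hand from the table above.
AFTER_RECOVERY = [
    {"name": "node243", "role": "standby", "status": "running", "upstream": "node248"},
    {"name": "node248", "role": "primary", "status": "running", "upstream": ""},
]

result = check_cluster(AFTER_RECOVERY)
print(result or "cluster OK")
```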

=To be continued=
