Case description:

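In a disaster-recovery environment, a standby deployed in a remote region does not promote itself to primary. When the primary fails, or a switchover is needed for operational reasons, the switch has to be performed manually. In this case the primary has already failed, and the goal is to promote the remote standby to primary.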

$bin/repmgr standby promote

Applicable version:

KingbaseES V8R6

Cluster node information:

 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node101 | standby |   running | node102  | running | 19312 | no      | 1 second(s) ago
 2  | node102 | primary | * running |          | running | 20658 | no      | n/a

Primary/standby streaming replication status:

test=# select * from sys_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend_start         | backend_xmin |   state   |  sent_lsn  | write_lsn  | flush_lsn  | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state |          reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+------------+------------+------------+------------+-----------+-----------+------------+---------------+------------+-------------------------------
 20165 |       10 | system  | node101          | 192.168.1.101 |                 |       10747 | 2022-09-08 11:12:59.798843+08 |              | streaming | 4/4C0018A0 | 4/4C0018A0 | 4/4C0018A0 | 4/4C0018A0 |           |           |            |             1 | sync       | 2022-09-08 11:16:45.423742+08
(1 row)
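In this output, sent_lsn, write_lsn, flush_lsn and replay_lsn are all 4/4C0018A0, and the write/flush/replay lag columns are empty. The standby had therefore received and replayed everything the primary had sent. Before promoting a standby it can be worth checking this gap explicitly. The following is a minimal bash sketch. It assumes the usual `HI/LO` hexadecimal text form of an LSN shown above. `lsn_to_int` and `lsn_diff` are hypothetical helper names, not KingbaseES tools.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for comparing two WAL positions (LSNs) printed by
# sys_stat_replication, e.g. sent_lsn vs replay_lsn.
# An LSN "HI/LO" is two hex numbers; its byte position is HI * 2^32 + LO.

lsn_to_int() {
  local lsn=$1
  local hi=${lsn%%/*}   # part before the slash
  local lo=${lsn##*/}   # part after the slash
  echo $(( (16#$hi << 32) + 16#$lo ))
}

# Print how many bytes of WAL position $1 (e.g. sent_lsn) is ahead of $2 (e.g. replay_lsn).
lsn_diff() {
  echo $(( $(lsn_to_int "$1") - $(lsn_to_int "$2") ))
}
```

For the row above, `lsn_diff 4/4C0018A0 4/4C0018A0` prints 0, meaning no un-replayed WAL on the standby at that moment.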

Automatic failover is disabled:

[kingbase@node102 bin]$ cat ../etc/repmgr.conf|grep failover
#failover='automatic'
failover='manual'
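The `#failover='automatic'` line is commented out, so only `failover='manual'` takes effect. This is why repmgrd will not promote the standby by itself later on. Below is a small bash sketch for checking the active value. `get_failover_mode` is a hypothetical helper. It assumes one `key=value` setting per line with no trailing inline comment. If the key appears more than once it reports the last occurrence; that later lines win is an assumption, not verified against repmgr's parser.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the value of the active (uncommented)
# failover= setting in a repmgr.conf file.
# Assumption: if the key appears more than once, the last occurrence is reported.
get_failover_mode() {
  local conf=$1
  # Lines starting with "#" do not match the pattern, so commented settings are ignored.
  # sed removes the "failover=" prefix, any quotes and trailing whitespace.
  grep -E "^[[:space:]]*failover[[:space:]]*=" "$conf" \
    | tail -n 1 \
    | sed -E "s/^[^=]*=[[:space:]]*//; s/['\"]//g; s/[[:space:]]+$//"
}
```

Run from the bin directory as above, `get_failover_mode ../etc/repmgr.conf` should print `manual` for this configuration.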

I. Simulate a failure of the primary database service

[kingbase@node102 bin]$ ./sys_ctl stop -D /data/kingbase/r6ha/data
waiting for server to shut down....... done

II. Check the standby's status

1. Database processes (still running as a standby)

kingbase 19132     1  0 11:13 ?        00:00:00 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kingbase -D /data/kingbase/r6ha/data
kingbase 19133 19132  0 11:13 ?        00:00:00 kingbase: logger
kingbase 19134 19132  0 11:13 ?        00:00:00 kingbase: startup recovering 00000018000000040000004D
kingbase 19135 19132  0 11:13 ?        00:00:00 kingbase: checkpointer
kingbase 19136 19132  0 11:13 ?        00:00:00 kingbase: background writer
kingbase 19137 19132  0 11:13 ?        00:00:00 kingbase: stats collector
kingbase 19310 19132  0 11:13 ?        00:00:00 kingbase: system esrep 192.168.1.101(15211) idle

2. Check the standby's hamgr.log (to trace the failover process)

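# The repmgrd process on the standby monitors the primary's database service with PQping(). Once the primary returns "PQPING_NO_RESPONSE", repmgrd keeps retrying the connection; when the retry limit is exceeded, it proceeds to failover.
[2022-09-08 11:18:10] [INFO] sleeping 6 seconds until next reconnection attempt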
[2022-09-08 11:18:16] [INFO] checking state of node 2, 1 of 10 attempts
[2022-09-08 11:18:16] [DEBUG] is_server_available_params(): ping status for "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr" is PQPING_NO_RESPONSE
[2022-09-08 11:18:16] [WARNING] unable to ping "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
[2022-09-08 11:18:16] [DETAIL] PQping() returned "PQPING_NO_RESPONSE"
[2022-09-08 11:18:16] [INFO] sleeping 6 seconds until next reconnection attempt
........
[2022-09-08 11:19:10] [INFO] checking state of node 2, 10 of 10 attempts
[2022-09-08 11:19:10] [DEBUG] is_server_available_params(): ping status for "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr" is PQPING_NO_RESPONSE
.......
# repmgrd is now ready to fail over, but because failover='manual' is set in repmgr.conf, no automatic failover is performed.
[2022-09-08 11:19:10] [DEBUG] do_election(): electoral term is 1
[2022-09-08 11:19:10] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate, and will not follow the new primary
[2022-09-08 11:19:10] [DETAIL] "failover" is set to "manual" in repmgr.conf
[2022-09-08 11:19:10] [HINT] manually execute "repmgr standby follow" to have this node follow the new primary
[2022-09-08 11:19:10] [DEBUG] election result: NOT CANDIDATE
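The log shows the retry loop: a 6-second pause between attempts and 10 attempts in total. Roughly 6 × 10 = 60 seconds therefore elapse (11:18:10 to 11:19:10) before repmgrd gives up on the primary and runs the election. In upstream repmgr these two values come from the `reconnect_interval` and `reconnect_attempts` settings in repmgr.conf. The assumption here is that KingbaseES's repmgr uses the same names.

The following is a minimal bash/awk sketch that extracts these numbers and the election result from hamgr.log text. The function name is hypothetical, and it only matches the message formats shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: summarize the repmgrd retry loop from hamgr.log text on stdin.
# Only the message formats visible in the log excerpt are matched.
summarize_failover_log() {
  awk '
    /sleeping [0-9]+ seconds until next reconnection attempt/ {
      line = $0
      sub(/.*sleeping /, "", line)        # leaves "6 seconds until next ..."
      split(line, f, " ")
      interval = f[1]
    }
    /checking state of node [0-9]+, [0-9]+ of [0-9]+ attempts/ {
      line = $0
      sub(/.*, /, "", line)               # leaves "10 of 10 attempts"
      split(line, f, " ")
      attempt = f[1]; max = f[3]
    }
    /election result: / {
      line = $0
      sub(/.*election result: /, "", line)
      result = line
    }
    END {
      # Approximate detection window = pause between attempts * number of attempts.
      printf "reconnect_interval=%s attempts=%s/%s window=%ss election=%s\n",
             interval, attempt, max, interval * max, result
    }'
}
```

For the excerpt above, `summarize_failover_log < hamgr.log` should report an interval of 6, 10/10 attempts, a 60 s window and `NOT CANDIDATE`.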

III. Run a manual promote on the standby

1. Perform the manual promotion

[kingbase@node101 bin]$ ./repmgr standby promote
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
NOTICE: promoting standby to primary
DETAIL: promoting server "node101" (ID: 1) using sys_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
DEBUG: setting node 1 as primary and marking existing primary as failed
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node101" (ID: 1) was successfully promoted to primary
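When the promotion is scripted, for example in a DR runbook, it helps to check two things before redirecting clients: repmgr's exit status and the `STANDBY PROMOTE successful` notice shown above. The following is a minimal bash sketch. `promote_standby` and the `REPMGR_BIN` variable are hypothetical. The wrapper also captures stderr as a precaution, since repmgr may write its log messages there.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around "repmgr standby promote".
# REPMGR_BIN defaults to ./repmgr (run from the KingbaseES bin directory, as above).
promote_standby() {
  local repmgr=${REPMGR_BIN:-./repmgr}
  local out
  # Capture stdout and stderr together so the NOTICE lines are seen either way.
  out=$("$repmgr" standby promote 2>&1)
  local rc=$?
  printf '%s\n' "$out"
  # Treat the promotion as successful only if repmgr exited 0 AND printed
  # the success notice seen in the output above.
  if [ "$rc" -eq 0 ] && grep -q "STANDBY PROMOTE successful" <<<"$out"; then
    return 0
  fi
  echo "promotion did not report success (exit code $rc)" >&2
  return 1
}
```

On node101 this would be run as `promote_standby` from the bin directory. A non-zero return means the node should not yet be treated as the new primary.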

2. Check the database processes after the switch (now running as a primary)

[kingbase@node101 bin]$ ps -ef |grep kingbase

kingbase 19132     1  0 11:13 ?        00:00:00 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kingbase -D /data/kingbase/r6ha/data
kingbase 19133 19132  0 11:13 ?        00:00:00 kingbase: logger
kingbase 19135 19132  0 11:13 ?        00:00:00 kingbase: checkpointer
kingbase 19136 19132  0 11:13 ?        00:00:00 kingbase: background writer
kingbase 19137 19132  0 11:13 ?        00:00:00 kingbase: stats collector
kingbase 19780 19132  0 11:13 ?        00:00:00 kingbase: system test ::1(33354) idle
kingbase 19784 19132  0 11:13 ?        00:00:00 kingbase: system esrep 192.168.1.101(15243) idle
kingbase 20826 19132  0 11:20 ?        00:00:00 kingbase: walwriter
kingbase 20827 19132  0 11:20 ?        00:00:00 kingbase: autovacuum launcher
kingbase 20828 19132  1 11:20 ?        00:00:00 kingbase: archiver archiving 0000001500000002000000FC
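The two `ps` listings differ in a telling way. Before promotion, the startup process is still replaying WAL (`startup recovering 00000018000000040000004D`). Afterwards it is gone, and walwriter and autovacuum launcher, which only run on a primary, have started along with the archiver. Below is a hypothetical helper that applies this heuristic to `ps -ef` output. It is a quick sanity check, not a substitute for asking the server itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: guess a KingbaseES instance's role from "ps -ef" output on stdin.
# Heuristic based on the two listings above:
#   - a standby runs a "startup recovering <WAL file>" process;
#   - after promotion that process is gone and walwriter appears.
role_from_ps() {
  local ps_out
  ps_out=$(cat)
  if grep -q "kingbase: startup recovering" <<<"$ps_out"; then
    echo standby
  elif grep -q "kingbase: walwriter" <<<"$ps_out"; then
    echo primary
  else
    echo unknown
  fi
}
```

Usage: `ps -ef | grep kingbase | role_from_ps` should print `standby` on node101 before step III and `primary` after it.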

IV. Bring the old primary back as a new standby and rejoin it to the cluster

1. Create the standby signal file

[kingbase@node102 log]$ touch /data/kingbase/r6ha/data/standby.signal

2. Start the database service

[kingbase@node102 bin]$ ./sys_ctl start -D /data/kingbase/r6ha/data

3. Register the standby node

[kingbase@node102 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node102" (ID: 2)
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
INFO: connecting to primary database
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
DEBUG: connecting to: "user=system connect_timeout=10 dbname=esrep host=192.168.1.101 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr"
DEBUG: remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A updateinfo
INFO: standby registration complete
NOTICE: standby node "node102" (ID: 2) successfully registered
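Steps 1 to 3 can be collected into one script. The sketch below reuses the paths from this case and passes the configuration file explicitly with repmgr's `-f` option, assuming KingbaseES's repmgr accepts it as upstream repmgr does. The location `../etc/repmgr.conf` relative to bin is taken from the `cat` command earlier. The script also supports a hypothetical `DRY_RUN=1` mode that only prints the commands.

Like the manual steps, it assumes the old primary was shut down cleanly and wrote no WAL beyond what the new primary had received (here all LSNs matched before the shutdown). A primary whose history diverged cannot simply be restarted as a standby.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of steps 1-3: rejoin the old primary (node102) as a standby.
# Paths are the ones used in this case; override KB_BIN / KB_DATA if yours differ.
KB_BIN=${KB_BIN:-/home/kingbase/cluster/R6HA/kha/kingbase/bin}
KB_DATA=${KB_DATA:-/data/kingbase/r6ha/data}

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

rejoin_old_primary() {
  # 1. standby.signal makes the instance start in standby (recovery) mode.
  run touch "$KB_DATA/standby.signal" || return 1
  # 2. Start the database service.
  run "$KB_BIN/sys_ctl" start -D "$KB_DATA" || return 1
  # 3. Register the node again; --force overwrites its existing record,
  #    in which node102 was marked as the failed primary during the promote.
  run "$KB_BIN/repmgr" -f "$KB_BIN/../etc/repmgr.conf" standby register --force
}
```

`DRY_RUN=1 rejoin_old_primary` prints the three commands. Running `rejoin_old_primary` on node102 executes them.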

4. Check the cluster node status

[kingbase@node102 bin]$ ./repmgr cluster show

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node101 | primary | * running |          | default  | 100      | 25       | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node102 | standby |   running | node101  | default  | 100      | 24       | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
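The final `cluster show` output confirms the new topology: node101 is the `* running` primary, and node102 is a running standby with node101 as its upstream. Below is a hypothetical awk-based check of these conditions. It prints the primary's name on success, returns non-zero otherwise, and relies only on the column layout shown above.

```bash
#!/usr/bin/env bash
# Hypothetical check of "repmgr cluster show" output read from stdin.
# Prints the primary name; exits non-zero unless there is exactly one
# "* running" primary and every standby is running and follows it.
check_cluster_show() {
  awk -F'|' '
    function trim(s) { gsub(/^[ \t]+|[ \t]+$/, "", s); return s }
    NF < 5 { next }                 # separator, blank and message lines
    trim($1) == "ID" { next }       # column header
    {
      name = trim($2); role = trim($3); status = trim($4); upstream = trim($5)
      if (role == "primary" && status == "* running") { primaries++; primary = name }
      else if (role == "standby" && status == "running") { upstream_of[name] = upstream }
      else { bad++ }
    }
    END {
      if (primaries != 1) { print "expected exactly one running primary" > "/dev/stderr"; exit 1 }
      if (bad) { print bad " node(s) not in the expected state" > "/dev/stderr"; exit 1 }
      for (n in upstream_of)
        if (upstream_of[n] != primary) { print n " does not follow " primary > "/dev/stderr"; exit 1 }
      print primary
    }'
}
```

Usage: `./repmgr cluster show | check_cluster_show` should print `node101` for the output above.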

V. Summary

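In a KingbaseES V8R6 cluster, a standby may not be promoted automatically during failover, or failover may fail after the primary goes down. In either case, "repmgr standby promote" can be used to manually force the standby to become the primary and restore service access.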
