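KingbaseES V8R6 Cluster Operations Case Study: Misconfigured VIP Causes Cluster Failover to Fail
Case description:
In a KingbaseES V8R6 cluster, the VIP is configured in repmgr.conf. This case tests how manually unloading and loading the VIP affects the cluster's own unloading and loading of the VIP during a failover.
Applicable version:
KingbaseES V8R6
I. Cluster node status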
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 51 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 50 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
II. Cluster VIP configuration
1. Check the VIP loaded on the host
[kingbase@node101 bin]$ ip add sh
.......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever
--- As shown above, the primary host has loaded the VIP 192.168.1.254/24.
2. Check the cluster VIP configuration
[kingbase@node101 bin]$ cat ../etc/repmgr.conf|grep -i vir
virtual_ip='192.168.1.254/24'
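The two checks above (the configured value and the addresses actually loaded on the host) can be compared programmatically. The following is a minimal sketch, not part of KingbaseES: the function names `configured_vip` and `loaded_addresses` are hypothetical, and it assumes the `virtual_ip` value and the `inet` lines look like the ones shown above.

```python
import re


def configured_vip(conf_text):
    """Return the virtual_ip value from repmgr.conf text, or None if it is absent."""
    # Assumption: the value is written as virtual_ip='a.b.c.d/nn', quotes optional.
    m = re.search(r"^\s*virtual_ip\s*=\s*'?([^'\s#]+)'?", conf_text, re.MULTILINE)
    return m.group(1) if m else None


def loaded_addresses(ip_output):
    """Return every IPv4 address/prefix found on `inet` lines of `ip addr show` output."""
    return re.findall(r"^\s*inet\s+(\d+\.\d+\.\d+\.\d+/\d+)", ip_output, re.MULTILINE)


# Sample input copied from the output shown above.
conf = "virtual_ip='192.168.1.254/24'\n"
ip_out = """\
    inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
    inet 192.168.1.254/24 scope global secondary enp0s3:3
"""

vip = configured_vip(conf)
print(vip, vip in loaded_addresses(ip_out))  # prints: 192.168.1.254/24 True
```

Comparing the full `address/prefix` string, rather than the address alone, matters here because the prefix length is part of what the cluster later tries to remove.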
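III. Manually unloading the VIP
1. Unload the VIP on the primary
# As shown below, the netmask (prefix length) must be specified when unloading the VIP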
[root@node101 cron.d]# ip add delete 192.168.1.254/24 dev enp0s3
[root@node101 cron.d]# ip add sh
.......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
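2. Check the cluster node status
Tips:
As shown below, unloading the VIP on the primary does not affect the cluster; the cluster status remains normal.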
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 51 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 50 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
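3. Automatic VIP reload
As shown below, when the cluster detects that the VIP is missing on the primary, it reloads the VIP automatically.
1) Check the VIP on the host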
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever
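--- As shown above, after the VIP was manually unloaded, the cluster loaded it again automatically.
2) Check the cluster log
As shown below, when a ping of the VIP reveals that it is missing, the cluster attempts to reload it automatically.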
[2023-03-09 17:47:05] [NOTICE] found primary node lost virtual_ip, try to acquire virtual_ip
[2023-03-09 17:47:07] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
--- 192.168.1.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1005ms
[2023-03-09 17:47:07] [WARNING] ping host"192.168.1.254" failed
[2023-03-09 17:47:07] [DETAIL] average RTT value is not greater than zero
[2023-03-09 17:47:07] [DEBUG] executing:
/home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A loadvip
[2023-03-09 17:47:07] [DEBUG] result of command was 0 (0)
[2023-03-09 17:47:07] [DEBUG] local_command(): no output returned
[2023-03-09 17:47:07] [DEBUG] executing:
/home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A arping
[2023-03-09 17:47:07] [DEBUG] result of command was 0 (0)
[2023-03-09 17:47:07] [DEBUG] local_command(): no output returned
[2023-03-09 17:47:07] [INFO] loadvip result: 1, arping result: 1
[2023-03-09 17:47:07] [NOTICE] acquire the virtual ip 192.168.1.254/24 success on localhost
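The sequence visible in this log (ping the VIP, then run `kbha -A loadvip` and `kbha -A arping` when the ping fails) can be modelled roughly as below. This is only an illustrative sketch of the observed order of events, not the repmgr/kbha implementation: `ensure_vip` and its three callables are hypothetical stand-ins, and skipping the announce step when loading fails is an assumption that the log does not confirm.

```python
def ensure_vip(ping_ok, load_vip, announce):
    """Model the log sequence: ping the VIP; if it does not answer, load and announce it.

    ping_ok, load_vip and announce are hypothetical callables returning True on success.
    """
    if ping_ok():
        return "present"
    loaded = load_vip()                  # stands in for `kbha -A loadvip`
    announced = loaded and announce()    # stands in for `kbha -A arping` (assumed to be skipped on failure)
    return "reacquired" if announced else "failed"


# Stubs reproducing the situation in the log: the ping fails, both commands succeed.
print(ensure_vip(lambda: False, lambda: True, lambda: True))  # prints: reacquired
```

Passing the three steps in as callables keeps the sketch runnable without a cluster; in a real check they would wrap the actual commands.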
IV. Manually loading the VIP (with a different netmask)
1. Load the VIP with a different netmask
[root@node101 cron.d]# ip add delete 192.168.1.254/24 dev enp0s3
[root@node101 cron.d]# ip add add 192.168.1.254/32 dev enp0s3:3
[root@node101 cron.d]# ip add sh
......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/32 scope global enp0s3
valid_lft forever preferred_lft forever
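--- As shown above, the VIP was manually unloaded and then loaded again with a different netmask (192.168.1.254/32).
2. Run the failover test
1) Stop the database service on the primary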
[kingbase@node101 bin]$ ./sys_ctl stop -D /data/kingbase/r6ha/data/
waiting for server to shut down.... done
server stopped
2) Check the IP configuration on the primary
As shown below, the VIP on the primary was not unloaded.
[root@node101 cron.d]# ip add sh
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/32 scope global enp0s3
3) Check hamgr.log on the standby
[2023-03-09 17:52:28] [INFO] try to ping the trusted_servers "192.168.1.1" before execute promote_command
[2023-03-09 17:52:30] [NOTICE] PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.231/0.287/0.343/0.056 ms
[2023-03-09 17:52:30] [NOTICE] successfully ping one or more of the trusted_servers "192.168.1.1"
[2023-03-09 17:52:30] [DEBUG] test_ssh_connection(): executing ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /bin/true 2>/dev/null
[2023-03-09 17:52:30] [NOTICE] try to stop old primary db (host: "192.168.1.101")
[2023-03-09 17:52:30] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A stopdb
[2023-03-09 17:52:30] [DEBUG] remote_command(): no output returned
[2023-03-09 17:52:32] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
--- 192.168.1.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.472/0.505/0.538/0.033 ms
[2023-03-09 17:52:32] [WARNING] the virtual ip is already on other host, try to release it on old primary node (host: "192.168.1.101")
[2023-03-09 17:52:32] [DEBUG] test_ssh_connection(): executing ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /bin/true 2>/dev/null
[2023-03-09 17:52:32] [INFO] ES connection to host "192.168.1.101" succeeded, ready to release vip on it
[2023-03-09 17:52:32] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A check_ip --ip 192.168.1.254
[2023-03-09 17:52:32] [DEBUG] remote_command(): output returned was:
1
[2023-03-09 17:52:32] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A unloadvip
RTNETLINK answers: Cannot assign requested address
[2023-03-09 17:52:32] [DEBUG] remote_command(): no output returned
[2023-03-09 17:52:32] [WARNING] old primary node (host: "192.168.1.101") release the virtual ip 192.168.1.254/24 failed
[2023-03-09 17:52:32] [NOTICE] the time from the first failure to acquire VIP is 2 seconds (max 60 seconds), try agian
[2023-03-09 17:52:32] [NOTICE] will acquire the virtual ip again
[2023-03-09 17:52:34] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
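As the log above shows, during the failover the standby connects to the primary remotely and tries to unload the VIP. The standby reads the VIP address 192.168.1.254/24 from repmgr.conf, but the address actually loaded on the primary is 192.168.1.254/32. Because the two do not match, the unload fails with "RTNETLINK answers: Cannot assign requested address", the VIP stays on the old primary, and the failover fails.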
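The mismatch can be made explicit with Python's standard `ipaddress` module: the two values share the same host address but differ in prefix length, so they are not the same interface address. The sketch below is illustrative only; `compare_vip` and its return labels are hypothetical names, not anything provided by KingbaseES.

```python
import ipaddress


def compare_vip(configured, loaded):
    """Classify the loaded addresses against the configured VIP (all in CIDR form)."""
    want = ipaddress.ip_interface(configured)
    # Keep only the loaded addresses whose host part equals the VIP's host part.
    same_host = [
        iface
        for iface in (ipaddress.ip_interface(a) for a in loaded)
        if iface.ip == want.ip
    ]
    if not same_host:
        return "missing"
    if any(i.network.prefixlen == want.network.prefixlen for i in same_host):
        return "match"
    return "prefix_mismatch"


# Values from this case: repmgr.conf says /24, the host carries /32.
print(compare_vip("192.168.1.254/24", ["192.168.1.101/24", "192.168.1.254/32"]))
# prints: prefix_mismatch
```

A "missing" result corresponds to the situation in section III, which the cluster repairs on its own; "prefix_mismatch" is the state that blocked the failover here.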

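V. Summary
1. If the VIP is manually unloaded on the primary, no failover occurs; the cluster detects the missing VIP and loads it on the primary again automatically.
2. If the VIP address loaded on the primary differs from the one configured in repmgr.conf, the VIP cannot be unloaded during a failover, and the failover fails.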
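When reviewing a failed failover, the warning from the standby's hamgr.log excerpt above is a useful signature to search for. The following sketch scans log lines for it; the function name `find_release_failures` is hypothetical, and the pattern is assumed from the single message shown in this case, so the wording may differ in other versions.

```python
import re

# Pattern assumed from the WARNING line in the hamgr.log excerpt of this case.
RELEASE_FAILED = re.compile(
    r'\[WARNING\] old primary node \(host: "([^"]+)"\) '
    r"release the virtual ip (\S+) failed"
)


def find_release_failures(log_lines):
    """Return (host, vip) pairs for every 'release the virtual ip ... failed' warning."""
    hits = []
    for line in log_lines:
        m = RELEASE_FAILED.search(line)
        if m:
            hits.append((m.group(1), m.group(2)))
    return hits


sample = [
    '[2023-03-09 17:52:32] [WARNING] old primary node (host: "192.168.1.101") '
    "release the virtual ip 192.168.1.254/24 failed",
    "[2023-03-09 17:52:32] [NOTICE] will acquire the virtual ip again",
]
print(find_release_failures(sample))  # prints: [('192.168.1.101', '192.168.1.254/24')]
```

The VIP reported in the match is the value the cluster tried to remove, which can then be compared with the address actually present on the old primary.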