KingbaseES V8R6 集群运维案例之 -- VIP配置错误导致集群切换失败
案例说明:
KingbaseES V8R6集群的vip在repmgr.conf中配置,本案例测试了手工卸载和加载vip的操作,对failover切换时vip的卸载和加载的影响。
适用版本:
KingbaseES V8R6
一、集群节点状态
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 51 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 50 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
二、集群vip配置
1、查看主机vip加载配置
[kingbase@node101 bin]$ ip add sh
.......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever
---如上所示,主库主机加载vip:192.168.1.254/24
2、查看集群vip配置
[kingbase@node101 bin]$ cat ../etc/repmgr.conf|grep -i vir
virtual_ip='192.168.1.254/24'
三、手工卸载vip测试
1、卸载主库vip
# 如下所示,在卸载vip时需要指定ip掩码
[root@node101 cron.d]# ip add delete 192.168.1.254/24 dev enp0s3
[root@node101 cron.d]# ip add sh
.......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
2、查看集群节点状态
Tips:
如下所示, 主库vip卸载不影响集群状态,集群状态正常。
[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 51 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 50 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
3、vip自动加载
如下所示,当集群探测到主库vip缺失时,会自动加载vip。
1)查看主机vip
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever
---如上所示,在vip被手工卸载后,又被集群自动加载。
2)查看集群日志
如下所示,通过ping vip发现vip丢失时,集群会尝试自动加载vip。
[2023-03-09 17:47:05] [NOTICE] found primary node lost virtual_ip, try to acquire virtual_ip
[2023-03-09 17:47:07] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
--- 192.168.1.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1005ms
[2023-03-09 17:47:07] [WARNING] ping host"192.168.1.254" failed
[2023-03-09 17:47:07] [DETAIL] average RTT value is not greater than zero
[2023-03-09 17:47:07] [DEBUG] executing:
/home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A loadvip
[2023-03-09 17:47:07] [DEBUG] result of command was 0 (0)
[2023-03-09 17:47:07] [DEBUG] local_command(): no output returned
[2023-03-09 17:47:07] [DEBUG] executing:
/home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A arping
[2023-03-09 17:47:07] [DEBUG] result of command was 0 (0)
[2023-03-09 17:47:07] [DEBUG] local_command(): no output returned
[2023-03-09 17:47:07] [INFO] loadvip result: 1, arping result: 1
[2023-03-09 17:47:07] [NOTICE] acquire the virtual ip 192.168.1.254/24 success on localhost
四、手工加载vip测试(子网掩码变化)
1、加载不同子网掩码的vip
[root@node101 cron.d]# ip add delete 192.168.1.254/24 dev enp0s3
[root@node101 cron.d]# ip add add 192.168.1.254/32 dev enp0s3:3
[root@node101 cron.d]# ip add sh
......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/32 scope global enp0s3
valid_lft forever preferred_lft forever
---如上所示,vip被手工卸载并加载不同子网掩码的vip(192.168.1.254/32)。
2、执行failover切换测试
1) 关闭主库数据库服务
[kingbase@node101 bin]$ ./sys_ctl stop -D /data/kingbase/r6ha/data/
waiting for server to shut down.... done
server stopped
2) 查看主库ip配置
如下所示,主库vip未被卸载。
[root@node101 cron.d]# ip add sh
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/32 scope global enp0s3
3) 查看备库hamgr.log
[2023-03-09 17:52:28] [INFO] try to ping the trusted_servers "192.168.1.1" before execute promote_command
[2023-03-09 17:52:30] [NOTICE] PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
--- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.231/0.287/0.343/0.056 ms
[2023-03-09 17:52:30] [NOTICE] successfully ping one or more of the trusted_servers "192.168.1.1"
[2023-03-09 17:52:30] [DEBUG] test_ssh_connection(): executing ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /bin/true 2>/dev/null
[2023-03-09 17:52:30] [NOTICE] try to stop old primary db (host: "192.168.1.101")
[2023-03-09 17:52:30] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A stopdb
[2023-03-09 17:52:30] [DEBUG] remote_command(): no output returned
[2023-03-09 17:52:32] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
--- 192.168.1.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.472/0.505/0.538/0.033 ms
[2023-03-09 17:52:32] [WARNING] the virtual ip is already on other host, try to release it on old primary node (host: "192.168.1.101")
[2023-03-09 17:52:32] [DEBUG] test_ssh_connection(): executing ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /bin/true 2>/dev/null
[2023-03-09 17:52:32] [INFO] ES connection to host "192.168.1.101" succeeded, ready to release vip on it
[2023-03-09 17:52:32] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A check_ip --ip 192.168.1.254
[2023-03-09 17:52:32] [DEBUG] remote_command(): output returned was:
1
[2023-03-09 17:52:32] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A unloadvip
RTNETLINK answers: Cannot assign requested address
[2023-03-09 17:52:32] [DEBUG] remote_command(): no output returned
[2023-03-09 17:52:32] [WARNING] old primary node (host: "192.168.1.101") release the virtual ip 192.168.1.254/24 failed
[2023-03-09 17:52:32] [NOTICE] the time from the first failure to acquire VIP is 2 seconds (max 60 seconds), try agian
[2023-03-09 17:52:32] [NOTICE] will acquire the virtual ip again
[2023-03-09 17:52:34] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
如下图所示,failover切换时,备库远程连接主库后,执行vip的卸载,备库从repmgr.conf中读取的vip地址为:192.168.1.254/24,而主库此时加载的vip地址是:192.168.1.254/32,vip地址不匹配,因此无法卸载vip地址,导致切换失败。
五、总结
1、如果在主库上vip被手工卸载,集群不会发生切换,集群会自动判断并加载vip地址到主库。
2、如果主库上配置了和repmgr.conf中不一致的vip地址,在集群切换时,将无法执行vip地址的卸载,会导致集群切换失败。
```****
KingbaseES V8R6 集群运维案例之 -- VIP配置错误导致集群切换失败的更多相关文章
- KingbaseES V8R6集群运维案例之---repmgr standby promote应用案例
案例说明: 在容灾环境中,跨区域部署的异地备节点不会自主提升为主节点,在主节点发生故障或者人为需要切换时需要手动执行切换操作.若主节点已经失效,希望将异地备机提升为主节点. $bin/repmgr s ...
- KingbaseES V8R3集群运维案例之---主库系统down failover切换过程分析
案例说明: KingbaseES V8R3集群failover时两个cluster都会触发,但只有一个cluster会调用脚本去执行真正的切换流程,另一个有对应的打印,但不会调用脚本,只是走相关的 ...
- KingbaseES V8R3集群运维案例之---用户自定义表空间管理
案例说明: KingbaseES 数据库支持用户自定义表空间的创建,并建议表空间的文件存储路径配置到数据库的data目录之外.本案例复现了,当用户自定义表空间存储路径配置到data下时,出现的故障问 ...
- KingbaseES V8R3集群运维案例之---kingbase_monitor.sh启动”two master“案例
案例说明: KingbaseES V8R3集群,执行kingbase_monitor.sh启动集群,出现"two master"节点的故障,启动集群失败:通过手工sys_ctl启动 ...
- KingbaseES V8R3集群运维案例之---cluster.log ERROR: md5 authentication failed
案例说明: 在KingbaseES V8R3集群的cluster.log日志中,经常会出现"ERROR: md5 authentication failed:DETAIL: password ...
- PB 级大规模 Elasticsearch 集群运维与调优实践
PB 级大规模 Elasticsearch 集群运维与调优实践 https://mp.weixin.qq.com/s/PDyHT9IuRij20JBgbPTjFA | 导语 腾讯云 Elasticse ...
- 优化系统资源ulimit《高性能Linux服务器构建实战:运维监控、性能调优与集群应用》
优化系统资源ulimit<高性能Linux服务器构建实战:运维监控.性能调优与集群应用> 假设有这样一种情况,一台Linux 主机上同时登录了10个用户,在没有限制系统资源的情况下,这10 ...
- 优化Linux内核参数/etc/sysctl.conf sysctl 《高性能Linux服务器构建实战:运维监控、性能调优与集群应用》
优化Linux内核参数/etc/sysctl.conf sysctl <高性能Linux服务器构建实战:运维监控.性能调优与集群应用> http://book.51cto.com/ar ...
- 集群运维ansible
ssh免密登录 集群运维 生成秘钥,一路enter cd ~/.ssh/ ssh-keygen -t rsa 讲id_rsa.pub文件追加到授权的key文件中 cat ~/.ssh/id_rsa.p ...
- 阿里巴巴大规模神龙裸金属 Kubernetes 集群运维实践
作者 | 姚捷(喽哥)阿里云容器平台集群管理高级技术专家 本文节选自<不一样的 双11 技术:阿里巴巴经济体云原生实践>一书,点击即可完成下载. 导读:值得阿里巴巴技术人骄傲的是 2019 ...
随机推荐
- Working with Dates in PL/SQL(PL/SQL中使用日期)
Working with Dates in PL/SQL By Steven Feuerstein 史蒂芬.佛伊尔斯坦 The previous articles in this introduct ...
- 实操开源版全栈测试工具RunnerGo安装(二)Linux安装
手动安装(支持Linux.MacOS.Windows) Linux安装步骤 以debian系统为例,其他linux系统参考官方文档:https://docs.docker.com/engine/ins ...
- day01---操作系统安装环境准备
虚拟机安装操作系统步骤 1.新建虚拟主机 2.选择自定义 3.稍后安装操作系统 4.操作系统选择linux 5.选择存放位置 6.cpu和核数选择,默认即可 7.内存分配 8.网络选择 9.控制器类型 ...
- ASP.NET XML序列化
整理一下ASP.NET里面如何序列化实体为XML,获取解析XML内容为实体. 第一步要添加程序集引用,项目-->引用-->鼠标右键-->添加引用-->选择程序集-->Sy ...
- [软件工程] CMMI是什么?
序 能力成熟度模型集成(CMMI) 一.CMMI(能力成熟度模型集成)概述 CMMI是由美国软件工程学会(software engineering institue,简称SEI)制定的一套专门针对软件 ...
- PicGo如何设置阿里云图床
打开阿里云官网.注册并且登录.然后产品下拉列表里面通过搜索或者直接找到存储.对象存储OSS 默认你已经激活了,然后进入到控制台里面. 注意事项 Bucket名称需要全英文,不能有大写字母 服务器选国内 ...
- 【Azure 应用服务】App Service与Application Gateway组合使用时发生的域名跳转问题如何解决呢?
问题描述 为App Service配置了应用服务网关(Application Gateway),并且为Application Gateway配置了自定义域名,通过浏览器访问时,出现域名跳转问题,由自定 ...
- Mysql基础目录
尚硅谷Mysql课程笔记 课程链接: https://www.bilibili.com/video/BV1iq4y1u7vj?p=1 第01章_数据库概述 第02章_MySQL环境搭建 第03章_基本 ...
- Codeforces Round 303 (Div. 2)C. Kefa and Park(DFS、实现)
@ 目录 题面 链接 题意 题解 代码 总结 题面 链接 C. Kefa and Park 题意 求叶节点数量,叶节点满足,从根节点到叶节点的路径上最长连续1的长度小于m 题解 这道题目主要是实现,当 ...
- C++ Qt开发:QFileSystemModel文件管理组件
Qt 是一个跨平台C++图形界面开发库,利用Qt可以快速开发跨平台窗体应用程序,在Qt中我们可以通过拖拽的方式将不同组件放到指定的位置,实现图形化开发极大的方便了开发效率,本章将重点介绍如何运用QFi ...