案例说明:

KingbaseES V8R6集群的vip在repmgr.conf中配置,本案例测试了手工卸载和加载vip的操作,对failover切换时vip的卸载和加载的影响。

适用版本:

KingbaseES V8R6

一、集群节点状态

[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 51 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 50 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

二、集群vip配置

1、查看主机vip加载配置

[kingbase@node101 bin]$ ip add sh
.......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever ---如上所示,主库主机加载vip:192.168.1.254/24

2、查看集群vip配置

[kingbase@node101 bin]$ cat ../etc/repmgr.conf|grep -i vir
virtual_ip='192.168.1.254/24'

三、手工卸载vip测试

1、卸载主库vip

# 如下所示,在卸载vip时需要指定ip掩码
[root@node101 cron.d]# ip add delete 192.168.1.254/24 dev enp0s3 [root@node101 cron.d]# ip add sh
.......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3

2、查看集群节点状态

Tips:

如下所示, 主库vip卸载不影响集群状态,集群状态正常。

[kingbase@node101 bin]$ ./repmgr cluster show

 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 51 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 50 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

3、vip自动加载

如下所示,当集群探测到主库vip缺失时,会自动加载vip。

1)查看主机vip

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/24 scope global secondary enp0s3:3
valid_lft forever preferred_lft forever
---如上所示,在vip被手工卸载后,又被集群自动加载。

2)查看集群日志

如下所示,通过ping vip发现vip丢失时,集群会尝试自动加载vip。

[2023-03-09 17:47:05] [NOTICE] found primary node lost virtual_ip, try to acquire virtual_ip
[2023-03-09 17:47:07] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data. --- 192.168.1.254 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1005ms [2023-03-09 17:47:07] [WARNING] ping host"192.168.1.254" failed
[2023-03-09 17:47:07] [DETAIL] average RTT value is not greater than zero
[2023-03-09 17:47:07] [DEBUG] executing:
/home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A loadvip
[2023-03-09 17:47:07] [DEBUG] result of command was 0 (0)
[2023-03-09 17:47:07] [DEBUG] local_command(): no output returned
[2023-03-09 17:47:07] [DEBUG] executing:
/home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A arping
[2023-03-09 17:47:07] [DEBUG] result of command was 0 (0)
[2023-03-09 17:47:07] [DEBUG] local_command(): no output returned
[2023-03-09 17:47:07] [INFO] loadvip result: 1, arping result: 1 [2023-03-09 17:47:07] [NOTICE] acquire the virtual ip 192.168.1.254/24 success on localhost

四、手工加载vip测试(子网掩码变化)

1、加载不同子网掩码的vip

[root@node101 cron.d]# ip add delete 192.168.1.254/24 dev enp0s3
[root@node101 cron.d]# ip add add 192.168.1.254/32 dev enp0s3:3
[root@node101 cron.d]# ip add sh
......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/32 scope global enp0s3
valid_lft forever preferred_lft forever ---如上所示,vip被手工卸载并加载不同子网掩码的vip(192.168.1.254/32)。

2、执行failover切换测试

1) 关闭主库数据库服务

[kingbase@node101 bin]$ ./sys_ctl stop -D /data/kingbase/r6ha/data/
waiting for server to shut down.... done
server stopped

2) 查看主库ip配置

如下所示,主库vip未被卸载。

[root@node101 cron.d]# ip add sh

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.254/32 scope global enp0s3

3) 查看备库hamgr.log

[2023-03-09 17:52:28] [INFO] try to ping the trusted_servers "192.168.1.1" before execute promote_command
[2023-03-09 17:52:30] [NOTICE] PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data. --- 192.168.1.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1008ms
rtt min/avg/max/mdev = 0.231/0.287/0.343/0.056 ms [2023-03-09 17:52:30] [NOTICE] successfully ping one or more of the trusted_servers "192.168.1.1"
[2023-03-09 17:52:30] [DEBUG] test_ssh_connection(): executing ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /bin/true 2>/dev/null
[2023-03-09 17:52:30] [NOTICE] try to stop old primary db (host: "192.168.1.101")
[2023-03-09 17:52:30] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A stopdb
[2023-03-09 17:52:30] [DEBUG] remote_command(): no output returned
[2023-03-09 17:52:32] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data. --- 192.168.1.254 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.472/0.505/0.538/0.033 ms [2023-03-09 17:52:32] [WARNING] the virtual ip is already on other host, try to release it on old primary node (host: "192.168.1.101")
[2023-03-09 17:52:32] [DEBUG] test_ssh_connection(): executing ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /bin/true 2>/dev/null [2023-03-09 17:52:32] [INFO] ES connection to host "192.168.1.101" succeeded, ready to release vip on it
[2023-03-09 17:52:32] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A check_ip --ip 192.168.1.254
[2023-03-09 17:52:32] [DEBUG] remote_command(): output returned was:
1 [2023-03-09 17:52:32] [DEBUG] remote_command():
ssh -o Batchmode=yes -q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22 192.168.1.101 /home/kingbase/cluster/R6HA/kha/kingbase/bin/kbha -A unloadvip
RTNETLINK answers: Cannot assign requested address
[2023-03-09 17:52:32] [DEBUG] remote_command(): no output returned
[2023-03-09 17:52:32] [WARNING] old primary node (host: "192.168.1.101") release the virtual ip 192.168.1.254/24 failed
[2023-03-09 17:52:32] [NOTICE] the time from the first failure to acquire VIP is 2 seconds (max 60 seconds), try agian
[2023-03-09 17:52:32] [NOTICE] will acquire the virtual ip again
[2023-03-09 17:52:34] [NOTICE] PING 192.168.1.254 (192.168.1.254) 56(84) bytes of data.
  如下图所示,failover切换时,备库远程连接主库后,执行vip的卸载,备库从repmgr.conf中读取的vip地址为:192.168.1.254/24,而主库此时加载的vip地址是:192.168.1.254/32,vip地址不匹配,因此无法卸载vip地址,导致切换失败。

五、总结

    1、如果在主库上vip被手工卸载,集群不会发生切换,集群会自动判断并加载vip地址到主库。
2、如果主库上配置了和repmgr.conf中不一致的vip地址,在集群切换时,将无法执行vip地址的卸载,会导致集群切换失败。
```****

KingbaseES V8R6 集群运维案例之 -- VIP配置错误导致集群切换失败的更多相关文章

  1. KingbaseES V8R6集群运维案例之---repmgr standby promote应用案例

    案例说明: 在容灾环境中,跨区域部署的异地备节点不会自主提升为主节点,在主节点发生故障或者人为需要切换时需要手动执行切换操作.若主节点已经失效,希望将异地备机提升为主节点. $bin/repmgr s ...

  2. KingbaseES V8R3集群运维案例之---主库系统down failover切换过程分析

    ​ 案例说明: KingbaseES V8R3集群failover时两个cluster都会触发,但只有一个cluster会调用脚本去执行真正的切换流程,另一个有对应的打印,但不会调用脚本,只是走相关的 ...

  3. KingbaseES V8R3集群运维案例之---用户自定义表空间管理

    ​案例说明: KingbaseES 数据库支持用户自定义表空间的创建,并建议表空间的文件存储路径配置到数据库的data目录之外.本案例复现了,当用户自定义表空间存储路径配置到data下时,出现的故障问 ...

  4. KingbaseES V8R3集群运维案例之---kingbase_monitor.sh启动”two master“案例

    案例说明: KingbaseES V8R3集群,执行kingbase_monitor.sh启动集群,出现"two master"节点的故障,启动集群失败:通过手工sys_ctl启动 ...

  5. KingbaseES V8R3集群运维案例之---cluster.log ERROR: md5 authentication failed

    案例说明: 在KingbaseES V8R3集群的cluster.log日志中,经常会出现"ERROR: md5 authentication failed:DETAIL: password ...

  6. PB 级大规模 Elasticsearch 集群运维与调优实践

    PB 级大规模 Elasticsearch 集群运维与调优实践 https://mp.weixin.qq.com/s/PDyHT9IuRij20JBgbPTjFA | 导语 腾讯云 Elasticse ...

  7. 优化系统资源ulimit《高性能Linux服务器构建实战:运维监控、性能调优与集群应用》

    优化系统资源ulimit<高性能Linux服务器构建实战:运维监控.性能调优与集群应用> 假设有这样一种情况,一台Linux 主机上同时登录了10个用户,在没有限制系统资源的情况下,这10 ...

  8. 优化Linux内核参数/etc/sysctl.conf sysctl 《高性能Linux服务器构建实战:运维监控、性能调优与集群应用》

    优化Linux内核参数/etc/sysctl.conf  sysctl  <高性能Linux服务器构建实战:运维监控.性能调优与集群应用> http://book.51cto.com/ar ...

  9. 集群运维ansible

    ssh免密登录 集群运维 生成秘钥,一路enter cd ~/.ssh/ ssh-keygen -t rsa 讲id_rsa.pub文件追加到授权的key文件中 cat ~/.ssh/id_rsa.p ...

  10. 阿里巴巴大规模神龙裸金属 Kubernetes 集群运维实践

    作者 | 姚捷(喽哥)阿里云容器平台集群管理高级技术专家 本文节选自<不一样的 双11 技术:阿里巴巴经济体云原生实践>一书,点击即可完成下载. 导读:值得阿里巴巴技术人骄傲的是 2019 ...

随机推荐

  1. 【framework】AMS启动流程

    1 前言 ​ AMS 即 ActivityManagerService,负责 Activy.Service.Broadcast.ContentProvider 四大组件的生命周期管理.本文主要介绍 A ...

  2. MVVM模式的理解

    MVVM模式的理解 MVVM全称Model-View-ViewModel是基于MVC和MVP体系结构模式的改进,MVVM就是MVC模式中的View的状态和行为抽象化,将视图UI和业务逻辑分开,更清楚地 ...

  3. 微信小程序云开发项目-个人待办事项-03【主页】模块开发

    上一篇: 微信小程序云开发项目-个人待办事项-02今日模块开发 https://blog.csdn.net/IndexMan/article/details/124497893 模块开发步骤 本篇介绍 ...

  4. ElementUI导出表格数据为Excel文件

    功能介绍 将列表的数据导出成excel文件是管理系统中非常常见的功能.最近正好用到了ElementUI+Vue的组合做了个导出效果,拿出来分享一下,希望可以帮到大家:) 实现效果 实现步骤 1.定义导 ...

  5. 微信小程序生态13-微信公众号自定义菜单配置

    序 微信公众号分为订阅号和服务号两种,虽然二者很大的不同,但是这两种公众号的底部却是差不多的:都有菜单栏,而且这些底部菜单也都是自定义配置的. 如CSDN的官方公众号的底部就有精彩栏目.新程序员.CS ...

  6. 参数替换xargs

    由于很多命令不支持管道|来传递参数,xargs用于产生某个命令的参数,xargs可以读入stdin的数据,并且以空格符或回车符将stdin的数据分隔为参数 示例: 创建10个用户 echo user{ ...

  7. proc_sys_reset 复位时序

    proc_sys_reset 模块时序 下面为仿真时序,这里做一个record , 后面有使用问题可以参考该时序: 点击查看代码 module test( ); bit slowest_sync_cl ...

  8. 可以取代宝塔和Nginx的Web服务器:Caddy

    一.安装 官网文章:https://caddyserver.com/docs/install 在左边选择:Install 我的服务器是Ubuntu,所以选第二行 我的服务器是Ubuntu,官方给出的就 ...

  9. Nebula Graph 的 KV 存储分离原理和性能测评

    本文首发于 Nebula Graph Community 公众号 1. 概述 过去十年,图计算无论在学术界还是工业界热度持续升高.相伴而来的是,全世界的数据正以几何级数形式增长.在这种情况下,对于数据 ...

  10. js收藏网页功能,纠正网上乱转没求证的案例

    网站一般流行以下收藏代码 function AddFavorite(title, url){ try{ //ie收藏 window.external.addFavorite(url, title); ...