KingbaseES V8R3 Cluster Operations Series -- Managing the DB VIP and Cluster VIP
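Case description:
A KingbaseES V8R3 cluster uses two virtual IPs: the DB VIP (application connections) and the Cluster VIP (cluster management). This case describes how the two VIPs are configured in the cluster and how each one drifts when a cluster component fails.
Applicable version:
KingbaseES V8R3
Cluster architecture: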

I. Cluster VIP configuration
1) Configure the DB VIP and Cluster VIP in HAmodule.conf
[kingbase@node101 bin]$ cat ../etc/HAmodule.conf |grep -i vip
#vip is bound to the specified network card.example:DEV="ens33"
#db use vip/the subnet mask.example:KB_VIP="192.168.28.220/24"
KB_VIP="192.168.1.204/24" #DB VIP setting
#pool use vip/the subnet mask.example:KB_POOL_VIP="192.168.28.220/24"
KB_POOL_VIP="192.168.1.205" #Cluster VIP setting
---When the cluster script kingbase_monitor.sh runs, it reads the settings in HAmodule.conf.
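To check which VIPs a node is configured with, the values can be pulled out of the file directly. The following is a minimal sketch, not the way kingbase_monitor.sh itself parses the file: the helper name `get_vip` is hypothetical, and it only assumes the `KEY="value"` layout shown in the listing above.

```shell
# Hypothetical helper (not part of KingbaseES).
# get_vip KEY FILE: print the value of an uncommented KEY="value" line.
# Commented example lines start with "#", so the "^KEY=" anchor skips them.
# If the key appears more than once, the last assignment wins.
get_vip() {
    sed -n "s/^${1}=\"\([^\"]*\)\".*/\1/p" "$2" | tail -n 1
}
```

For example, `get_vip KB_VIP ../etc/HAmodule.conf` would print the DB VIP with its mask, and `get_vip KB_POOL_VIP ../etc/HAmodule.conf` the Cluster VIP.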
2) Cluster VIP configuration in kingbasecluster.conf
[kingbase@node101 etc]$ cat kingbasecluster.conf|grep -i 'ip add'|grep -v '#'
if_up_cmd='ip addr add 192.168.1.205/24 dev enp0s3 label enp0s3:0'
if_down_cmd='ip addr del 192.168.1.205/24 dev enp0s3'
---When the kingbasecluster service is started or stopped, it reads kingbasecluster.conf and adds or removes the Cluster VIP accordingly.
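Because the Cluster VIP appears in two files (KB_POOL_VIP in HAmodule.conf and the if_up_cmd/if_down_cmd lines in kingbasecluster.conf), it can be worth confirming that both carry the same address. The sketch below is only an illustration: the helper names `first_ipv4` and `cluster_vip_matches` are hypothetical, and the check compares addresses only, ignoring any mask.

```shell
# Hypothetical helpers (not part of KingbaseES).
# first_ipv4 STRING: print the first dotted-quad address found in STRING.
first_ipv4() {
    printf '%s\n' "$1" | grep -oE '([0-9]{1,3}\.){3}[0-9]{1,3}' | head -n 1
}

# cluster_vip_matches CMD_LINE VIP: succeed when the address used in an
# if_up_cmd/if_down_cmd line equals VIP. ${2%%/*} drops an optional
# "/24"-style suffix from VIP before comparing.
cluster_vip_matches() {
    [ "$(first_ipv4 "$1")" = "${2%%/*}" ]
}
```

A possible use is to feed it the if_up_cmd line from kingbasecluster.conf together with the KB_POOL_VIP value from HAmodule.conf and report a mismatch when it returns non-zero.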
II. Loading the cluster VIPs
1) Loading the DB VIP
[kingbase@node101 bin]$ ./kingbase_monitor.sh start
-----------------------------------------------------------------------
2023-02-14 19:00:25 KingbaseES automation beging...
......................
ADD VIP NOW AT 2023-02-14 19:00:33 ON enp0s3
execute: [/sbin/ip addr add 192.168.1.204/24 dev enp0s3 label enp0s3:2]
execute: /home/kingbase/cluster/HAR3/db/bin//arping -U 192.168.1.204 -I enp0s3 -w 1
.....
all started..
---As shown above, running kingbase_monitor.sh start adds the DB VIP on the primary node of the cluster's database service.
2) Loading the Cluster VIP (cluster.log)
2023-02-14 19:01:00: pid 31342: LOG: kingbasecluster successfully started. version 3.6.7 (release)
.......
2023-02-14 19:01:02: pid 31449: LOG: successfully acquired the delegate IP:"192.168.1.205"
2023-02-14 19:01:02: pid 31449: DETAIL: 'if_up_cmd' returned with success
---As cluster.log shows, when the kingbasecluster service starts it reads kingbasecluster.conf and adds the Cluster VIP on the cluster's master node.
3) Check the IP information on the primary node
[kingbase@node101 bin]$ ip add sh
......
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.204/24 scope global secondary enp0s3:2
valid_lft forever preferred_lft forever
inet 192.168.1.205/24 scope global secondary enp0s3:0
valid_lft forever preferred_lft forever
---As shown above, both the DB VIP and the Cluster VIP are now bound on the primary node.
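The same check can be scripted rather than read by eye. The sketch below assumes `ip addr show` style text on standard input; the helper name `has_vip` is hypothetical.

```shell
# Hypothetical helper (not part of KingbaseES).
# has_vip VIP: read `ip addr show` output on stdin and succeed if VIP is bound.
# The fixed-string match on "inet <addr>/" keeps 192.168.1.20 from matching
# 192.168.1.204; ${1%%/*} drops an optional mask from the argument.
has_vip() {
    grep -qF "inet ${1%%/*}/"
}
```

For example, `ip addr show dev enp0s3 | has_vip 192.168.1.204 && echo "DB VIP bound"` could be run on each node to see where a VIP currently lives.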
III. VIP drift tests
1. Cluster VIP drift
1) Simulate a kingbasecluster service failure
# Check the kingbasecluster processes
[kingbase@node101 bin]$ ps -ef |grep kingbase
.......
root 31342 1 0 19:00 ? 00:00:00 ./kingbasecluster -n
root 31383 31342 0 19:00 ? 00:00:00 kingbasecluster: watchdog
root 31450 31342 0 19:01 ? 00:00:00 kingbasecluster: lifecheck
root 31452 31450 0 19:01 ? 00:00:00 kingbasecluster: heartbeat receiver
root 31453 31450 0 19:01 ? 00:00:00 kingbasecluster: heartbeat sender
root 31456 31342 0 19:01 ? 00:00:00 kingbasecluster: wait for connection request
root 31457 31342 0 19:01 ? 00:00:00 kingbasecluster: wait for connection request
.......
# Stop the kingbasecluster parent process (kill -2 sends SIGINT)
[root@node101 ~]# kill -2 31342
2) Check the VIP information on the cluster nodes
# Original master node
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.204/24 scope global secondary enp0s3:2
valid_lft forever preferred_lft forever
# Original standby node
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:73:47:f6 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.102/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.205/24 scope global secondary enp0s3:0
valid_lft forever preferred_lft forever
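---As shown above, the Cluster VIP (192.168.1.205) has been removed from the original master node and added on the original standby node;
the DB VIP did not drift, so application access to the database service is unaffected.
3) Check cluster.log
Tips:
As shown below, because the kingbasecluster service on the master node was stopped, the standby's kingbasecluster service took over as the new master, and the Cluster VIP drifted to the new kingbasecluster master node.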
2023-02-14 19:02:51: pid 7497: LOG: We have lost the cluster master node "192.168.1.101:9999 Linux node101"
2023-02-14 19:02:51: pid 7497: LOG: watchdog node state changed from [STANDBY] to [JOINING]
......
2023-02-14 19:02:56: pid 7497: LOG: watchdog node state changed from [INITIALIZING] to [MASTER]
2023-02-14 19:02:56: pid 7497: LOG: I am announcing my self as master/coordinator watchdog node
2023-02-14 19:02:59: pid 7500: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2023-02-14 19:02:59: pid 7500: DETAIL: the last heartbeat from "192.168.1.101:9999" received 8 seconds ago
2023-02-14 19:03:00: pid 7497: LOG: I am the cluster leader node
2023-02-14 19:03:00: pid 7497: DETAIL: our declare coordinator message is accepted by all nodes
........
2023-02-14 19:03:02: pid 8176: LOG: selecting backend connection
2023-02-14 19:03:02: pid 8176: DETAIL: failback event detected, discarding existing connections
2023-02-14 19:03:02: pid 7500: LOG: watchdog checking if kingbasecluster is alive using heartbeat
2023-02-14 19:03:02: pid 7500: DETAIL: the last heartbeat from "192.168.1.101:9999" received 11 seconds ago
2023-02-14 19:03:02: pid 9330: LOG: successfully acquired the delegate IP:"192.168.1.205"
2023-02-14 19:03:02: pid 9330: DETAIL: 'if_up_cmd' returned with success
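In a long cluster.log, the moments at which a node took over the Cluster VIP can be filtered out. This is a minimal sketch that relies only on the message format visible in the excerpt above; the helper name `vip_events` is hypothetical.

```shell
# Hypothetical helper (not part of KingbaseES).
# vip_events: read cluster.log text on stdin and print "<date> <time> <ip>"
# for every "successfully acquired the delegate IP" message.
vip_events() {
    sed -n 's/^\([0-9-]* [0-9:]*\): pid [0-9]*: LOG: successfully acquired the delegate IP:"\([^"]*\)".*/\1 \2/p'
}
```

Running `vip_events < cluster.log` on each node gives a short timeline of Cluster VIP takeovers that can be compared with the watchdog state changes.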
4) Restart the kingbasecluster service on the original master node
# Start the kingbasecluster service
[root@node101 ~]# cd /home/kingbase/cluster/HAR3/kingbasecluster/bin
[root@node101 bin]# ./restartcluster.sh
Tips:
As shown below, after its kingbasecluster service is started, the original master node rejoins the cluster as a standby node.
#cluster.log:
2023-02-14 19:03:05: pid 1023: LOG: watchdog node state changed from [DEAD] to [LOADING]
2023-02-14 19:03:05: pid 1023: LOG: new outbound connection to 192.168.1.102:9000
2023-02-14 19:03:05: pid 1023: LOG: setting the remote node "192.168.1.102:9999 Linux node102" as watchdog cluster master
2023-02-14 19:03:05: pid 1023: LOG: watchdog node state changed from [LOADING] to [INITIALIZING]
2023-02-14 19:03:05: pid 1023: LOG: new watchdog node connection is received from "192.168.1.102:47600"
2023-02-14 19:03:05: pid 1023: LOG: new node joined the cluster hostname:"192.168.1.102" port:9000 kingbasecluster_port:9999
2023-02-14 19:03:06: pid 1023: LOG: watchdog node state changed from [INITIALIZING] to [STANDBY]
2023-02-14 19:03:06: pid 1023: LOG: successfully joined the watchdog cluster as standby node
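The watchdog role changes can be condensed in a similar way. The sketch below assumes the "watchdog node state changed from [X] to [Y]" wording seen in the excerpts; the helper name `watchdog_states` is hypothetical.

```shell
# Hypothetical helper (not part of KingbaseES).
# watchdog_states: read cluster.log text on stdin and print one line per
# watchdog state change, in the form "<date> <time> OLD -> NEW".
watchdog_states() {
    sed -n 's/^\([0-9-]* [0-9:]*\): pid [0-9]*: LOG: watchdog node state changed from \[\([A-Z]*\)] to \[\([A-Z]*\)].*/\1 \2 -> \3/p'
}
```

Applied to the log of a node that rejoined the cluster, the last line printed shows the role it ended up in.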
2. DB VIP drift
1) Simulate a database service failure on the primary
[kingbase@node101 bin]$ ./sys_ctl stop -D /home/kingbase/cluster/HAR3/db/data
2) Check failover.log
-----------------2023-02-14 19:23:52 failover beging---------------------------------------
----failover-stats is %H = hostname of the new master node [192.168.1.102], %P = old primary node id [0], %d = node id[0], %h = host name [192.168.1.101], %O = old primary host[192.168.1.101] %m = new master node id [1], %M = old master node id [0], %D = database cluster path [/home/kingbase/cluster/HAR3/db/data].
----ping trust ip
ping trust ip 192.168.1.1 success ping times :[3], success times:[3]
----determine whether the faulty db is master or standby
master down, let 192.168.1.102 become new primary.....
2023-02-14 19:23:54 del old primary VIP on 192.168.1.101
es_client connect host:192.168.1.101 success, will stop old primary db and del the vip
stop the old primary db
sys_ctl: PID file "/home/kingbase/cluster/HAR3/db/data/kingbase.pid" does not exist
Is server running?
DEL VIP NOW AT 2023-02-14 19:23:56 ON enp0s3
execute: [/sbin/ip addr del 192.168.1.204/24 dev enp0s3]
Oprate del ip cmd end.
2023-02-14 19:23:54 add VIP on 192.168.1.102
ADD VIP NOW AT 2023-02-14 19:23:55 ON enp0s3
execute: [/sbin/ip addr add 192.168.1.204/24 dev enp0s3 label enp0s3:2]
execute: /home/kingbase/cluster/HAR3/db/bin//arping -U 192.168.1.204 -I enp0s3 -w 1
Success to send 1 packets
2023-02-14 19:23:55 promote begin...let 192.168.1.102 become master
.......
-----------------2023-02-14 19:23:55 failover end---------------------------------------
---As shown above, during failover the DB VIP is removed from the original primary and added on the new primary.
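To see only the VIP operations that a failover performed, the `execute: [... ip addr add|del ...]` lines can be extracted from failover.log. This is a sketch based solely on the line format in the excerpt above; the helper name `vip_ops` is hypothetical.

```shell
# Hypothetical helper (not part of KingbaseES).
# vip_ops: read failover.log text on stdin and print "<add|del> <vip/mask> <dev>"
# for each bracketed "ip addr" command. Unbracketed lines such as the arping
# call are ignored.
vip_ops() {
    sed -nE 's#^execute: \[[^ ]*ip addr (add|del) ([0-9./]+) dev ([^] ]+).*#\1 \2 \3#p'
}
```

Since the log interleaves output from both nodes, the result shows what was executed but not on which host; the surrounding "del old primary VIP on ..." and "add VIP on ..." lines carry that information.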
3) Check the IP information on the new primary
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 08:00:27:73:47:f6 brd ff:ff:ff:ff:ff:ff
inet 192.168.1.102/24 brd 192.168.1.255 scope global noprefixroute enp0s3
valid_lft forever preferred_lft forever
inet 192.168.1.205/24 scope global secondary enp0s3:0
valid_lft forever preferred_lft forever
inet 192.168.1.204/24 scope global secondary enp0s3:2
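---As shown above, after the cluster triggers a failover, the DB VIP drifts to the new primary node.
IV. Summary
A KingbaseES V8R3 cluster uses VIP addresses to give applications highly available access to the database and to manage the cluster.
1) The DB VIP serves application connections. It is added on the primary node of the database service when the cluster starts; when the primary's database service goes down and a failover is triggered, the DB VIP drifts to the new primary.
2) The Cluster VIP serves access to the kingbasecluster service. It is added on the kingbasecluster master node when the cluster starts; when the kingbasecluster service on the master goes down, it drifts to the new master node.
3) If port 9999 (the kingbasecluster service port) becomes unreachable in production, try restarting the kingbasecluster service. By default this does not affect client connections to the database service, but in production it is still best done during off-peak hours.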