KingbaseES V8R3 Cluster Operations Series -- Managing the DB VIP and Cluster VIP

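Case description:

A KingbaseES V8R3 cluster uses two virtual IPs: the DB VIP (for application connections) and the Cluster VIP (for cluster management). This case describes how both VIPs are configured in the cluster and how each VIP moves between nodes when a cluster component fails.

Applicable version: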

KingbaseES V8R3

Cluster architecture:

I. Cluster VIP configuration

1) Configure the DB VIP and Cluster VIP in HAmodule.conf

[kingbase@node101 bin]$ cat ../etc/HAmodule.conf |grep -i vip

#vip is bound to the specified network card.example:DEV="ens33"

#db use vip/the subnet mask.example:KB_VIP="192.168.28.220/24"

KB_VIP="192.168.1.204/24"              #DB VIP setting

#pool use vip/the subnet mask.example:KB_POOL_VIP="192.168.28.220/24"

KB_POOL_VIP="192.168.1.205"            #Cluster VIP setting

---The cluster script kingbase_monitor.sh reads the settings in HAmodule.conf when it runs.

2) Cluster VIP configuration in kingbasecluster.conf
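
To read the two VIP settings without being misled by the commented example lines, a small helper can be used. This is a minimal sketch, not part of the KingbaseES tooling: the function name `get_conf_value` is hypothetical, and it assumes the value is written as KEY="value" on an uncommented line, as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with KingbaseES).
# Print the quoted value of an uncommented KEY="value" line in a config file.
# Usage: get_conf_value KEY FILE
get_conf_value() {
    # Split each line on double quotes; the value is then the second field.
    # Lines starting with '#' do not match because the key must come first.
    awk -F'"' -v key="$1" '$0 ~ ("^[ \t]*" key "=") { print $2 }' "$2"
}
```

For example, `get_conf_value KB_VIP ../etc/HAmodule.conf` would print the DB VIP with its prefix length, and `get_conf_value KB_POOL_VIP ../etc/HAmodule.conf` the Cluster VIP.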

[kingbase@node101 etc]$ cat kingbasecluster.conf|grep -i 'ip add'|grep -v '#'

if_up_cmd='ip addr add 192.168.1.205/24 dev enp0s3 label enp0s3:0'

if_down_cmd='ip addr del 192.168.1.205/24 dev enp0s3'

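---When kingbasecluster starts or stops the cluster service, it reads kingbasecluster.conf and uses if_up_cmd / if_down_cmd to add or remove the Cluster VIP.

II. Loading the cluster VIPs

1) Loading the DB VIP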
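
Because the Cluster VIP is written in two files, it can be worth checking that they agree. The following is a sketch under assumptions: the function name `cluster_vip_consistent` is hypothetical, the file layouts are assumed to match the excerpts above, and prefix lengths are ignored in the comparison.

```shell
#!/bin/sh
# Hypothetical check (not shipped with KingbaseES).
# Succeeds when KB_POOL_VIP in HAmodule.conf equals the address that
# if_up_cmd adds in kingbasecluster.conf. Prefix lengths are ignored.
# Usage: cluster_vip_consistent HAmodule.conf kingbasecluster.conf
cluster_vip_consistent() {
    # Value between the double quotes on the uncommented KB_POOL_VIP line.
    pool_vip=$(awk -F'"' '/^[ \t]*KB_POOL_VIP=/ { print $2 }' "$1")
    # Address following "ip addr add" on the uncommented if_up_cmd line.
    up_vip=$(sed -n 's/^[[:space:]]*if_up_cmd[[:space:]]*=.*ip addr add \([0-9.]*\).*/\1/p' "$2")
    pool_vip=${pool_vip%%/*}   # strip an optional /prefix
    [ -n "$pool_vip" ] && [ "$pool_vip" = "$up_vip" ]
}
```

Usage might look like `cluster_vip_consistent ../etc/HAmodule.conf kingbasecluster.conf || echo "Cluster VIP mismatch"`, with the paths adjusted to the local installation.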

[kingbase@node101 bin]$ ./kingbase_monitor.sh start

-----------------------------------------------------------------------

2023-02-14 19:00:25 KingbaseES automation beging...

......................

ADD VIP NOW AT 2023-02-14 19:00:33 ON enp0s3

execute: [/sbin/ip addr add 192.168.1.204/24 dev enp0s3 label enp0s3:2]

execute: /home/kingbase/cluster/HAR3/db/bin//arping -U 192.168.1.204 -I enp0s3 -w 1

.....

all started..

---As shown above, running kingbase_monitor.sh start adds the DB VIP on the primary node of the cluster's database service.

2) Loading the Cluster VIP (cluster.log)

2023-02-14 19:01:00: pid 31342: LOG:  kingbasecluster successfully started. version 3.6.7 (release)

.......

2023-02-14 19:01:02: pid 31449: LOG:  successfully acquired the delegate IP:"192.168.1.205"

2023-02-14 19:01:02: pid 31449: DETAIL:  'if_up_cmd' returned with success

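---As cluster.log shows above, when the kingbasecluster service starts it reads kingbasecluster.conf and adds the Cluster VIP on the master node of the cluster.

3) Check the IP information on the primary node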

[kingbase@node101 bin]$ ip add sh

......

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff

    inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3

       valid_lft forever preferred_lft forever

    inet 192.168.1.204/24 scope global secondary enp0s3:2

       valid_lft forever preferred_lft forever

    inet 192.168.1.205/24 scope global secondary enp0s3:0

       valid_lft forever preferred_lft forever

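---As shown above, both the DB VIP and the Cluster VIP have been added on the primary node.

III. VIP migration tests

1. Cluster VIP migration

1) Simulate a kingbasecluster service outage

# Check the kingbasecluster processes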
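
The same check can be scripted instead of read by eye. This is a minimal sketch: the function name `has_vip` is hypothetical, and it only inspects the `inet` lines of `ip addr show` output supplied on standard input.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with KingbaseES).
# Exit 0 if the given IPv4 address appears as an "inet" entry in
# `ip addr show` output read from stdin, otherwise exit 1.
# Usage: ip addr show dev enp0s3 | has_vip 192.168.1.204
has_vip() {
    awk -v vip="$1" '
        $1 == "inet" {
            split($2, a, "/")            # a[1] is the address without /prefix
            if (a[1] == vip) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Running it for 192.168.1.204 and 192.168.1.205 on each node shows which node currently holds the DB VIP and which holds the Cluster VIP.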

[kingbase@node101 bin]$ ps -ef |grep kingbase

.......

root     31342     1  0 19:00 ?        00:00:00 ./kingbasecluster -n

root     31383 31342  0 19:00 ?        00:00:00 kingbasecluster: watchdog

root     31450 31342  0 19:01 ?        00:00:00 kingbasecluster: lifecheck

root     31452 31450  0 19:01 ?        00:00:00 kingbasecluster: heartbeat receiver

root     31453 31450  0 19:01 ?        00:00:00 kingbasecluster: heartbeat sender

root     31456 31342  0 19:01 ?        00:00:00 kingbasecluster: wait for connection request

root     31457 31342  0 19:01 ?        00:00:00 kingbasecluster: wait for connection request

.......

# Kill the kingbasecluster parent process (kill -2 sends SIGINT)

[root@node101 ~]# kill -2 31342

2) Check the VIP information on the cluster nodes

# Original primary node

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    link/ether 08:00:27:bd:83:57 brd ff:ff:ff:ff:ff:ff

    inet 192.168.1.101/24 brd 192.168.1.255 scope global noprefixroute enp0s3

       valid_lft forever preferred_lft forever

    inet 192.168.1.204/24 scope global secondary enp0s3:2

       valid_lft forever preferred_lft forever

# Original standby node

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    link/ether 08:00:27:73:47:f6 brd ff:ff:ff:ff:ff:ff

    inet 192.168.1.102/24 brd 192.168.1.255 scope global noprefixroute enp0s3

       valid_lft forever preferred_lft forever

    inet 192.168.1.205/24 scope global secondary enp0s3:0

       valid_lft forever preferred_lft forever

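---As shown above, the Cluster VIP (192.168.1.205) has been removed from the original primary node and added on the original standby node; the DB VIP has not moved, so application access to the database service is unaffected.

3) Check the cluster.log log

Tips:

As shown below, because the kingbasecluster service on the primary node was stopped, the kingbasecluster service on the standby took over as the new master, and the Cluster VIP moved to the new kingbasecluster master node.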

2023-02-14 19:02:51: pid 7497: LOG:  We have lost the cluster master node "192.168.1.101:9999 Linux node101"

2023-02-14 19:02:51: pid 7497: LOG:  watchdog node state changed from [STANDBY] to [JOINING]

......

2023-02-14 19:02:56: pid 7497: LOG:  watchdog node state changed from [INITIALIZING] to [MASTER]

2023-02-14 19:02:56: pid 7497: LOG:  I am announcing my self as master/coordinator watchdog node

2023-02-14 19:02:59: pid 7500: LOG:  watchdog checking if kingbasecluster is alive using heartbeat

2023-02-14 19:02:59: pid 7500: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 8 seconds ago

2023-02-14 19:03:00: pid 7497: LOG:  I am the cluster leader node

2023-02-14 19:03:00: pid 7497: DETAIL:  our declare coordinator message is accepted by all nodes

........

2023-02-14 19:03:02: pid 8176: LOG:  selecting backend connection

2023-02-14 19:03:02: pid 8176: DETAIL:  failback event detected, discarding existing connections

2023-02-14 19:03:02: pid 7500: LOG:  watchdog checking if kingbasecluster is alive using heartbeat

2023-02-14 19:03:02: pid 7500: DETAIL:  the last heartbeat from "192.168.1.101:9999" received 11 seconds ago

2023-02-14 19:03:02: pid 9330: LOG:  successfully acquired the delegate IP:"192.168.1.205"

2023-02-14 19:03:02: pid 9330: DETAIL:  'if_up_cmd' returned with success
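
In a long cluster.log, the moment a node takes over the Cluster VIP can be picked out by filtering for the "successfully acquired the delegate IP" message. This is a sketch only: the function name `vip_acquired_events` is hypothetical, and the pattern assumes the log line format shown in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with KingbaseES).
# Read cluster.log text on stdin and print "DATE TIME ADDRESS" for every
# line that reports a successful acquisition of the delegate IP.
# Usage: vip_acquired_events < cluster.log
vip_acquired_events() {
    sed -n 's/^\([0-9-]* [0-9:]*\): pid [0-9]*: LOG: *successfully acquired the delegate IP:"\([^"]*\)".*/\1 \2/p'
}
```

Comparing the output on both nodes gives the time at which each node held the Cluster VIP.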

4) Restart the kingbasecluster service on the original primary node

# Start the kingbasecluster service

[root@node101 ~]# cd /home/kingbase/cluster/HAR3/kingbasecluster/bin

[root@node101 bin]# ./restartcluster.sh

Tips:

As shown below, after its kingbasecluster service is started, the original primary node joins the cluster as a standby node.

#cluster.log:

2023-02-14 19:03:05: pid 1023: LOG:  watchdog node state changed from [DEAD] to [LOADING]

2023-02-14 19:03:05: pid 1023: LOG:  new outbound connection to 192.168.1.102:9000

2023-02-14 19:03:05: pid 1023: LOG:  setting the remote node "192.168.1.102:9999 Linux node102" as watchdog cluster master

2023-02-14 19:03:05: pid 1023: LOG:  watchdog node state changed from [LOADING] to [INITIALIZING]

2023-02-14 19:03:05: pid 1023: LOG:  new watchdog node connection is received from "192.168.1.102:47600"

2023-02-14 19:03:05: pid 1023: LOG:  new node joined the cluster hostname:"192.168.1.102" port:9000 kingbasecluster_port:9999

2023-02-14 19:03:06: pid 1023: LOG:  watchdog node state changed from [INITIALIZING] to [STANDBY]

2023-02-14 19:03:06: pid 1023: LOG:  successfully joined the watchdog cluster as standby node

2. DB VIP migration

1) Simulate a database service outage on the primary

[kingbase@node101 bin]$ ./sys_ctl stop -D /home/kingbase/cluster/HAR3/db/data

2) Check the failover.log log

-----------------2023-02-14 19:23:52 failover beging---------------------------------------

----failover-stats is %H = hostname of the new master node [192.168.1.102], %P = old primary node id [0], %d = node id[0], %h = host name [192.168.1.101], %O = old primary host[192.168.1.101] %m = new master node id [1], %M = old master node id [0], %D = database cluster path [/home/kingbase/cluster/HAR3/db/data].

----ping trust ip

ping trust ip 192.168.1.1 success ping times :[3], success times:[3]

----determine whether the faulty db is master or standby

master down, let 192.168.1.102 become new primary.....

 2023-02-14 19:23:54 del old primary VIP on 192.168.1.101

es_client connect host:192.168.1.101 success, will stop old primary db and del the vip

stop the old primary db

sys_ctl: PID file "/home/kingbase/cluster/HAR3/db/data/kingbase.pid" does not exist

Is server running?

DEL VIP NOW AT 2023-02-14 19:23:56 ON enp0s3

execute: [/sbin/ip addr del 192.168.1.204/24 dev enp0s3]

Oprate del ip cmd end.

2023-02-14 19:23:54 add VIP on 192.168.1.102

ADD VIP NOW AT 2023-02-14 19:23:55 ON enp0s3

execute: [/sbin/ip addr add 192.168.1.204/24 dev enp0s3 label enp0s3:2]

execute: /home/kingbase/cluster/HAR3/db/bin//arping -U 192.168.1.204 -I enp0s3 -w 1

Success to send 1 packets

2023-02-14 19:23:55 promote begin...let 192.168.1.102 become master

.......

-----------------2023-02-14 19:23:55 failover end---------------------------------------

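---As shown above, during failover the DB VIP is removed from the original primary and added on the new primary.

3) Check the IP information on the new primary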
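
To see only the VIP operations in failover.log, the logged `ip addr add` / `ip addr del` commands can be filtered out. This is a sketch, assuming the `execute: [...]` line format shown in the log above; the function name `failover_vip_actions` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with KingbaseES).
# Read failover.log text on stdin and print "ACTION ADDRESS DEVICE" for each
# logged `ip addr add` or `ip addr del` command.
# Usage: failover_vip_actions < failover.log
failover_vip_actions() {
    awk '
        /^execute: \[.*ip addr (add|del) / {
            sub(/\]$/, "")    # drop the closing bracket of the logged command
            for (i = 1; i <= NF; i++)
                if ($i == "add" || $i == "del") {
                    # fields after the action: address, "dev", device name
                    print $i, $(i + 1), $(i + 3)
                    break
                }
        }
    '
}
```

On the log excerpt above this lists one `del` and one `add` for 192.168.1.204/24 on enp0s3, in the order they were written to the log.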

2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000

    link/ether 08:00:27:73:47:f6 brd ff:ff:ff:ff:ff:ff

    inet 192.168.1.102/24 brd 192.168.1.255 scope global noprefixroute enp0s3

       valid_lft forever preferred_lft forever

    inet 192.168.1.205/24 scope global secondary enp0s3:0

       valid_lft forever preferred_lft forever

    inet 192.168.1.204/24 scope global secondary enp0s3:2

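---As shown above, after the cluster triggers a failover, the DB VIP moves to the new primary node.

IV. Summary

A KingbaseES V8R3 cluster uses VIP addresses to give applications highly available connections to the database and to support cluster management.

1) The DB VIP is used for application connections. It is added on the primary node of the database service when the cluster starts; when the database service on the primary goes down and a failover is triggered, the DB VIP moves to the new primary node of the database service.

2) The Cluster VIP is used to access the kingbasecluster service. It is added on the kingbasecluster master node when the cluster starts; when the kingbasecluster service on the master node goes down, it moves to the new master node.

3) If port 9999 (the kingbasecluster service port) becomes unreachable in production, you can try restarting the kingbasecluster service. By default this does not affect client connections to the database service, but in production it is still best done during off-peak hours.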
