案例说明:

在KingbaseES V8R3主备流复制的集群中 ,一般有两个节点是集群的管理节点,分为master和standby;如对于一主二备的架构,其中有两个节点是管理节点,三个数据节点;管理节点运行kingbasecluster服务 ,负责集群节点状态的监控及集群主备切换等操作。

本案例详细介绍在一主一备的架构下,其中一个管理节点宕机的情况下,如何在线添加新的管理节点,如果宕机的节点是主备复制中的primary主库,将自动发生主备切换,所以在线添加的节点都是管理节点的备节点。

管理节点的添加和普通数据节点的在线添加不同,相对较复杂,有关数据节点的添加可以参考以下文档:https://www.cnblogs.com/tiany1224/p/15749993.html KingbaseES V8R3集群维护案例之--- 在线添加数据节点

适用版本:

KingbaseES V8R3通用机环境(专业机可参考)

本次案例数据库版本:

TEST=# select version();
VERSION
-------------------------------------------------------------------------------------------------------------------------
Kingbase V008R003C002B0290 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

一、集群原节点信息

[kingbase@node101 bin]$ ./ksql -U SYSTEM -W 123456 TEST -p 9999
ksql (V008R003C002B0290)
Type "help" for help. TEST=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.1.101 | 54321 | up | 0.500000 | primary | 0 | true | 0
1 | 192.168.1.102 | 54321 | up | 0.500000 | standby | 0 | false | 0
(2 rows) TEST=# select * from sys_stat_replication;
PID | USESYSID | USENAME | APPLICATION_NAME | CLIENT_ADDR | CLIENT_HOSTNAME | CLIENT_PORT | BACKEND_START | BACKEND_XMIN | STATE | SENT_LOCATION | WRITE_LOCATION | FLUSH_LOCATION | REPLAY_LOCATION | SYNC_PRIORITY | SYNC_STATE
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
25143 | 10 | SYSTEM | node102 | 192.168.1.102 | | 61622 | 2022-06-22 10:22:48.771995+08 | | streaming | 0/1A0000D0 | 0/1A0000D0 | 0/1A0000D0 | 0/1A0000D0 | 2 | sync
(1 row)

二、备库数据库主机宕机

TEST=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.1.101 | 54321 | up | 0.500000 | primary | 3 | true | 0
1 | 192.168.1.102 | 54321 | down | 0.500000 | standby | 0 | false | 0
(2 rows)

三、准备新的主机环境

Tips:

1)系统环境准备参考《KingbaseES 官方文档》

https://help.kingbase.com.cn/stage-api/profile/document/kes/v8r3/html/highly/clusterware/cluster-manage.html

2)通用机不需要安装数据库软件,专用机需要安装和主库相同版本软件。

3)创建和主库相同的集群目录结构。

1、主库集群目录结构:

[kingbase@node101 R3HA]$ pwd
/home/kingbase/cluster/R3HA [kingbase@node101 R3HA]$ ls -lh
total 33M
drwxrwxr-x 2 kingbase kingbase 4.0K Jun 21 19:04 archivedir
drwxrwxr-x 8 kingbase kingbase 95 Jun 13 19:28 db
-rw-r--r-- 1 kingbase root 29M Mar 29 15:10 db.zip
drwxrwxr-x 6 kingbase kingbase 48 Apr 1 2021 kingbasecluster
-rwxr-xr-x 1 kingbase root 4.3M Mar 29 15:10 kingbasecluster.zip
drwxr-xr-x 3 kingbase root 4.0K Apr 12 15:08 log
drwxr-xr-x 3 kingbase root 28 Mar 29 15:11 run
-rw------- 1 kingbase kingbase 8.0K Jun 22 10:42 template.bk

2、备库创建和主库相同的目录结构

[kingbase@node102 cluster]$ mkdir -p /home/kingbase/cluster/R3HA/
[kingbase@node102 cluster]$ cd R3HA
[kingbase@node102 R3HA]$ mkdir archivedir db kingbasecluster log run
[kingbase@node102 R3HA]$ ls -lh
total 0
drwxrwxr-x 2 kingbase kingbase 6 Jun 22 10:30 archivedir
drwxrwxr-x 2 kingbase kingbase 6 Jun 22 10:30 db
drwxrwxr-x 2 kingbase kingbase 6 Jun 22 10:30 kingbasecluster
drwxrwxr-x 2 kingbase kingbase 6 Jun 22 10:30 log
drwxrwxr-x 2 kingbase kingbase 6 Jun 22 10:30 run

3、从主库拷贝db目录下数据到备库(主库运行排除data目录)

[kingbase@node101 db]$ scp -r bin node102:/home/kingbase/cluster/R3HA/db
[kingbase@node101 db]$ scp -r etc node102:/home/kingbase/cluster/R3HA/db
[kingbase@node101 db]$ scp -r kb_scripts node102:/home/kingbase/cluster/R3HA/db
[kingbase@node101 db]$ scp -r lib node102:/home/kingbase/cluster/R3HA/db
[kingbase@node101 db]$ scp -r share node102:/home/kingbase/cluster/R3HA/db
[kingbase@node101 db]$ scp -r kingbase.log node102:/home/kingbase/cluster/R3HA/db

4、从主库拷贝kingbasecluster目录下所有数据到备库

[kingbase@node101 kingbasecluster]$ scp -r * node102:/home/kingbase/cluster/R3HA/kingbasecluster/

5、从主库拷贝log目录下所有数据到备库:(kingbasecluster.pid文件无需拷贝)

[kingbase@node101 log]$ scp -r * node102:/home/kingbase/cluster/R3HA/log/

6、从主库拷贝run目录下所有数据到备库:

[kingbase@node101 run]$ scp -r * node102:/home/kingbase/cluster/R3HA/run

7、从主库拷贝文件到备库

[kingbase@node101 R3HA]$ scp template.bk node102:/home/kingbase/cluster/R3HA

四、创建主备流复制

1、执行sys_basebackup克隆备库

[kingbase@node102 bin]$ ./sys_basebackup -h 192.168.1.101 -U SYSTEM -W 123456  -p 54321 -F p -x -v -P -D /home/kingbase/cluster/R3HA/db/data
transaction log start point: 0/1B000028 on timeline 7
108616/108616 kB (100%), 1/1 tablespace
transaction log end point: 0/1B0000F8
sys_basebackup: base backup completed

2、配置备库data目录权限

[kingbase@node102 db]$ chmod 700 data

3、配置recovery.conf

[kingbase@node102 data]$ cp ../etc/recovery.done ./recovery.conf

# 编辑recovery.conf:
[kingbase@node102 data]$ cat recovery.conf
standby_mode='on'
primary_conninfo='port=54321 host=192.168.1.101 user=SYSTEM password=MTIzNDU2 application_name=node101'
recovery_target_timeline='latest'
primary_slot_name ='slot_node101'

4、执行sys_ctl启动备库数据库服务

[kingbase@node102 bin]$ ./sys_ctl start -D /home/kingbase/cluster/R3HA/db/data
server starting
[kingbase@node102 bin]$ LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "/home/kingbase/cluster/R3HA/db/data/sys_log". # 查看备库数据库服务
[kingbase@node102 bin]$ ps -ef|grep kingbase
kingbase 17885 1 0 11:10 pts/0 00:00:00 /home/kingbase/cluster/R3HA/db/bin/kingbase -D /home/kingbase/cluster/R3HA/db/data
kingbase 17886 17885 0 11:10 ? 00:00:00 kingbase: logger process
kingbase 17887 17885 0 11:10 ? 00:00:00 kingbase: startup process recovering 00000007000000000000001C
kingbase 17891 17885 0 11:10 ? 00:00:00 kingbase: checkpointer process
kingbase 17892 17885 0 11:10 ? 00:00:00 kingbase: writer process
kingbase 17893 17885 0 11:10 ? 00:00:00 kingbase: stats collector process
kingbase 17894 17885 0 11:10 ? 00:00:00 kingbase: wal receiver process streaming 0/1C000060

5、查看流复制状态

# 查看节点状态(此时备库状态为down)
TEST=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.1.101 | 54321 | up | 0.500000 | primary | 4 | true | 0
1 | 192.168.1.102 | 54321 | down | 0.500000 | standby | 0 | false | 0
(2 rows) # 查看流复制状态
TEST=# select * from sys_stat_replication;
PID | USESYSID | USENAME | APPLICATION_NAME | CLIENT_ADDR | CLIENT_HOSTNAME | CLIENT_PORT | BACKEND_START | BACKEND_XMIN | STATE | SENT_LOCATION | WRITE_LOCATION | FLUSH_LOCATION | REPLAY_LOCATION | SYNC_PRIORITY | SYNC_STATE
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
25242 | 10 | SYSTEM | node101 | 192.168.1.102 | | 33495 | 2022-06-22 11:11:02.085186+08 | | streaming | 0/1C000060 | 0/1C000060 | 0/1C000060 | 0/1C000060 | 0 | async
(1 row) # 查看复制槽信息
TEST=# select * from sys_replication_slots;
SLOT_NAME | PLUGIN | SLOT_TYPE | DATOID | DATABASE | ACTIVE | ACTIVE_PID | XMIN | CATALOG_XMIN | RESTART_LSN | CONFIRMED_FLUSH_LSN
--------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
slot_node101 | | physical | | | t | 25242 | 2112 | | 0/1C000060 |
slot_node102 | | physical | | | f | | 2112 | | 0/1A0000D0 |
(2 rows) # 从以上可知,流复制状态正常。

五、备库配置kingbasecluster管理

Tips:

1)由于kingbasecluster的配置文件从主库复制而来,在备库需要做修改。

2)修改文件为kingbasecluster.conf 和 HAmodule.conf文件。

1、配置kingbasecluster.conf 文件

2、修改HAmodule.conf文件(包括kingbasecluster和db目录下)

[kingbase@node102 etc]$ vi HAmodule.conf
#the current node ip.example:KB_LOCALHOST_IP="192.168.28.128"
KB_LOCALHOST_IP="192.168.1.102" #recoord the names of local node.example:NODE_NAME="node1"
NODE_NAME="node102"

3、root用户手工启动kingbasecluster

[root@node102 bin]# ./kingbasecluster -n >/home/kingbase/cluster/R3HA/log/cluster.log 2>&1 &
[1] 23929

4、查看kingbasecluster服务启动状态

[root@node102 bin]# netstat -an |grep 9999
tcp 0 0 0.0.0.0:9999 0.0.0.0:* LISTEN
tcp6 0 0 :::9999 :::* LISTEN # 服务端口处于监听,kingbasecluster服务启动成功。

5、将备库节点注册到集群

[kingbase@node102 bin]$ ./pcp_attach_node -h 192.168.1.101 -U kingbase 1
Password:
pcp_attach_node -- Command Successful # 备库注册后,节点状态为”up“。
[kingbase@node101 bin]$ ./ksql -U SYSTEM -W 123456 TEST -p 9999
ksql (V008R003C002B0290)
Type "help" for help. TEST=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.1.101 | 54321 | up | 0.500000 | primary | 6 | true | 0
1 | 192.168.1.102 | 54321 | up | 0.500000 | standby | 0 | false | 0
(2 rows)

六、重启集群测试(生产环境在无业务时执行)

1、关闭备库的kingbasecluster服务

[root@node102 bin]# cd /home/kingbase/cluster/R3HA/kingbasecluster/bin
[root@node102 bin]# ./kingbasecluster -m fast stop
2022-06-22 11:41:01: pid 28321: LOG: stop request sent to kingbasecluster. waiting for termination...
.done.
[1]+ Done ./kingbasecluster -n > /home/kingbase/cluster/R3HA/log/cluster.log 2>&1

2、重启集群

[kingbase@node101 bin]$ ./kingbase_monitor.sh restart
-----------------------------------------------------------------------
2022-06-22 11:42:50 KingbaseES automation beging...
......
......................
all started..
...
now we check again
=======================================================================
| ip | program| [status]
[ 192.168.1.101]| [kingbasecluster]| [active]
[ 192.168.1.102]| [kingbasecluster]| [active]
[ 192.168.1.101]| [kingbase]| [active]
[ 192.168.1.102]| [kingbase]| [active]
======================================================================= # 如上所示,集群启动成功。

3、备库配置cron计划任务

[root@node101 ~]# cat /etc/cron.d/KINGBASECRON

*/1 * * * * kingbase  /home/kingbase/cluster/R3HA/db/bin/network_rewind.sh
*/1 * * * * root /home/kingbase/cluster/R3HA/kingbasecluster/bin/restartcluster.sh

七、集群failover切换测试(生产环境业务低峰测试)

1、配置主备库arping文件属主及权限(专用机不需要此操作)

[root@node102 bin]# cd /home/kingbase/cluster/R3HA/db/bin
[root@node102 bin]# chown root.root arping
[root@node102 bin]# chmod u+s arping [root@node102 bin]# ls -lh arping
-rwsr-xr-x 1 root root 33K Apr 1 2021 arping

2、failover主备切换测试

1)关闭主库数据库服务

[kingbase@node101 bin]$ ./sys_ctl stop -D ../data
waiting for server to shut down.... done
server stopped

2)查看切换后的节点状态

[kingbase@node102 bin]$ ./ksql -U SYSTEM -W 123456 TEST -p 9999
ksql (V008R003C002B0290)
Type "help" for help. TEST=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.1.101 | 54321 | down | 0.500000 | standby | 0 | false | 0
1 | 192.168.1.102 | 54321 | up | 0.500000 | primary | 0 | true | 0
(2 rows) # 如上所示,node102已经切换为primary。

3、recovery原主库作为备库加入集群

1)配置recovery.conf

[kingbase@node101 data]$ cp ../etc/recovery.done ./recovery.conf

[kingbase@node101 data]$ cat recovery.conf
standby_mode='on'
primary_conninfo='port=54321 host=192.168.1.102 user=SYSTEM password=MTIzNDU2 application_name=node101'
recovery_target_timeline='latest'
primary_slot_name ='slot_node101'

2)在新主库创建复制槽

TEST=# select * from sys_replication_slots;
SLOT_NAME | PLUGIN | SLOT_TYPE | DATOID | DATABASE | ACTIVE | ACTIVE_PID | XMIN | CATALOG_XMIN | RESTART_LSN | CONFIRMED_FLUSH_LSN
-----------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
(0 rows) TEST=# select sys_create_physical_replication_slot('slot_node101');
SYS_CREATE_PHYSICAL_REPLICATION_SLOT
--------------------------------------
(slot_node101,)
(1 row) TEST=# select sys_create_physical_replication_slot('slot_node102');
SYS_CREATE_PHYSICAL_REPLICATION_SLOT
--------------------------------------
(slot_node102,)
(1 row) TEST=# select * from sys_replication_slots;
SLOT_NAME | PLUGIN | SLOT_TYPE | DATOID | DATABASE | ACTIVE | ACTIVE_PID | XMIN | CATALOG_XMIN | RESTART_LSN | CONFIRMED_FLUSH_LSN
--------------+--------+-----------+--------+----------+--------+------------+------+--------------+-------------+---------------------
slot_node101 | | physical | | | f | | | | |
slot_node102 | | physical | | | f | | | | |
(2 rows)

3、启动原主库(新备库)数据库服务

[kingbase@node101 bin]$ ./sys_ctl start -D ../data
server starting
[kingbase@node101 bin]$ LOG: redirecting log output to logging collector process
HINT: Future log output will appear in directory "/home/kingbase/cluster/R3HA/db/data/sys_log".

4、在新主库查看流复制和节点状态

# 查看流复制状态
TEST=# select * from sys_stat_replication;
PID | USESYSID | USENAME | APPLICATION_NAME | CLIENT_ADDR | CLIENT_HOSTNAME | CLIENT_PORT | BACKEND_START | BACKEND_XMIN | STATE | SENT_LOCATION | WRITE_LOCATION | FLUSH_LOCATION | REPLAY_LOCATION | SYNC_PRIORITY | SYNC_STATE
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+--------------+-----------+---------------+----------------+----------------+-----------------+---------------+------------
17271 | 10 | SYSTEM | node101 | 192.168.1.101 | | 17540 | 2022-06-22 14:09:44.848740+08 | | streaming | 0/22050C18 | 0/22050C18 | 0/22050C18 | 0/2204FFD8 | 0 | async
(1 row) # 查看集群节点状态
[kingbase@node102 bin]$ ./ksql -U SYSTEM -W 123456 TEST -p 9999
ksql (V008R003C002B0290)
Type "help" for help. TEST=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay
---------+---------------+-------+--------+-----------+---------+------------+-------------------+-------------------
0 | 192.168.1.101 | 54321 | up | 0.500000 | standby | 0 | false | 0
1 | 192.168.1.102 | 54321 | up | 0.500000 | primary | 0 | true | 0
(2 rows) # 由以上可知,集群failover切换完成!!!

八、总结

对于KingbaseES V8R3集群在线添加新的管理节点,操作较复杂,尤其是对kingbasecluster配置文件的修改,注意细节部分,否则集群的启动和切换都会受到影响。

KingbaseES V8R3集群维护案例之---在线添加备库管理节点的更多相关文章

  1. KingbaseES V8R6集群维护案例之---停用集群node_export进程

    案例说明: 在KingbaseES V8R6集群启动时,会启动node_exporter进程,此进程主要用于向kmonitor监控服务输出节点状态信息.在系统安全漏洞扫描中,提示出现以下安全漏洞: 对 ...

  2. KingbaseES V8R6集群维护案例之---将securecmdd通讯改为ssh案例

    案例说明: 在KingbaseES V8R6的后期版本中,为了解决有的主机之间不允许root用户ssh登录的问题,使用了securecmdd作为集群部署分发和通讯的服务,有生产环境通过漏洞扫描,在88 ...

  3. KingbaseES V8R6集群维护案例之--单实例数据迁移到集群案例

    案例说明: 生产环境是单实例,测试环境是集群,现需要将生产环境的数据迁移到集群中运行,本文档详细介绍了从单实例环境恢复数据到集群环境的操作步骤,可以作为生产环境迁移数据的参考. 适用版本: Kingb ...

  4. KingbaseES V8R6集群维护案例之--修改securecmdd工具服务端口

    案例说明: 在一些生产环境,为了系统安全,不支持ssh互信,或限制root用户使用ssh登录,KingbaseES V8R6可以使用securecmdd工具支持主机之间的通讯.securecmdd工具 ...

  5. KingbaseES V8R3集群管理维护案例之---集群迁移单实例架构

    案例说明: 在生产中,需要将KingbaseES V8R3集群转换为单实例架构,可以采用以下方式快速完成集群架构的迁移. 适用版本: KingbaseES V8R3 当前数据库版本: TEST=# s ...

  6. KingbaseES V8R3集群管理和维护案例之---failover切换wal日志变化分析

    ​ 案例说明: 本案例通过对KingbaseES V8R3集群failover切换过程进行观察,分析了主备库切换后wal日志的变化,对应用者了解KingbaseES V8R3(R6) failover ...

  7. KingbaseES V8R3集群运维案例之---主库系统down failover切换过程分析

    ​ 案例说明: KingbaseES V8R3集群failover时两个cluster都会触发,但只有一个cluster会调用脚本去执行真正的切换流程,另一个有对应的打印,但不会调用脚本,只是走相关的 ...

  8. KingbaseES V8R3集群运维案例之---用户自定义表空间管理

    ​案例说明: KingbaseES 数据库支持用户自定义表空间的创建,并建议表空间的文件存储路径配置到数据库的data目录之外.本案例复现了,当用户自定义表空间存储路径配置到data下时,出现的故障问 ...

  9. KingbaseES V8R6集群维护之--修改数据库服务端口案例

    ​ 案例说明: 对于KingbaseES数据库单实例环境,只需要修改kingbase.conf文件的'port'参数即可,但是对于KingbaseES V8R6集群中涉及到多个配置文件的修改,并且在应 ...

随机推荐

  1. 关于升级最新版本node.js你知道多少?

    1.先检查版本 node -v 2.清除缓存 npm cache clean -f 3.全局安装管理node.js版本工具 npm install -g n 4.确认安装最新版本 n stable 5 ...

  2. Java中Double类型数据比较大小

    方法一:转成字符串之后比较 如果要比较的两个double数据的字符串精度相等,可以将数据转换成string然后借助string的equals方法来间接实现比较两个double数据是否相等.注意这种方法 ...

  3. 【问题解决】Alpine镜像中执行jstack、arthas等命令提示Unable to get pid of LinuxThreads manager thread

    问题现象 最近在处理项目上问题发现之前同事构建的AlpineLinux的镜像不能执行jstack等JDK命令,报错如下. Unable to get pid of LinuxThreads manag ...

  4. 8.3 如何在Windows电脑安装Java开发环境(JDK)

    下载 来到JDK官方下载界面,找到Java SE 8(简称JDK 8)后面的JDK下载,来到该界面,先同意协议,然后下载对应平台的JDK,我们这里下载Windows x64. 安装 双击安装就行了了, ...

  5. 5.RDD操作综合实例

    一.词频统计 A. 分步骤实现 1.准备文件 (1)下载小说或长篇新闻稿 (2)上传到hdfs上 2.读文件创建RDD 3.分词 4. ·排除大小写lower(),map() ·标点符号re.spli ...

  6. namespace_std 杂题选讲

    CF1458C Latin Square 2021 EC Final C. Random Shuffle [THUPC2021] 混乱邪恶 [JOISC2022] 制作团子 3 2022 集训队互测 ...

  7. 一文聊透 Netty 核心引擎 Reactor 的运转架构

    本系列Netty源码解析文章基于 4.1.56.Final版本 本文笔者来为大家介绍下Netty的核心引擎Reactor的运转架构,希望通过本文的介绍能够让大家对Reactor是如何驱动着整个Nett ...

  8. halcon变量窗口的图像变量不显示,重启软件和电脑都没用

    有幸遇到halcon变量窗口的图像变量不显示,重启软件和电脑都没用这个沙雕问题,也是找了蛮久才发现解决办法特意记录一下. 这是正常情况下的窗口(左边)和图像变量不显示的窗口(右边): 解决方法: 鼠标 ...

  9. Solution -「SDOI2011」拦截导弹

    Sol.   题目要求一个数对序列的二维最长下降子序列,我们称其为 Q.并求出每一个元素分别在可能的 Q 中出现了多少次.   直接 Dp,时间复杂度 \(O(n^2)\) 不行.考虑 CDQ 分治 ...

  10. 【每天学一点-01】 在SpringBoot项目中使用Swagger2

    今天在做毕设的时候,发现在前后端分离的情况下,去调用接口数据时很不方便,然后回想过去,和同学一起做项目的时候,他负责后端,我负责前端,当时调用他的弄好的接口可以说是非常方便,主要是可以通过UI页面直接 ...