案例说明:

生产环境是集群环境,测试环境是集群,现需要将生产环境的数据迁移到测试集群中运行,本文档详细介绍了从集群环境迁移数据的操作步骤,可以作为生产环境迁移数据的参考。

适用版本:

KingbaseES V8R6

本案例数据库版本(集群使用相同的版本):

test=# select version();
version
----------------------------------------------------------------------------------------------------------------------
KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

生产集群节点信息:

[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.1.201 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

测试集群节点信息:

[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 13 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 13 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

一、生产环境迁移数据前的准备

1、生产环境数据信息

prod=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+--------+----------+----------+-------------+-------------------
esrep | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod2 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
security | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
template0 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
template1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
test | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
(8 rows) prod=# select count(*) from t2;
count
--------
100000
(1 row)

2、关闭生产集群

[kingbase@node1 bin]$ ./sys_monitor.sh stop
2022-06-20 16:00:46 Ready to stop all DB ...
.......
2022-06-20 16:01:02 DB on "[192.168.1.202]" stop success.
2022-06-20 16:01:02 begin to stop DB on "[192.168.1.101]".
waiting for server to shut down....... done
server stopped
2022-06-20 16:01:06 DB on "[192.168.1.201]" stop success.
2022-06-20 16:01:06 Done.

二、迁移生产数据到测试环境

Tips:

1)将生产数据迁移到集群,需要停止生产数据库服务,根据data目录数据的大小,要估算停机窗口时间。

2)在生产数据库前,建议手工创建检查点,如果wal日志比较大,建议备份后,清理wal日志,只需要保留最近一天的日志到最近检查点后即可。

3)需要跨主机将生产库主库data目录拷贝到集群的主备库节点,需 根据网络带宽和节点数,估算整个拷贝时间。

1、关闭测试集群

[kingbase@node101 bin]$ ./sys_monitor.sh stop
2022-06-20 16:10:46 Ready to stop all DB ... 2022-06-20 16:11:02 DB on "[192.168.1.102]" stop success.
2022-06-20 16:11:02 begin to stop DB on "[192.168.1.101]".
waiting for server to shut down....... done
server stopped
2022-06-20 16:11:06 DB on "[192.168.1.101]" stop success.
2022-06-20 16:11:06 Done.

2、将测试库数据备份

[kingbase@node101 kingbase]$ mv data data.bk

3、拷贝生产集群主库data到测试集群主备库(所有节点)

1)拷贝生产数据到测试库

[kingbase@node1 kingbase]$ scp -r data node101:/home/kingbase/cluster/R6HA/kha/kingbase/
[kingbase@node1 kingbase]$ scp -r data node102:/home/kingbase/cluster/R6HA/kha/kingbase/

2)备库创建standby.signal

[kingbase@node102 data]$ touch standby.signal

3)复制测试集群的数据库配置文件:(所有节点)

[kingbase@node101 data]$ cp ../data.bk/kingbase.auto.conf ./
[kingbase@node101 data]$ cp ../data.bk/kingbase.conf ./
[kingbase@node101 data]$ cp ../data.bk/es_rep.conf ./ # 查看kingbase.auto.conf
[kingbase@node101 data]$ cat kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
enable_upper_colname = 'on'
wal_retrieve_retry_interval = '5000'
primary_conninfo = 'user=system connect_timeout=10 host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node101'
recovery_target_timeline = 'latest'
primary_slot_name = 'repmgr_slot_1'
synchronous_standby_names = ''

三、重新注册集群节点

Tips:

因为data数据中存储的是原生产集群的节点信息(esrep库),所以要根据测试库的repmgr.conf文件重新注册节点。

1、启动主备库数据库服务

[kingbase@node101 bin]$ ./sys_ctl restart -D ../data
waiting for server to shut down.... done
.......
server started

2、查看节点状态信息

[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | ? unreachable | | default | 100 | ? | host=192.168.1.201 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | - failed | node1 | default | 100 | ? | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 WARNING: following issues were detected
- unable to connect to node "node1" (ID: 1)
- node "node1" (ID: 1) is registered as an active primary but is unreachable
- unable to connect to node "node2" (ID: 2) # 如上所示,因为repmgr.conf和esrep库的注册信息不一致,现在集群节点处于非正常状态。

3、重新注册主备库

1)注册主库

[kingbase@node101 bin]$ ./repmgr primary register --force
INFO: connecting to primary database...
INFO: "repmgr" extension is already installed
NOTICE: primary node record (ID: 1) updated [kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | - failed | node101 | default | 100 | ? | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 WARNING: following issues were detected
- unable to connect to node "node2" (ID: 2)

2)注册备库

[kingbase@node102 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node102" (ID: 2)
INFO: connecting to primary database
INFO: standby registration complete NOTICE: standby node "node102" (ID: 2) successfully registered
[kingbase@node102 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 1 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 # 如上所示,节点注册完成后,集群节点状态正常。

四、查看流复制状态

Tips:

如果生产集群和测试集群的流复制复制槽名称不一致,可能需要重建复制槽。

# 复制槽信息
test=# select * from sys_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
repmgr_slot_2 | | physical | | | f | t | 26620 | 963 | | 0/4E000FB8 |
repmgr_slot_1 | | physical | | | f | f | | | | |
(2 rows) # 流复制状态信息
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+---- 26620 | 10 | system | node102 | 192.168.1.102 | | 49454 | 2022-06-20 17:08:15.347515+08 | | streaming | 0/4E000FB8 | 0/4E000FB8 | 0/4E000FB8 | 0/4E000FB8 | | | | 1 | sync | 2022-06-20 17:11:44.277666+08
(1 row)

五、验证数据

1、查看迁移后数据

test=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+--------+----------+----------+-------------+-------------------
esrep | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod2 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
security | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
template0 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
template1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
test | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
(8 rows) prod=# select count(*) from t2;
count
--------
100000
(1 row)

2、重启集群

[kingbase@node101 bin]$ ./sys_monitor.sh restart
2022-06-20 17:12:39 Ready to stop all DB ...
.......
2022-06-20 17:13:14 repmgrd on "[192.168.1.102]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node101 | primary | * running | | running | 28907 | no | n/a
2 | node102 | standby | running | node101 | running | 26280 | no | 2 second(s) ago
[2022-06-20 17:13:19] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/log/kbha.log" [2022-06-20 17:13:22] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/log/kbha.log" 2022-06-20 17:13:29 Done.

六、总结

 1、从集群环境迁移数据,如果需要保证数据一致,必须要将集群停库,对于生产环境,要考虑停机窗口。
2、如果需要将目标集群数据重新加载到新的集群,需要将目标集群数据做逻辑备份,但是在导入时如果有重复数据需注意处理。(如将测试数据再导入到迁移后的集群中,可能有许多数据会重复)。
3、申请停机窗口,要考虑源集群数据量的大小、主机间的网络带宽、集群节点数、集群配置时间、集群启动故障的处理时间等。

KingbbaseES V8R6集群维护案例之---集群之间数据迁移的更多相关文章

  1. KingbaseES V8R3集群维护案例之---在线添加备库管理节点

    案例说明: 在KingbaseES V8R3主备流复制的集群中 ,一般有两个节点是集群的管理节点,分为master和standby:如对于一主二备的架构,其中有两个节点是管理节点,三个数据节点:管理节 ...

  2. KingbaseES V8R6集群维护案例之--单实例数据迁移到集群案例

    案例说明: 生产环境是单实例,测试环境是集群,现需要将生产环境的数据迁移到集群中运行,本文档详细介绍了从单实例环境恢复数据到集群环境的操作步骤,可以作为生产环境迁移数据的参考. 适用版本: Kingb ...

  3. KingbaseES V8R3集群管理维护案例之---集群迁移单实例架构

    案例说明: 在生产中,需要将KingbaseES V8R3集群转换为单实例架构,可以采用以下方式快速完成集群架构的迁移. 适用版本: KingbaseES V8R3 当前数据库版本: TEST=# s ...

  4. KingbaseES V8R6集群维护案例之---停用集群node_export进程

    案例说明: 在KingbaseES V8R6集群启动时,会启动node_exporter进程,此进程主要用于向kmonitor监控服务输出节点状态信息.在系统安全漏洞扫描中,提示出现以下安全漏洞: 对 ...

  5. KingbaseES V8R6集群维护案例之---将securecmdd通讯改为ssh案例

    案例说明: 在KingbaseES V8R6的后期版本中,为了解决有的主机之间不允许root用户ssh登录的问题,使用了securecmdd作为集群部署分发和通讯的服务,有生产环境通过漏洞扫描,在88 ...

  6. KingbaseES V8R6集群维护案例之--修改securecmdd工具服务端口

    案例说明: 在一些生产环境,为了系统安全,不支持ssh互信,或限制root用户使用ssh登录,KingbaseES V8R6可以使用securecmdd工具支持主机之间的通讯.securecmdd工具 ...

  7. 手把手教你通过Ambari新建Hadoop集群图解案例

    手把手教你通过Ambari新建Hadoop集群图解案例 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 登陆系统之后,会看到Ambari空空如也的欢迎界面,接下来我们就需要介绍如何通 ...

  8. Redis集群维护、运营的相关命令与工具介绍

    Redis集群的搭建.维护.运营的相关命令与工具介绍 一.概述 此教程主要介绍redis集群的搭建(Linux),集群命令的使用,redis-trib.rb工具的使用,此工具是ruby语言写的,用于集 ...

  9. 基于Ambari的WebUI实现集群扩容案例

    基于Ambari的WebUI实现集群扩容案例 作者:尹正杰 版权声明:原创作品,谢绝转载!否则将追究法律责任. 一.将HDP的服务托管给Ambari服务 1>.点击“Service Auto S ...

随机推荐

  1. JavaScript有哪些数据类型,它们的区别?

    基本数据类型:number.string.boolean.Undefined.NaN(特殊值).BigInt.Symbol 引入数据类型:Object NaN是JS中的特殊值,表示非数字,NaN不是数 ...

  2. Elasticsearch学习系列四(聚合搜索)

    聚合分析 聚合分析是数据库中重要的功能特性,完成对一个查询的集中数据的聚合计算.如:最大值.最小值.求和.平均值等等.对一个数据集求和,算最大最小值等等,在ES中称为指标聚合,而对数据做类似关系型数据 ...

  3. Servlet之Request和Response 解析

    原理 tomcat服务器会根据请求url中的资源路径,创建对应的Servlet的对象 tomcat服务器.会创建request和response对象,request对象中封装请求消息数据. tomca ...

  4. centos 7安装gitlab及使用

    GitLab 概述: 是一个利用 Ruby on Rails 开发的开源应用程序,实现一个自托管的 Git 项目仓库,可通过 Web界面迚行访问公开的戒者私人项目.Ruby on Rails 是一个可 ...

  5. 手写一个虚拟DOM库,彻底让你理解diff算法

    所谓虚拟DOM就是用js对象来描述真实DOM,它相对于原生DOM更加轻量,因为真正的DOM对象附带有非常多的属性,另外配合虚拟DOM的diff算法,能以最少的操作来更新DOM,除此之外,也能让Vue和 ...

  6. input标签的事件之oninput事件

    最近在写前端的时候,用到了oninput事件.(其中也涉及了onclick) 问题:鼠标点击数字和运算符的时候,文本框里的内容到达一定长度时,会出现无法继续往后面跟随光标的问题. 解决:见下面代码 这 ...

  7. MYSQL索引的建立、删除以及简单使用

    一.前期数据准备 1.建表 CREATE TABLE `user` ( `uid` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(50) DEFAUL ...

  8. Redis入门到精通01

    Redis入门到精通 目录 Redis入门到精通 一.Redis缓存框架基本介绍 1.1Redis的应用场景 二.Redis的安装方式 2.1Windows操作系统安装Redis 2.2Linux操作 ...

  9. fiddler5+雷电模拟器4.0对app抓包设置

    这次项目刚好需要对微信小程序进行抓包分析,二话不说拿起手机咔咔一顿连接,发现在备用机苹果上抓包正常,但主的安卓机上证书怎么装都失败,原来安卓7版本以后对用户自行安装的证书不再信任,所以无法抓包. 因为 ...

  10. 多校B层冲刺NOIP20211110 字符配对游戏

    原题 问题描述 操场边,运动会没有项目的同学也没闲着,经过几天的研究,他们发明了一个很有意思的字符串配对游戏,两位同学准备两张白纸,第一个同学在纸上写一个整数N和一个由小写字母组成的字符串S,将S重复 ...