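Case description:

Both the production and test environments are KingbaseES clusters. The production data needs to be migrated to the test cluster and run there. This document walks through the steps for migrating data from one cluster to another and can serve as a reference for migrating production data.

Applicable version:

KingbaseES V8R6

Database version used in this case (both clusters run the same version):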

test=# select version();
version
----------------------------------------------------------------------------------------------------------------------
KingbaseES V008R006C005B0041 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

Production cluster nodes:

[kingbase@node1 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=192.168.1.201 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | running | node1 | default | 100 | 1 | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

Test cluster nodes:

[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 13 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 13 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

I. Preparation on the production side before migration

1. Production data overview

prod=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+--------+----------+----------+-------------+-------------------
esrep | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod2 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
security | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
template0 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
template1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
test | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
(8 rows)

prod=# select count(*) from t2;
count
--------
100000
(1 row)

2. Shut down the production cluster

[kingbase@node1 bin]$ ./sys_monitor.sh stop
2022-06-20 16:00:46 Ready to stop all DB ...
.......
2022-06-20 16:01:02 DB on "[192.168.1.202]" stop success.
2022-06-20 16:01:02 begin to stop DB on "[192.168.1.201]".
waiting for server to shut down....... done
server stopped
2022-06-20 16:01:06 DB on "[192.168.1.201]" stop success.
2022-06-20 16:01:06 Done.

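II. Migrating production data to the test environment

Tips:

1) Migrating production data to the test cluster requires stopping the production database service. Estimate the downtime window from the size of the data directory.

2) Before shutting down the production database, create a checkpoint manually. If the WAL logs are large, back them up and then clean them up; only the WAL from the most recent day up to the latest checkpoint needs to be kept.

3) The production primary's data directory must be copied across hosts to every primary and standby node of the test cluster. Estimate the total copy time from the network bandwidth and the number of nodes.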
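
The copy-time estimate in tip 3 can be roughed out with simple arithmetic. The sketch below is only an illustration: the function name `estimate_copy_seconds` is hypothetical, and it assumes the data directory is copied to each node one after another at a steady effective throughput, which real networks rarely deliver.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough copy-time estimate for copying the data
# directory to each target node in turn.
# Arguments: data size in GiB, effective throughput in MiB/s, number of target nodes.
estimate_copy_seconds() {
    local size_gib=$1 throughput_mibs=$2 nodes=$3
    # GiB -> MiB, once per target node, divided by throughput (integer math, rounded up).
    echo $(( (size_gib * 1024 * nodes + throughput_mibs - 1) / throughput_mibs ))
}

# Assumed example figures: 200 GiB of data, about 100 MiB/s, 2 test nodes.
estimate_copy_seconds 200 100 2   # prints 4096 (seconds, roughly 68 minutes)
```

Treat the result as a lower bound for the downtime window, since it ignores cluster shutdown, configuration, re-registration, and any troubleshooting time.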

1. Shut down the test cluster

[kingbase@node101 bin]$ ./sys_monitor.sh stop
2022-06-20 16:10:46 Ready to stop all DB ...
2022-06-20 16:11:02 DB on "[192.168.1.102]" stop success.
2022-06-20 16:11:02 begin to stop DB on "[192.168.1.101]".
waiting for server to shut down....... done
server stopped
2022-06-20 16:11:06 DB on "[192.168.1.101]" stop success.
2022-06-20 16:11:06 Done.

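2. Back up the test cluster's data directory (on every node, because step 3 restores configuration files from data.bk on all nodes)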

[kingbase@node101 kingbase]$ mv data data.bk

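3. Copy the production primary's data directory to the test cluster's primary and standby (all nodes)

1) Copy the production data to the test nodes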

[kingbase@node1 kingbase]$ scp -r data node101:/home/kingbase/cluster/R6HA/kha/kingbase/
[kingbase@node1 kingbase]$ scp -r data node102:/home/kingbase/cluster/R6HA/kha/kingbase/

2) Create standby.signal on the standby

[kingbase@node102 data]$ touch standby.signal

3) Restore the test cluster's database configuration files (on all nodes):

[kingbase@node101 data]$ cp ../data.bk/kingbase.auto.conf ./
[kingbase@node101 data]$ cp ../data.bk/kingbase.conf ./
[kingbase@node101 data]$ cp ../data.bk/es_rep.conf ./

# View kingbase.auto.conf
[kingbase@node101 data]$ cat kingbase.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
enable_upper_colname = 'on'
wal_retrieve_retry_interval = '5000'
primary_conninfo = 'user=system connect_timeout=10 host=192.168.1.102 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 application_name=node101'
recovery_target_timeline = 'latest'
primary_slot_name = 'repmgr_slot_1'
synchronous_standby_names = ''
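
After restoring the configuration files, it can help to confirm on each node that the replication settings are the test cluster's own and not the production values. The following is a minimal sketch, not part of the original procedure: `conf_value` is a hypothetical helper that assumes each setting is written as `key = 'value'` on one line, as in the listing above, and the demo runs against a temporary sample file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one setting from a conf file
# whose entries look like:  key = 'value'
conf_value() {
    local key=$1 file=$2
    # Print the text between the single quotes; if the key repeats, the last entry wins.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*'\(.*\)'[[:space:]]*\$/\1/p" "$file" | tail -n 1
}

# Demo against a sample file using values taken from the listing above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
primary_conninfo = 'user=system connect_timeout=10 host=192.168.1.102 port=54321 application_name=node101'
primary_slot_name = 'repmgr_slot_1'
EOF
conf_value primary_slot_name "$sample"   # prints repmgr_slot_1
rm -f "$sample"
```

On a real node the second argument would be the restored file, for example `conf_value primary_conninfo kingbase.auto.conf` run inside the data directory.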

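III. Re-registering the cluster nodes

Tips:

The copied data directory still holds the node records of the original production cluster (in the esrep database), so the nodes must be re-registered according to the test cluster's repmgr.conf.

1. Start the database service on the primary and standby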

[kingbase@node101 bin]$ ./sys_ctl restart -D ../data
waiting for server to shut down.... done
.......
server started

2. Check the node status

[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | ? unreachable | | default | 100 | ? | host=192.168.1.201 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | - failed | node1 | default | 100 | ? | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected
- unable to connect to node "node1" (ID: 1)
- node "node1" (ID: 1) is registered as an active primary but is unreachable
- unable to connect to node "node2" (ID: 2)

# As shown above, the registration records in the esrep database do not match repmgr.conf, so the cluster nodes are currently in an abnormal state.

3. Re-register the primary and standby

1) Register the primary

[kingbase@node101 bin]$ ./repmgr primary register --force
INFO: connecting to primary database...
INFO: "repmgr" extension is already installed
NOTICE: primary node record (ID: 1) updated

[kingbase@node101 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node2 | standby | - failed | node101 | default | 100 | ? | host=192.168.1.202 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected
- unable to connect to node "node2" (ID: 2)

2) Register the standby

[kingbase@node102 bin]$ ./repmgr standby register --force
INFO: connecting to local node "node102" (ID: 2)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "node102" (ID: 2) successfully registered
[kingbase@node102 bin]$ ./repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------------------------------------------------------------------------------------------------------------------------------------------
1 | node101 | primary | * running | | default | 100 | 1 | host=192.168.1.101 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
2 | node102 | standby | running | node101 | default | 100 | 1 | host=192.168.1.102 user=system dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

# As shown above, once the nodes are registered, the cluster node status returns to normal.
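
The visual check of `repmgr cluster show` can also be scripted. This is a sketch under assumptions: `cluster_all_running` is a hypothetical helper that relies on the pipe-separated layout printed above, with the status in the fourth column, and it treats any status other than "running" as unhealthy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `repmgr cluster show` output on stdin and
# return non-zero if any node row reports a status other than "running".
cluster_all_running() {
    awk -F'|' '
        # Node rows are the ones whose first column is a numeric ID.
        $1 ~ /^[[:space:]]*[0-9]+[[:space:]]*$/ {
            rows++
            status = $4
            # Strip padding and the "*" marker shown on the primary.
            gsub(/^[[:space:]*]+|[[:space:]]+$/, "", status)
            if (status != "running") bad++
        }
        END { exit ((rows > 0 && bad == 0) ? 0 : 1) }
    '
}

# Demo with rows abbreviated from the output above.
if cluster_all_running <<'EOF'
 1 | node101 | primary | * running |         | default | 100 | 1
 2 | node102 | standby |   running | node101 | default | 100 | 1
EOF
then echo "all nodes running"; else echo "cluster needs attention"; fi
```

Against a live cluster the input would come from the real command, for example `./repmgr cluster show | cluster_all_running`.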

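IV. Checking streaming replication status

Tips:

If the streaming replication slot names differ between the production and test clusters, the replication slots may need to be rebuilt.

# Replication slot information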
test=# select * from sys_replication_slots;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
---------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------
repmgr_slot_2 | | physical | | | f | t | 26620 | 963 | | 0/4E000FB8 |
repmgr_slot_1 | | physical | | | f | f | | | | |
(2 rows)

# Streaming replication status
test=# select * from sys_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | backend_xmin | state | sent_lsn | write_lsn | flush_lsn | replay_lsn | write_lag | flush_lag | replay_lag | sync_priority | sync_state | reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+----
26620 | 10 | system | node102 | 192.168.1.102 | | 49454 | 2022-06-20 17:08:15.347515+08 | | streaming | 0/4E000FB8 | 0/4E000FB8 | 0/4E000FB8 | 0/4E000FB8 | | | | 1 | sync | 2022-06-20 17:11:44.277666+08
(1 row)
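
To act on the tip about mismatched slot names, one option is to check that the slot a standby expects (its primary_slot_name) actually exists on the primary. The sketch below is illustrative only: `slot_exists` is a hypothetical helper that takes a plain list of slot names on stdin, and the ksql invocation in the comment assumes ksql accepts psql-style options, which should be verified for your version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the slot name given as $1 appears in the
# list of slot names read from stdin (one per line, surrounding spaces ignored).
slot_exists() {
    local wanted=$1
    awk -v w="$wanted" '
        { gsub(/^[[:space:]]+|[[:space:]]+$/, "") }   # trim the line
        $0 == w { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Demo with the two slot names listed in sys_replication_slots above.
# On a live primary the list might come from something like (assumed flags):
#   ksql -U system -d test -At -c "select slot_name from sys_replication_slots"
if printf '%s\n' repmgr_slot_2 repmgr_slot_1 | slot_exists repmgr_slot_2; then
    echo "slot present"
else
    echo "slot missing, may need to be rebuilt"
fi
```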

V. Verifying the data

1. Check the data after migration

test=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+--------+----------+----------+-------------+-------------------
esrep | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
prod2 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
security | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
template0 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
template1 | system | UTF8 | ci_x_icu | zh_CN.UTF-8 | =c/system +
| | | | | system=CTc/system
test | system | UTF8 | ci_x_icu | zh_CN.UTF-8 |
(8 rows)

prod=# select count(*) from t2;
count
--------
100000
(1 row)

2. Restart the cluster

[kingbase@node101 bin]$ ./sys_monitor.sh restart
2022-06-20 17:12:39 Ready to stop all DB ...
.......
2022-06-20 17:13:14 repmgrd on "[192.168.1.102]" start success.
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node101 | primary | * running | | running | 28907 | no | n/a
2 | node102 | standby | running | node101 | running | 26280 | no | 2 second(s) ago
[2022-06-20 17:13:19] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/log/kbha.log"
[2022-06-20 17:13:22] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/kha/kingbase/log/kbha.log"
2022-06-20 17:13:29 Done.

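VI. Summary

1. To guarantee data consistency when migrating from a cluster, the source cluster's databases must be stopped. For a production environment, plan a downtime window.
2. If the target cluster's existing data must be loaded back into the cluster after migration, take a logical backup of it beforehand, and handle duplicates carefully on import (for example, importing the old test data into the migrated cluster may produce many duplicate rows).
3. When requesting the downtime window, account for the source cluster's data volume, the network bandwidth between hosts, the number of cluster nodes, the time to configure the cluster, and the time to handle any cluster startup failures.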
