OceanBase初体验之从MySQL迁移数据到OceanBase集群

前置条件

MySQL 环境
OceanBase 环境
测试用的表结构和一些数据

先在源端 MySQL 用如下脚本创建测试表，以及写入10000条数据用于迁移测试。

use test;

CREATE TABLE students (

  `id` int NOT NULL PRIMARY KEY,

  `name` varchar(255) ,

  `code` varchar(20),

  `class` varchar(255)

);

DELIMITER $$

DROP PROCEDURE if EXISTS insertStudents;

CREATE procedure insertStudents()

BEGIN

	DECLARE i INT;

	SET i = 0;

	WHILE i<10000 DO

		INSERT INTO students(id, name, code, class) VALUES(i+1, CONCAT('User', i), '2020010' + i + 10, CONCAT(FLOOR(RAND() * 10),'c'));

		SET i = i+1;

	END WHILE;

END $$

DELIMITER ; 

CALL insertStudents(); 

select count(*) from students;

准备工作完成后，我们选择合适的迁移工作来进行操作。对于 OceanBase 的MySQL 租户，通常有如下两种全量迁移方式：

mysqldump，MySQL自带的导出工具，安装好MySQL后就能直接使用，适合小数据量场景下的快速迁移，表结构和数据被导出成sql文件
DataX，是阿里开源的异构数据迁移工具，支持丰富的上下游数据源使用广泛，对 OceanBase 的兼容性比较好，适合大批量数据迁移有较好的性能

下面做分别演示。

迁移方式一：mysqldump

mysqldump不需要单独安装，只要装了 MySQL 的环境基本都会有mysqldump，可以用如下命令来检查：

[root@localhost ~]# which mysqldump

/usr/bin/mysqldump

另外，obclient其实也集成了mysqldump工具，直接拿来用也可以：

[root@localhost ~]# su - ob

Last login: Sun Mar 17 10:49:37 CST 2024 on pts/1

[ob@localhost ~]$ which mysqldump

~/.oceanbase-all-in-one/obclient/u01/obclient/bin/mysqldump

整体迁移流程为：从MySQL导出sql文件 -> 去OceanBase执行sql文件，导出的脚本都是标准的SQL语法。

第一步先把数据导出：

[ob@localhost ~]$ mkdir mysqldump_data

[ob@localhost ~]$ mysqldump -h x.x.x.222 -u root -P 3306 -p -B "test" --tables "students" > /home/ob/mysqldump_data/students.sql

#导出的sql脚本包含了create table和insert，可以自行检查

第二步把数据导入，通常有两种方式：mysql命令行和source语法。

#方式一：用命令行导入

[ob@localhost ~]$ mysql -h x.x.x.222 -u root -P 2883 -D test < /home/ob/mysqldump_data/students.sql

#方式二：用source导入

[ob@localhost ~]$ obclient -h x.x.x.222 -u root -P 2883 -D test

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Welcome to the OceanBase.  Commands end with ; or \g.

Your OceanBase connection id is 524290

Server version: OceanBase_CE 4.2.2.0 (r100010012024022719-c984fe7cb7a4cef85a40323a0d073f0c9b7b8235) (Built Feb 27 2024 19:20:54)

Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

obclient [test]> show tables;

Empty set (0.002 sec)

obclient [test]> source /home/ob/mysqldump_data/students.sql;

Query OK, 10000 rows affected (0.069 sec)

Records: 10000  Duplicates: 0  Warnings: 0

obclient [test]> show tables;

+----------------+

| Tables_in_test |

+----------------+

| students       |

+----------------+

1 row in set (0.002 sec)

obclient [test]> select count(*) from students;

+----------+

| count(*) |

+----------+

|    10000 |

+----------+

1 row in set (0.005 sec)

到这里数据迁移就完成了。

扩展：只要是标准SQL语法组成的.sql文件，都可以用这两种方式批量执行。

迁移方式二：DataX

DataX 是单独的组件所以需要先下载才能使用，但是基本是开箱即用，没有复杂的环境配置。

先准备好 DataX 的使用环境，下载、解压即可：

[root@localhost ~]# wget https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202308/datax.tar.gz

[root@localhost ~]# tar -xvzf datax.tar.gz

DataX 的迁移任务被定义成一个个job，安装官网的配置文件规范我们准备一个如下的job配置：

[root@localhost ~]# cd datax/job/

[root@localhost job]# vi mysql2ob_test.json

{

    "job": {

        "setting": {

            "speed": {

                "channel": 4

            },

            "errorLimit": {

                "record": 0,

                "percentage": 0.1

            }

        },

        "content": [

            {

                "reader": {

                    "name": "mysqlreader",

                    "parameter": {

                        "username": "root",

                        "password": "填实际密码，不能留空",

                        "column": ["*"],

                        "connection": [

                            {

                                "table": ["students"],

                                "jdbcUrl": ["jdbc:mysql://x.x.x.222:3306/test?useUnicode=true&characterEncoding=utf8&useSSL=false"]

                            }

                        ]

                    }

                },

                "writer": {

                    "name": "oceanbasev10writer",

                    "parameter": {

                        "obWriteMode": "insert",

                        "column": ["*"],

                        "preSql": ["truncate table students_datx"],

                        "connection": [

                            {

                                "jdbcUrl": "jdbc:oceanbase://x.x.x.222:2883/test?",

                                "table": ["students_datx"]

                            }

                        ],

                        "username": "root",

                        "password":"填实际密码，不能留空",

                        "writerThreadCount":10,

                        "batchSize": 1000,

                        "memstoreThreshold": "0.9"

                    }

                }

            }

        ]

    }

}

配置内容主要包含以下几部分：

setting，job的参数配置，如并发数、限流等
reader，源端的读取方式，主要包含源端的数据库连接信息
writer，目标端的写入方式，主要包含目标端的数据库连接信息和写入行为等

由于 DataX 不会迁移表结构，所以先在 OceanBase 中把表建好，表名可以不一样，但是字段要一样

obclient [test]> CREATE TABLE students_datax (

    ->   `id` int NOT NULL PRIMARY KEY,

    ->   `name` varchar(255) ,

    ->   `code` varchar(20),

    ->   `class` varchar(255)

    -> );

Query OK, 0 rows affected (0.179 sec)

准备就绪后启动 DataX 执行即可：

[root@localhost job]# python ../bin/datax.py mysql2ob_test.json

... ...

2024-03-17 11:37:29.012 [job-0] INFO  StandAloneJobContainerCommunicator - Total 10000 records, 207784 bytes | Speed 20.29KB/s, 1000 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.143s |  All Task WaitReaderTime 0.022s | Percentage 100.00%

2024-03-17 11:37:29.013 [job-0] INFO  JobContainer -

任务启动时刻                    : 2024-03-17 11:37:18

任务结束时刻                    : 2024-03-17 11:37:29

任务总计耗时                    :                 10s

任务平均流量                    :           20.29KB/s

记录写入速度                    :           1000rec/s

读出记录总数                    :               10000

读写失败总数                    :                   0

校验下数据是否正常：

[ob@localhost ~]$ obclient -h x.x.x.222 -u root -P 2883 -D test -p

Enter password:

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Welcome to the OceanBase.  Commands end with ; or \g.

Your OceanBase connection id is 524294

Server version: OceanBase_CE 4.2.2.0 (r100010012024022719-c984fe7cb7a4cef85a40323a0d073f0c9b7b8235) (Built Feb 27 2024 19:20:54)

Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

obclient [test]> show tables;

+----------------+

| Tables_in_test |

+----------------+

| students       |

| students_datax |

+----------------+

2 rows in set (0.002 sec)

obclient [test]> select count(*) from students_datax;

+----------+

| count(*) |

+----------+

|    10000 |

+----------+

1 row in set (0.004 sec)

其他方式

OMS迁移

OMS（OceanBase Migration Service）是 OceanBase 提供的迁移服务，它包含在 OCP 中，可以在web界面上做一些配置即可实现数据迁移，它支持库表结构迁移、全量迁移和增量同步。

后续体验OMS的时候再来介绍。

增量实时同步

前面演示的两种方法都是全量迁移，对于有增量实时同步的场景以上工具还无法解决，需要引入新的工具。这一类工具的原理基本都是一样的，就是订阅 MySQL binlog 进行回放解析成标准sql在下游执行，使用比较多的有 Canal，也是阿里的开源项目。

使用方式可以参考：

https://www.oceanbase.com/docs/common-oceanbase-database-cn-1000000000507573