Working with MySQL in DataX

1. Reading from MySQL

Introduction

The MysqlReader plugin reads data from MySQL. Under the hood, MysqlReader connects to the remote MySQL database via JDBC and executes the appropriate SQL to SELECT the data out of the MySQL database.
Unlike other relational database readers, MysqlReader does not support FetchSize.

How it works

In short, MysqlReader connects to the remote MySQL database through a JDBC connector, generates a SELECT statement from the user's configuration, sends it to the remote MySQL database, assembles the result set into an abstract record set using DataX's own data types, and hands the records to the downstream Writer.
When the user configures table, column, and where, MysqlReader joins them into an SQL statement and sends it to MySQL; when the user configures querySql, MysqlReader sends it to MySQL as-is.
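That assembly rule can be sketched in a few lines of Python (a simplified illustration with made-up helper names, not DataX's actual Java implementation):

```python
def build_reader_sql(parameter):
    """Build the SELECT the way the MysqlReader config implies:
    a user-supplied querySql is sent as-is; otherwise the statement
    is assembled from column, table, and the optional where."""
    conn = parameter["connection"][0]
    if "querySql" in conn:
        return conn["querySql"][0]
    sql = "select " + ",".join(parameter["column"]) + " from " + conn["table"][0]
    if parameter.get("where"):
        sql += " where " + parameter["where"]
    return sql

# With the reader block of the job JSON from this article:
param = {
    "column": ["id", "name"],
    "connection": [{"table": ["datax_test"],
                    "jdbcUrl": ["jdbc:mysql://192.168.1.123:3306/test"]}],
}
print(build_reader_sql(param))  # select id,name from datax_test
```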

The job JSON is as follows:

{
  "job": {
    "setting": {
      "speed": {
        "channel":
      },
      "errorLimit": {
        "record": ,
        "percentage": 0.02
      }
    },
    "content": [{
      "reader": {
        "name": "mysqlreader",
        "parameter": {
          "username": "root",
          "password": "",
          "column": [
            "id",
            "name"
          ],
          "splitPk": "id",
          "connection": [{
            "table": [
              "datax_test"
            ],
            "jdbcUrl": [
              "jdbc:mysql://192.168.1.123:3306/test"
            ]
          }]
        }
      },
      "writer": {
        "name": "streamwriter",
        "parameter": {
          "print": true
        }
      }
    }]
  }
}

Parameter descriptions

--jdbcUrl
Description: the JDBC connection information for the source database, expressed as a JSON array; multiple connection addresses may be listed for a single database. A JSON array is used because Alibaba Group internally supports probing multiple IPs: when several are configured, MysqlReader tries each address in turn until it finds a reachable one, and reports an error only if all of them fail. Note that jdbcUrl must be placed inside the connection block. For use outside Alibaba Group, simply put a single JDBC URL in the array.
jdbcUrl follows the official MySQL format and may carry additional connection-control properties.
Required: yes
Default: none
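The probe-in-order behavior can be sketched like this (a hypothetical Python illustration; `test_connection` stands in for the JDBC connectivity check):

```python
def pick_reachable_url(jdbc_urls, test_connection):
    """Try each configured jdbcUrl in order and return the first one
    that answers, as MysqlReader does; fail only if all are dead."""
    for url in jdbc_urls:
        if test_connection(url):
            return url
    raise RuntimeError("no reachable jdbcUrl in connection configuration")

urls = [
    "jdbc:mysql://bad_ip:3306/database",
    "jdbc:mysql://192.168.1.123:3306/test",
]
# Pretend only the second address is reachable:
print(pick_reachable_url(urls, lambda u: "bad_ip" not in u))
# jdbc:mysql://192.168.1.123:3306/test
```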
--username
Description: user name for the data source
Required: yes
Default: none
--password
Description: password of the specified user for the data source
Required: yes
Default: none
--table
Description: the table(s) to synchronize, expressed as a JSON array, so multiple tables can be extracted at once. When several tables are configured, you must ensure they share the same schema; MysqlReader does not verify that they form one logical table. Note that table must be placed inside the connection block.
Required: yes
Default: none
--column
Description: the set of columns to synchronize from the configured table, expressed as a JSON array. Use * to select all columns, e.g. ['*'].
Column pruning is supported: you may export only a subset of the columns.
Column reordering is supported: columns need not be exported in schema order.
Constants are supported, using MySQL SQL syntax: ["id", "`table`", "1", "'bazhen.csy'", "null", "to_char(a + 1)", "2.3", "true"], where id is an ordinary column name, `table` is a column whose name is a reserved word, 1 is an integer constant, 'bazhen.csy' is a string constant, null is a null value, to_char(a + 1) is an expression, 2.3 is a floating-point constant, and true is a boolean.
Required: yes
Default: none
--splitPk
Description: if splitPk is specified, MysqlReader shards the extracted data on that field, and DataX launches concurrent tasks to synchronize it, which can greatly improve throughput.
It is recommended to use the table's primary key as splitPk, since primary keys are usually evenly distributed and the resulting shards are less prone to data hot spots.
-- Currently splitPk supports sharding on integer columns only; floating-point, string, date, and other types are not supported. If an unsupported type is specified, MysqlReader reports an error.
--If splitPk is absent or its value is empty, DataX synchronizes the table through a single channel.
Required: no
Default: empty
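The split queries that appear in the execution log (several adjacent id ranges plus an `id IS NULL` catch-all) suggest how an integer splitPk is cut into shards. Here is a rough Python sketch of that idea, illustrative only and not DataX's exact split algorithm:

```python
def split_queries(min_pk, max_pk, num_splits, table="datax_test", cols="id,name"):
    """Cut [min_pk, max_pk] into num_splits adjacent ranges (half-open,
    with the last one closed) and add an `id IS NULL` query, mirroring
    the allQuerySql list that SingleTableSplitUtil prints."""
    step = max(1, (max_pk - min_pk + 1) // num_splits)
    bounds = list(range(min_pk, max_pk, step)) + [max_pk]
    sqls = []
    for lo, hi in zip(bounds, bounds[1:]):
        op = "<=" if hi == max_pk else "<"
        sqls.append(f"select {cols} from {table} where ({lo} <= id AND id {op} {hi})")
    sqls.append(f"select {cols} from {table} where id IS NULL")
    return sqls

for q in split_queries(1, 5, 4):
    print(q)
```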
--where
Description: a filter condition. MysqlReader assembles an SQL statement from the configured column, table, and where values and extracts data with it. A common business scenario is to synchronize only the current day's data by setting where to gmt_create > $bizdate. Note: where may not be set to limit 10; LIMIT is not a valid WHERE clause in SQL.
The where condition enables efficient incremental synchronization. If it is absent (no where key, or no value), DataX synchronizes the full table.
Required: no
Default: none
--querySql
Description: in some business scenarios the where option is not expressive enough, so this option lets you define the filtering SQL yourself. When it is set, DataX ignores the table and column options and filters data with this SQL directly; for example, to synchronize the result of a multi-table join, use select a,b from table_a join table_b on table_a.id = table_b.id.
When querySql is configured, MysqlReader ignores the table, column, and where options; querySql takes precedence over all three.
Required: no
Default: none

mysqlreader type conversion

Please note:
--Types other than those listed are not supported.
--tinyint(1) is treated by DataX as an integer.
--year is treated by DataX as a string.
--bit is undefined behavior in DataX.

Execution

FengZhendeMacBook-Pro:bin FengZhen$ ./datax.py /Users/FengZhen/Desktop/Hadoop/dataX/json/mysql/reader_all.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) Alibaba Group. All Rights Reserved.
... (JVM and memory details, plus an echo of the job configuration, omitted; timestamps and numeric values were stripped in the original capture) ...
INFO JobContainer - DataX jobContainer starts job.
INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://192.168.1.123:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
INFO OriginalConfPretreatmentUtil - table:[datax_test] has columns:[id,name].
INFO JobContainer - jobContainer starts to do prepare ...
INFO JobContainer - jobContainer starts to do split ...
INFO SingleTableSplitUtil - split pk [sql=SELECT MIN(id),MAX(id) FROM datax_test] is running...
INFO SingleTableSplitUtil - After split(), allQuerySql=[
select id,name from datax_test where ( <= id AND id < )
select id,name from datax_test where ( <= id AND id < )
select id,name from datax_test where ( <= id AND id < )
select id,name from datax_test where ( <= id AND id <= )
select id,name from datax_test where id IS NULL
].
INFO JobContainer - jobContainer starts to do schedule ...
INFO JobContainer - Running by standalone Mode.
INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select id,name from datax_test where ( <= id AND id < )] ...
test1
test2
test3
test4
test5
INFO TaskGroupContainer - taskGroup completed it's tasks.
INFO AbstractScheduler - Scheduler accomplished all tasks.
INFO JobContainer - DataX jobId completed successfully.
Job start time        :
Job end time          :
Total elapsed time    : 21s
Average traffic       : 1B/s
Record write speed    : 0rec/s
Total records read    :
Read/write failures   :

The result output can be seen on the console.

2. Reading from MySQL with a filter condition

The job JSON is as follows:

{
  "job": {
    "setting": {
      "speed": {
        "channel":
      }
    },
    "content": [{
      "reader": {
        "name": "mysqlreader",
        "parameter": {
          "username": "root",
          "password": "",
          "connection": [{
            "querySql": [
              "select * from datax_test where id < 3;"
            ],
            "jdbcUrl": [
              "jdbc:mysql://bad_ip:3306/database",
              "jdbc:mysql://127.0.0.1:bad_port/database",
              "jdbc:mysql://192.168.1.123:3306/test"
            ]
          }]
        }
      },
      "writer": {
        "name": "streamwriter",
        "parameter": {
          "print": true,
          "encoding": "UTF-8"
        }
      }
    }]
  }
}

Execution

FengZhendeMacBook-Pro:bin FengZhen$ ./datax.py /Users/FengZhen/Desktop/Hadoop/dataX/json/mysql/reader_select.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) Alibaba Group. All Rights Reserved.
... (JVM and memory details, plus an echo of the job configuration, omitted; timestamps and numeric values were stripped in the original capture) ...
INFO JobContainer - DataX jobContainer starts job.
WARN DBUtil - test connection of [jdbc:mysql://bad_ip:3306/database] failed, for Code:[MYSQLErrCode-02], Description:[the database service's IP address or port is wrong; check the configured IP and port, or ask your DBA to confirm them]. Detail: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure. The driver has not received any packets from the server.
WARN DBUtil - test connection of [jdbc:mysql://127.0.0.1:bad_port/database] failed, for Code:[DBUtilErrorCode-10], Description:[failed to connect to the database; check your account, password, database name, IP, and port, or ask a DBA for help (mind the network environment)]. Detail: com.mysql.jdbc.exceptions.jdbc4.MySQLNonTransientConnectionException: Cannot load connection class because of underlying exception: 'java.lang.NumberFormatException: For input string: "bad_port"'.
INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://192.168.1.123:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
INFO JobContainer - jobContainer starts to do split ...
INFO JobContainer - Running by standalone Mode.
INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from datax_test where id < 3;] jdbcUrl:[jdbc:mysql://192.168.1.123:3306/test?...].
test1
test2
INFO TaskGroupContainer - taskGroup completed it's tasks.
INFO AbstractScheduler - Scheduler accomplished all tasks.
INFO JobContainer - DataX jobId completed successfully.
Job start time        :
Job end time          :
Total elapsed time    : 11s
Average traffic       : 1B/s
Record write speed    : 0rec/s
Total records read    :
Read/write failures   :

3. Reading from MySQL and writing to MySQL

Writing to MySQL: introduction

The MysqlWriter plugin writes data into target tables on a MySQL primary. Under the hood, MysqlWriter connects to the remote MySQL database via JDBC and executes insert into ... (or replace into ...) statements to write the data, committing to the database in batches; the target database is expected to use the InnoDB engine.

How it works

MysqlWriter receives the protocol records produced by the Reader through the DataX framework and, depending on the configured writeMode, generates
insert into... (rows that hit a primary-key or unique-index conflict are not written)
or
replace into... (identical to insert into when there is no primary-key or unique-index conflict; on a conflict, the new row replaces all fields of the existing row) statements to write the data into MySQL. For performance, it uses PreparedStatement with batching and sets rewriteBatchedStatements=true, buffering records in a thread-local buffer and issuing a write only once the buffer reaches a preset threshold.
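The buffer-then-flush behavior can be sketched in Python (an illustration of the mechanism only; the real writer does this in Java over a JDBC PreparedStatement batch, and the class and callback names here are made up):

```python
class BufferedWriter:
    """Buffer incoming records and flush only when batch_size is
    reached, mimicking the thread-local buffering described above."""
    def __init__(self, batch_size, flush):
        self.batch_size = batch_size
        self.flush = flush            # callback that performs the batched write
        self.buf = []

    def write(self, record):
        self.buf.append(record)
        if len(self.buf) >= self.batch_size:
            self.flush(self.buf)      # one network round trip per batch
            self.buf = []

    def close(self):                  # flush whatever is left at end of task
        if self.buf:
            self.flush(self.buf)
            self.buf = []

batches = []
w = BufferedWriter(2, batches.append)
for r in [(1, "test1"), (2, "test2"), (3, "test3")]:
    w.write(r)
w.close()
print(batches)  # [[(1, 'test1'), (2, 'test2')], [(3, 'test3')]]
```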

The job JSON is as follows:

{
  "job": {
    "setting": {
      "speed": {
        "channel":
      },
      "errorLimit": {
        "record": ,
        "percentage": 0.02
      }
    },
    "content": [{
      "reader": {
        "name": "mysqlreader",
        "parameter": {
          "username": "root",
          "password": "",
          "column": [
            "id",
            "name"
          ],
          "splitPk": "id",
          "connection": [{
            "table": [
              "datax_test"
            ],
            "jdbcUrl": [
              "jdbc:mysql://192.168.1.123:3306/test"
            ]
          }]
        }
      },
      "writer": {
        "name": "mysqlwriter",
        "parameter": {
          "writeMode": "insert",
          "username": "root",
          "password": "",
          "column": [
            "id",
            "name"
          ],
          "session": [
            "set session sql_mode='ANSI'"
          ],
          "preSql": [
            "delete from datax_target_test"
          ],
          "connection": [{
            "jdbcUrl": "jdbc:mysql://192.168.1.123:3306/test?useUnicode=true&characterEncoding=gbk",
            "table": [
              "datax_target_test"
            ]
          }]
        }
      }
    }]
  }
}

Execution

FengZhendeMacBook-Pro:bin FengZhen$ ./datax.py /Users/FengZhen/Desktop/Hadoop/dataX/json/mysql/.mysql2mysql.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) Alibaba Group. All Rights Reserved.
... (JVM and memory details, plus an echo of the job configuration, omitted; timestamps and numeric values were stripped in the original capture) ...
INFO JobContainer - DataX jobContainer starts job.
INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://192.168.1.123:3306/test?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
INFO OriginalConfPretreatmentUtil - table:[datax_test] has columns:[id,name].
INFO OriginalConfPretreatmentUtil - table:[datax_target_test] all columns:[
id,name
].
INFO OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id,name) VALUES(?,?)
], which jdbcUrl like:[jdbc:mysql://192.168.1.123:3306/test?useUnicode=true&characterEncoding=gbk&yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true]
INFO CommonRdbmsWriter$Job - Begin to execute preSqls:[delete from datax_target_test]. context info:jdbc:mysql://192.168.1.123:3306/test?useUnicode=true&characterEncoding=gbk&...
INFO JobContainer - jobContainer starts to do split ...
INFO SingleTableSplitUtil - split pk [sql=SELECT MIN(id),MAX(id) FROM datax_test] is running...
INFO SingleTableSplitUtil - After split(), allQuerySql=[
select id,name from datax_test where ( <= id AND id < )
select id,name from datax_test where ( <= id AND id < )
select id,name from datax_test where ( <= id AND id < )
select id,name from datax_test where ( <= id AND id <= )
select id,name from datax_test where id IS NULL
].
INFO JobContainer - Running by standalone Mode.
INFO DBUtil - execute sql:[set session sql_mode='ANSI']
INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select id,name from datax_test where ( <= id AND id < )] ...
INFO TaskGroupContainer - taskGroup completed it's tasks.
INFO AbstractScheduler - Scheduler accomplished all tasks.
INFO JobContainer - DataX jobId completed successfully.
Job start time        :
Job end time          :
Total elapsed time    : 21s
Average traffic       : 1B/s
Record write speed    : 0rec/s
Total records read    :
Read/write failures   :

Parameter descriptions:
--jdbcUrl
Description: JDBC connection information for the target database. At run time, DataX appends the following properties to the jdbcUrl you provide: yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true
Note: 1. Only one jdbcUrl may be configured per database. Unlike MysqlReader's multi-replica probing, multiple primaries for one database (dual-primary import) are not supported here.
2. jdbcUrl follows the official MySQL format and may carry additional connection-control properties; for example, to set the connection encoding to gbk, append useUnicode=true&characterEncoding=gbk to the jdbcUrl. See the official MySQL documentation or consult your DBA for details.
Required: yes
Default: none
--username
Description: user name for the target database
Required: yes
Default: none
--password
Description: password for the target database
Required: yes
Default: none
--table
Description: name(s) of the target table(s). Writing to one or more tables is supported. When several tables are configured, you must ensure that all of them share the same structure.
Note: table and jdbcUrl must be placed inside the connection block.
Required: yes
Default: none
--column
Description: the target-table fields to write, separated by commas, e.g. "column": ["id","name","age"]. To write all columns in order, use *, e.g. "column": ["*"].
**The column option is mandatory and may not be left empty!**

Note: 1. We strongly discourage the "*" form, because if the target table's column count or types change, your job may run incorrectly or fail.
2. column may not contain any constant values.
Required: yes
Default: none
--session
Description: SQL statements that DataX executes when it obtains a MySQL connection, to modify session properties of the current connection.
Required: no
Default: empty
--preSql
Description: standard SQL statements executed before data is written to the target table. If the SQL needs to reference the table being written, use @table as a placeholder; when the statement is actually executed, the variable is replaced with the real table name. For example, if your job writes to 100 identically structured shard tables (datax_00, datax_01, ..., datax_98, datax_99) and you want to delete existing rows before importing, you can configure "preSql":["delete from @table"]; before each table is written, the corresponding delete from <table name> is executed.
Required: no
Default: none
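The @table substitution can be sketched as follows (a hypothetical helper illustrating the mechanism, not DataX's code):

```python
def expand_pre_sqls(pre_sqls, tables):
    """Produce one concrete statement per (table, preSql) pair by
    substituting the @table placeholder with the real table name."""
    return [sql.replace("@table", t) for t in tables for sql in pre_sqls]

print(expand_pre_sqls(["delete from @table"], ["datax_00", "datax_01"]))
# ['delete from datax_00', 'delete from datax_01']
```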
--postSql
Description: standard SQL statements executed after data has been written to the target table (same mechanism as preSql).
Required: no
Default: none
--writeMode
Description: controls whether data is written to the target table with insert into, replace into, or ON DUPLICATE KEY UPDATE statements.
Required: yes
Options: insert/replace/update
Default: insert
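The three writeMode options map onto MySQL statements roughly as follows. This is a sketch: the insert template matches the one printed in the execution log above, while the update form is an assumption based on standard ON DUPLICATE KEY UPDATE semantics, not confirmed against DataX's source.

```python
def write_template(write_mode, table, columns):
    """Render the statement template for each writeMode value."""
    cols = ",".join(columns)
    ph = ",".join("?" for _ in columns)
    if write_mode == "insert":
        return f"insert INTO {table} ({cols}) VALUES({ph})"
    if write_mode == "replace":
        return f"replace INTO {table} ({cols}) VALUES({ph})"
    if write_mode == "update":  # assumed form for the 'update' option
        upd = ",".join(f"{c}=VALUES({c})" for c in columns)
        return f"insert INTO {table} ({cols}) VALUES({ph}) ON DUPLICATE KEY UPDATE {upd}"
    raise ValueError(write_mode)

print(write_template("insert", "datax_target_test", ["id", "name"]))
# insert INTO datax_target_test (id,name) VALUES(?,?)
```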
--batchSize
Description: the number of records committed in one batch. A larger value can greatly reduce the number of network round trips between DataX and MySQL and improve overall throughput, but setting it too high may cause the DataX process to run out of memory (OOM).
Required: no
Default: 1024
Type conversion
Like MysqlReader, MysqlWriter currently supports most MySQL types, but a few individual types are not supported, so please check your column types.
The MysqlWriter type conversion table:

  • the bit type is currently an undefined type conversion
