flume-ng-sql-source: Reading Incremental Data from Oracle
1. Download and Build flume-ng-sql-source
Download from https://github.com/keedio/flume-ng-sql-source.git, then build the project and copy the jar as described in its documentation.
If you prefer a prebuilt jar, one is available on CSDN: http://download.csdn.net/detail/chongxin1/9892184
At the time of writing, the latest version is flume-ng-sql-source-1.4.3.jar. This jar is the key dependency that allows Flume to connect to a database.
2. Copy flume-ng-sql-source-1.4.3.jar into Flume's lib Directory
3. Copy the Oracle JDBC Driver into Flume's lib Directory
The Oracle JDBC driver ships with the database installation (Oracle is the source database here); on the machine used for this walkthrough it is located at D:\app\product\11.2.0\dbhome_1\jdbc\lib.
Copy ojdbc5.jar from that directory into Flume's lib directory.
4. Run the Demo
4.1 Create the database table and test data
- create table flume_ng_sql_source (
- id varchar2(32) primary key,
- msg varchar2(32),
- createTime date not null
- );
- insert into flume_ng_sql_source(id,msg,createTime) values('1','Test increment Data',to_date('2017-08-01 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('2','Test increment Data',to_date('2017-08-02 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('3','Test increment Data',to_date('2017-08-03 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('4','Test increment Data',to_date('2017-08-04 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('5','Test increment Data',to_date('2017-08-05 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('6','Test increment Data',to_date('2017-08-06 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- commit;
4.2 Create flume-sql.conf
- touch /usr/local/flume/flume-sql.conf
- sudo gedit /usr/local/flume/flume-sql.conf
- agentTest.channels = channelTest
- agentTest.sources = sourceTest
- agentTest.sinks = sinkTest
- ###########sql source#################
- # For each one of the sources, the type is defined
- agentTest.sources.sourceTest.type = org.keedio.flume.source.SQLSource
- agentTest.sources.sourceTest.hibernate.connection.url = jdbc:oracle:thin:@192.168.168.100:1521/orcl
- # Hibernate Database connection properties
- agentTest.sources.sourceTest.hibernate.connection.user = flume
- agentTest.sources.sourceTest.hibernate.connection.password = 1234
- agentTest.sources.sourceTest.hibernate.connection.autocommit = true
- agentTest.sources.sourceTest.hibernate.dialect = org.hibernate.dialect.Oracle10gDialect
- agentTest.sources.sourceTest.hibernate.connection.driver_class = oracle.jdbc.driver.OracleDriver
- agentTest.sources.sourceTest.run.query.delay=1
- agentTest.sources.sourceTest.status.file.path = /usr/local/flume
- agentTest.sources.sourceTest.status.file.name = agentTest.sqlSource.status
- # Custom query
- agentTest.sources.sourceTest.start.from = '2017-07-31 07:06:20'
- agentTest.sources.sourceTest.custom.query = SELECT CHR(39)||TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS')||CHR(39),ID,MSG FROM FLUME_NG_SQL_SOURCE WHERE CREATETIME > TO_DATE($@$,'YYYY-MM-DD HH24:MI:SS') ORDER BY CREATETIME ASC
- agentTest.sources.sourceTest.batch.size = 6000
- agentTest.sources.sourceTest.max.rows = 1000
- agentTest.sources.sourceTest.hibernate.connection.provider_class = org.hibernate.connection.C3P0ConnectionProvider
- agentTest.sources.sourceTest.hibernate.c3p0.min_size=1
- agentTest.sources.sourceTest.hibernate.c3p0.max_size=10
- ##############################
- agentTest.channels.channelTest.type = memory
- agentTest.channels.channelTest.capacity = 10000
- agentTest.channels.channelTest.transactionCapacity = 10000
- agentTest.channels.channelTest.byteCapacityBufferPercentage = 20
- agentTest.channels.channelTest.byteCapacity = 1600000
- agentTest.sinks.sinkTest.type = org.apache.flume.sink.kafka.KafkaSink
- agentTest.sinks.sinkTest.topic = TestTopic
- agentTest.sinks.sinkTest.brokerList = 192.168.168.200:9092
- agentTest.sinks.sinkTest.requiredAcks = 1
- agentTest.sinks.sinkTest.batchSize = 20
- agentTest.sinks.sinkTest.channel = channelTest
- agentTest.sources.sourceTest.channels=channelTest
4.3 Start Flume with flume-sql.conf and test
Start a Kafka console consumer listening on the topic:
- kafka-console-consumer.sh --zookeeper 192.168.168.200:2181 --topic TestTopic
Start the Flume agent with flume-sql.conf:
- flume-ng agent --conf conf --conf-file /usr/local/flume/flume-sql.conf --name agentTest -Dflume.root.logger=INFO,console
The TestTopic consumer console prints:
- [root@master ~]# kafka-console-consumer.sh --zookeeper 192.168.168.200:2181 --topic TestTopic
- "'2017-08-01 07:06:20'","1","Test increment Data"
- "'2017-08-02 07:06:20'","2","Test increment Data"
- "'2017-08-03 07:06:20'","3","Test increment Data"
- "'2017-08-04 07:06:20'","4","Test increment Data"
- "'2017-08-05 07:06:20'","5","Test increment Data"
- "'2017-08-06 07:06:20'","6","Test increment Data"
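Each line above is one Flume event serialized with the source's default output settings (delimiter.entry="," and enclose.by.quotes=true); the extra single quotes around the timestamp come from the CHR(39) concatenation in the custom query. As a minimal sketch (not part of the original walkthrough), one such message can be parsed back into fields with Python's csv module:

```python
import csv
import io

# One event as emitted by flume-ng-sql-source with the default
# delimiter.entry="," and enclose.by.quotes=true. The embedded single
# quotes around the timestamp come from CHR(39) in the custom query.
line = "\"'2017-08-01 07:06:20'\",\"1\",\"Test increment Data\""

row = next(csv.reader(io.StringIO(line)))
create_time = row[0].strip("'")  # drop the embedded single quotes
print(row)          # ["'2017-08-01 07:06:20'", '1', 'Test increment Data']
print(create_time)  # 2017-08-01 07:06:20
```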
Per the configuration, check the corresponding status file /usr/local/flume/agentTest.sqlSource.status:
- agentTest.sources.sourceTest.status.file.path = /usr/local/flume
- agentTest.sources.sourceTest.status.file.name = agentTest.sqlSource.status
- {"SourceName":"sourceTest","URL":"jdbc:oracle:thin:@192.168.168.100:1521\/orcl","LastIndex":"'2017-08-06 07:06:20'","Query":"SELECT CHR(39)||TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS')||CHR(39) AS INCREMENTAL,ID,MSG FROM FLUME_NG_SQL_SOURCE WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$ ORDER BY INCREMENTAL ASC"}
From "LastIndex":"'2017-08-06 07:06:20'" we can see that the newest incremental row exported so far has CREATETIME '2017-08-06 07:06:20'. In other words, on the next run of WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$, the $@$ placeholder will be replaced with '2017-08-06 07:06:20'.
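This bookkeeping can be sketched in a few lines of Python (illustrative only; the real source does this in Java via Hibernate, and the JSON below is copied from the status file shown above):

```python
import json

# Contents of /usr/local/flume/agentTest.sqlSource.status after the
# first run, as shown in the walkthrough above.
status_json = (
    '{"SourceName":"sourceTest",'
    '"URL":"jdbc:oracle:thin:@192.168.168.100:1521\\/orcl",'
    '"LastIndex":"\'2017-08-06 07:06:20\'",'
    '"Query":"SELECT CHR(39)||TO_CHAR(CREATETIME,\'YYYY-MM-DD HH24:MI:SS\')||CHR(39)'
    ' AS INCREMENTAL,ID,MSG FROM FLUME_NG_SQL_SOURCE'
    ' WHERE TO_CHAR(CREATETIME,\'YYYY-MM-DD HH24:MI:SS\') > $@$'
    ' ORDER BY INCREMENTAL ASC"}'
)

status = json.loads(status_json)
# On the next poll the source replaces $@$ with LastIndex, so only rows
# newer than the last exported timestamp are selected.
next_query = status["Query"].replace("$@$", status["LastIndex"])
print(next_query)
```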
Insert additional incremental rows into flume_ng_sql_source:
- insert into flume_ng_sql_source(id,msg,createTime) values('7','Test increment Data',to_date('2017-08-07 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('8','Test increment Data',to_date('2017-08-08 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('9','Test increment Data',to_date('2017-08-09 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- insert into flume_ng_sql_source(id,msg,createTime) values('10','Test increment Data',to_date('2017-08-10 07:06:20','yyyy-mm-dd hh24:mi:ss'));
- commit;
The TestTopic consumer console now prints:
- [root@master ~]# kafka-console-consumer.sh --zookeeper 192.168.168.200:2181 --topic TestTopic
- "'2017-08-01 07:06:20'","1","Test increment Data"
- "'2017-08-02 07:06:20'","2","Test increment Data"
- "'2017-08-03 07:06:20'","3","Test increment Data"
- "'2017-08-04 07:06:20'","4","Test increment Data"
- "'2017-08-05 07:06:20'","5","Test increment Data"
- "'2017-08-06 07:06:20'","6","Test increment Data"
- "'2017-08-07 07:06:20'","7","Test increment Data"
- "'2017-08-08 07:06:20'","8","Test increment Data"
- "'2017-08-09 07:06:20'","9","Test increment Data"
- "'2017-08-10 07:06:20'","10","Test increment Data"
Per the configuration, check the status file /usr/local/flume/agentTest.sqlSource.status again:
- {"SourceName":"sourceTest","URL":"jdbc:oracle:thin:@192.168.168.100:1521\/orcl","LastIndex":"'2017-08-10 07:06:20'","Query":"SELECT CHR(39)||TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS')||CHR(39) AS INCREMENTAL,ID,MSG FROM FLUME_NG_SQL_SOURCE WHERE TO_CHAR(CREATETIME,'YYYY-MM-DD HH24:MI:SS') > $@$ ORDER BY INCREMENTAL ASC"}
The LastIndex has advanced accordingly: "LastIndex":"'2017-08-10 07:06:20'".
At this point, flume-ng-sql-source is successfully reading incremental data from Oracle.
5. Configuration Parameter Reference
From https://github.com/keedio/flume-ng-sql-source:
Configuration of the SQL Source (mandatory properties in bold):
| Property Name | Default | Description |
|---|---|---|
| **channels** | - | Connected channel names |
| **type** | - | The component type name; must be org.keedio.flume.source.SQLSource |
| **hibernate.connection.url** | - | URL for connecting to the remote database |
| **hibernate.connection.user** | - | Username for the database connection |
| **hibernate.connection.password** | - | Password for the database connection |
| **table** | - | Table to export data from |
| **status.file.name** | - | Local file name in which to save the last index read |
| status.file.path | /var/lib/flume | Path where the status file is saved |
| start.from | 0 | Start value for importing data |
| delimiter.entry | , | Delimiter between fields in an output entry |
| enclose.by.quotes | true | Whether quotes are applied to all values in the output |
| columns.to.select | * | Which columns of the table will be selected |
| run.query.delay | 10000 | Milliseconds to wait between query runs |
| batch.size | 100 | Batch size for sending events to the Flume channel |
| max.rows | 10000 | Max rows to import per query |
| read.only | false | Sets a read-only session with the database |
| custom.query | - | Custom query forcing a special request to the DB; use with care. See the explanation of this property below. |
| hibernate.connection.driver_class | - | Driver class used by Hibernate; if not specified, the framework will auto-assign one |
| hibernate.dialect | - | Dialect used by Hibernate; if not specified, the framework will auto-assign one. See https://docs.jboss.org/hibernate/orm/4.3/manual/en-US/html/ch03.html#configuration-optional-dialects for a complete list of available dialects |
| hibernate.connection.provider_class | - | Set to org.hibernate.connection.C3P0ConnectionProvider to use the C3P0 connection pool (recommended for production) |
| hibernate.c3p0.min_size | - | Min connection pool size |
| hibernate.c3p0.max_size | - | Max connection pool size |
Custom Query
A custom query is supported so that the full SQL language can be used. This is powerful but risky; be careful with the custom queries you write.
To avoid exporting the same rows repeatedly, use the special $@$ placeholder in the WHERE clause, so that only unprocessed rows and newly inserted ones are exported incrementally.
IMPORTANT: for the custom query to work correctly, make sure the incremental field is returned in the first position of the query result.
Example:
- agent.sources.sql-source.custom.query = SELECT incrementalField,field2 FROM table1 WHERE incrementalField > $@$
In short: to avoid problems, put the incremental field in the first position of the SELECT list.
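The incremental loop described above can be simulated in memory (a hedged sketch only; the real source executes SQL through Hibernate, while here plain tuples of strings stand in for result rows):

```python
# A minimal in-memory simulation of the source's incremental polling:
# each "row" is a tuple whose FIRST element is the incremental field,
# mirroring the requirement stated above.
rows = [("2017-08-01 07:06:20", "1", "Test increment Data"),
        ("2017-08-02 07:06:20", "2", "Test increment Data")]

last_index = "2017-07-31 07:06:20"   # corresponds to start.from


def poll(rows, last_index):
    # The WHERE ... > $@$ clause: keep only rows newer than last_index.
    batch = sorted(r for r in rows if r[0] > last_index)
    if batch:
        # The incremental field must be the first column, because the
        # new LastIndex is taken from position 0 of the last row.
        last_index = batch[-1][0]
    return batch, last_index


batch, last_index = poll(rows, last_index)
print(len(batch), last_index)  # 2 2017-08-02 07:06:20

# Rows inserted later are picked up on the next poll.
rows.append(("2017-08-03 07:06:20", "3", "Test increment Data"))
batch, last_index = poll(rows, last_index)
print(len(batch), last_index)  # 1 2017-08-03 07:06:20
```

Because ISO-style timestamps sort lexicographically, plain string comparison is enough here; the real query compares TO_CHAR(CREATETIME, ...) strings in exactly the same way.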