Kafka Connect Usage Guide
I. Overview
Kafka Connect is a scalable and reliable tool for streaming data between Kafka and other systems. In short, it lets you move large collections of data into and out of Kafka quickly and simply through Connectors. It can ingest an entire database, or collect messages from all of your application servers, into Kafka topics. Kafka Connect's features include:
1. A common framework for Kafka connectors: Kafka Connect standardizes the integration of Kafka with other data systems, simplifying connector development, deployment, and management.
2. Distributed and standalone modes: it scales up to a large, centrally managed service supporting an entire organization, and scales down to development, testing, and small production deployments.
3. REST interface: connectors are submitted to and managed in a Kafka Connect cluster through a REST API.
4. Automatic offset management: with only a little information from connectors, Connect manages offset commits automatically.
5. Distributed and scalable by default: Kafka Connect builds on the existing group management protocol, so additional workers can be added to scale out a Connect cluster.
6. Streaming/batch integration: leveraging Kafka's existing capabilities, Connect is an ideal bridge between streaming and batch data systems.
The Kafka version used for the Connect tests in this article is 0.9.0.0.
II. Standalone Mode
The command format for standalone mode is:
bin/connect-standalone.sh config/connect-standalone.properties Connector1.properties [Connector2.properties ...]
My configuration of each of the files above is described below.
1. connect-standalone.sh is the script that launches standalone mode:
#!/bin/sh
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

base_dir=$(dirname $0)

if [ "x$KAFKA_LOG4J_OPTS" = "x" ]; then
export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:$base_dir/../config/connect-log4j.properties"
fi
if [ -z "$KAFKA_HEAP_OPTS" ]; then
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9901 "
fi
if [ -z "$KAFKA_HEAP_OPTS" ]; then
export KAFKA_HEAP_OPTS="-Xmx1024M"
fi
exec $(dirname $0)/kafka-run-class.sh org.apache.kafka.connect.cli.ConnectStandalone "$@"
This is where you can set the JVM heap size for the Connect process:
if [ -z "$KAFKA_HEAP_OPTS" ]; then
export KAFKA_HEAP_OPTS="-Xmx1024M"
fi
You can also configure JMX:
if [ -z "$KAFKA_JMX_OPTS" ]; then
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.port=9901 "
fi
2. connect-standalone.properties configuration:
# (Apache license header omitted; identical to the one above)
# These are defaults. This file just demonstrates how to override some settings.
bootstrap.servers=10.253.129.237:,10.253.129.238:,10.253.129.239:

# The converters specify the format of data in Kafka and how to translate it into Connect data. Every Connect user will
# need to configure these based on the format they want their data in when loaded from or stored into Kafka
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Converter-specific settings can be passed in by prefixing the Converter's setting with the converter we want to apply
# it to
key.converter.schemas.enable=true
value.converter.schemas.enable=true

# The internal converter used for offsets and config data is configurable and must be specified, but most users will
# always want to use the built-in default. Offset and config data is never visible outside of Kafka Connect in this format.
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

offset.storage.file.filename=/datafs//connect.offsets
# Flush much faster than normal, which is useful for testing/debugging
offset.flush.interval.ms=
Pay particular attention to the broker (bootstrap.servers) setting here; the remaining settings are all documented in the Kafka reference:
http://kafka.apache.org/090/documentation.html#connectconfigs
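Because key.converter and value.converter are set to JsonConverter with schemas.enable=true, every record is written to Kafka wrapped in a schema/payload envelope. As an illustration (not output captured from this setup), a single text line picked up by the file source connector configured below would land in the topic roughly like this:
{"schema":{"type":"string","optional":false},"payload":"hello kafka connect"}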
3. connect-file-source.properties configuration:
# (Apache license header omitted; identical to the one above)
name=test_source2
connector.class=FileStreamSource
tasks.max=
file=/datafs//json2/log.out
topic=TEST_MANAGER5
Note the file path and topic settings here.
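Once the source connector is running, a quick way to confirm that lines appended to the file actually reach the topic is the console consumer. A minimal check, assuming a ZooKeeper instance at localhost:2181 (adjust to your environment):
echo "hello kafka connect" >> /datafs//json2/log.out
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic TEST_MANAGER5 --from-beginning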
4. connect-file-sink.properties configuration:
# (Apache license header omitted; identical to the one above)
name=test_sink1
connector.class=FileStreamSink
#connector.class=org.apache.kafka.connect.file.FileStreamSinkConnector
tasks.max=
file=/datafs//a.out
topics=TEST_MANAGER5
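With the three property files above in place, the whole file-to-Kafka-to-file pipeline can be run in a single standalone worker. This is just a sketch using the file names from this article and assuming the connector files live under config/; the worker stays in the foreground, so run the last two commands in a second terminal:
bin/connect-standalone.sh config/connect-standalone.properties config/connect-file-source.properties config/connect-file-sink.properties
echo "test line" >> /datafs//json2/log.out
tail -f /datafs//a.out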
III. Distributed Mode
Command format:
bin/connect-distributed.sh config/connect-distributed.properties
{"name":"test","config":{"topic":"TEST_MANAGER","connector.class":"FileStreamSource","tasks.max":"2","file":"/datafs/log1.out"}}
IV. REST API
DELETE /connectors/{name}: deletes a connector, halting all of its tasks and removing its configuration.
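For example, the connectors running on a worker can be listed, a connector's configuration inspected, and a connector removed with requests like these (again assuming a worker at localhost:8083):
curl http://localhost:8083/connectors
curl http://localhost:8083/connectors/test/config
curl -X DELETE http://localhost:8083/connectors/test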