数据源Source

RPC异构流数据交换

  • Avro Source
  • Thrift Source

文件或目录变化监听

  • Exec Source
  • Spooling Directory Source
  • Taildir Source

MQ或队列订阅数据持续监听

  • JMS Source
  • SSL and JMS Source
  • Kafka Source

Network类数据交换

  • NetCat TCP Source
  • NetCat UDP Source
  • HTTP Source
  • Syslog Sources
  • Syslog TCP Source
  • Multiport Syslog TCP Source
  • Syslog UDP Source

定制源

  • Custom Source

Sink

  • HDFS Sink
  • Hive Sink
  • Logger Sink
  • Avro Sink
  • Thrift Sink
  • IRC Sink
  • File Roll Sink
  • HBaseSinks
  • HBaseSink
  • HBase2Sink
  • AsyncHBaseSink
  • MorphlineSolrSink
  • ElasticSearchSink
  • Kite Dataset Sink
  • Kafka Sink
  • HTTP Sink
  • Custom Sink

案例

1、监听文件变化 

exec-memory-logger.properties

#指定agent的sources,sinks,channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1 #配置sources属性
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1 #配置sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.103
a1.sinks.k1.port = 8888
a1.sinks.k1.batch-size = 1
a1.sinks.k1.channel = c1 #配置channel类型
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

启动

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-logger.properties --name a1 -Dflume.root.logger=INFO,console

测试

echo "asfsafsf" >> /tmp/log.txt

2、TCP NetCat监听

netcat.properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1 # Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444 # Describe the sink
a1.sinks.k1.type = logger # Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100 # Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

启动

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/netcat.properties --name a1 -Dflume.root.logger=INFO,console

测试

telnet localhost 44444

3、Kafka读、写 (读:从kafka到log,写:从file到kafka)

read-kafka.properties 、write-kafka.properties

#指定agent的sources,sinks,channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1 #配置sources属性
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.channels = c1
a1.sources.s1.batchSize = 5000
a1.sources.s1.batchDurationMillis = 2000
a1.sources.s1.kafka.bootstrap.servers = 192.168.1.103:9092
a1.sources.s1.kafka.topics = test1
a1.sources.s1.kafka.consumer.group.id = custom.g.id #将sources与channels进行绑定
a1.sources.s1.channels = c1 #配置sink
a1.sinks.k1.type = logger #将sinks与channels进行绑定
a1.sinks.k1.channel = c1 #配置channel类型
a1.channels.c1.type = memory
a1.sources = s1
a1.channels = c1
a1.sinks = k1 a1.sources.s1.type=exec
a1.sources.s1.command=tail -F /tmp/kafka.log
a1.sources.s1.channels=c1 #设置Kafka接收器
a1.sinks.k1.type= org.apache.flume.sink.kafka.KafkaSink
#设置Kafka地址
a1.sinks.k1.brokerList=192.168.1.103:9092
#设置发送到Kafka上的主题
a1.sinks.k1.topic=test1
#设置序列化方式
a1.sinks.k1.serializer.class=kafka.serializer.StringEncoder
a1.sinks.k1.channel=c1 a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100

启动

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/read-kafka.properties --name a1 -Dflume.root.logger=INFO,console

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/write-kafka.properties --name a1 -Dflume.root.logger=INFO,console

测试

# 创建用于测试主题
bin/kafka-topics.sh --create \
--bootstrap-server 192.168.1.103:9092 \
--replication-factor 1 \
--partitions 1 \
--topic test1
# 启动 Producer,用于发送测试数据:
bin/kafka-console-producer.sh --broker-list 192.168.1.103:9092 --topic test1

4、定制源

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1

5、HDFS Sink

spooling-memory-hdfs.properties ,监听目录变化,将新建的文件传到HDFS

#指定agent的sources,sinks,channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1 #配置sources属性
a1.sources.s1.type =spooldir
a1.sources.s1.spoolDir =/tmp/log2
a1.sources.s1.basenameHeader = true
a1.sources.s1.basenameHeaderKey = fileName
#将sources与channels进行绑定
a1.sources.s1.channels =c1 #配置sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H/
a1.sinks.k1.hdfs.filePrefix = %{fileName}
#生成的文件类型,默认是Sequencefile,可用DataStream,则为普通文本
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
#将sinks与channels进行绑定
a1.sinks.k1.channel = c1 #配置channel类型
a1.channels.c1.type = memory

测试

hdfs dfs -ls /flume/events/19-11-21/15

6、Hive Sink

a1.channels = c1
a1.channels.c1.type = memory
a1.sinks = k1
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = logsdb
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = "\t"
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames =id,,msg

7、Avro Source、Avro Sink

exec-memory-avro.properties、avro-memory-log.properties

#指定agent的sources,sinks,channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1 #配置sources属性
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1 #配置sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.103
a1.sinks.k1.port = 8888
a1.sinks.k1.batch-size = 1
a1.sinks.k1.channel = c1 #配置channel类型
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
#指定agent的sources,sinks,channels
a2.sources = s2
a2.sinks = k2
a2.channels = c2 #配置sources属性
a2.sources.s2.type = avro
a2.sources.s2.bind = 192.168.1.103
a2.sources.s2.port = 8888 #将sources与channels进行绑定
a2.sources.s2.channels = c2 #配置sink
a2.sinks.k2.type = logger #将sinks与channels进行绑定
a2.sinks.k2.channel = c2 #配置channel类型
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

启动


flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/avro-memory-log.properties --name a2 -Dflume.root.logger=INFO,console

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-avro.properties --name a1 -Dflume.root.logger=INFO,console

测试,使用一个Avro客户端发送数据

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.api.SecureRpcClientFactory;
import org.apache.flume.api.RpcClientConfigurationConstants;
import org.apache.flume.api.RpcClient;
import java.nio.charset.Charset;
import java.util.Properties; public class MyApp {
public static void main(String[] args) {
MySecureRpcClientFacade client = new MySecureRpcClientFacade();
// Initialize client with the remote Flume agent's host, port
Properties props = new Properties();
props.setProperty(RpcClientConfigurationConstants.CONFIG_CLIENT_TYPE, "thrift");
props.setProperty("hosts", "h1");
props.setProperty("hosts.h1", "client.example.org"+":"+ String.valueOf(8888)); // Initialize client with the kerberos authentication related properties
props.setProperty("kerberos", "true");
props.setProperty("client-principal", "flumeclient/client.example.org@EXAMPLE.ORG");
props.setProperty("client-keytab", "/tmp/flumeclient.keytab");
props.setProperty("server-principal", "flume/server.example.org@EXAMPLE.ORG");
client.init(props); // Send 10 events to the remote Flume agent. That agent should be
// configured to listen with an AvroSource.
String sampleData = "Hello Flume!";
for (int i = 0; i < 10; i++) {
client.sendDataToFlume(sampleData);
} client.cleanUp();
}
} class MySecureRpcClientFacade {
private RpcClient client;
private Properties properties; public void init(Properties properties) {
// Setup the RPC connection
this.properties = properties;
// Create the ThriftSecureRpcClient instance by using SecureRpcClientFactory
this.client = SecureRpcClientFactory.getThriftInstance(properties);
} public void sendDataToFlume(String data) {
// Create a Flume Event object that encapsulates the sample data
Event event = EventBuilder.withBody(data, Charset.forName("UTF-8")); // Send the event
try {
client.append(event);
} catch (EventDeliveryException e) {
// clean up and recreate the client
client.close();
client = null;
client = SecureRpcClientFactory.getThriftInstance(properties);
}
} public void cleanUp() {
// Close the RPC connection
client.close();
}
}

8、Elasticsearch Sink

a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = elasticsearch
a1.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300
a1.sinks.k1.indexName = foo_index
a1.sinks.k1.indexType = bar_type
a1.sinks.k1.clusterName = foobar_cluster
a1.sinks.k1.batchSize = 500
a1.sinks.k1.ttl = 5d
a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
a1.sinks.k1.channel = c1

9、定制Source、Sink开发

public class MySink extends AbstractSink implements Configurable {
private String myProp; @Override
public void configure(Context context) {
String myProp = context.getString("myProp", "defaultValue"); // Process the myProp value (e.g. validation) // Store myProp for later retrieval by process() method
this.myProp = myProp;
} @Override
public void start() {
// Initialize the connection to the external repository (e.g. HDFS) that
// this Sink will forward Events to ..
} @Override
public void stop () {
// Disconnect from the external respository and do any
// additional cleanup (e.g. releasing resources or nulling-out
// field values) ..
} @Override
public Status process() throws EventDeliveryException {
Status status = null; // Start transaction
Channel ch = getChannel();
Transaction txn = ch.getTransaction();
txn.begin();
try {
// This try clause includes whatever Channel operations you want to do Event event = ch.take(); // Send the Event to the external repository.
// storeSomeData(e); txn.commit();
status = Status.READY;
} catch (Throwable t) {
txn.rollback(); // Log exception, handle individual exceptions as needed status = Status.BACKOFF; // re-throw all Errors
if (t instanceof Error) {
throw (Error)t;
}
}
return status;
}
}
public class MySource extends AbstractSource implements Configurable, PollableSource {
private String myProp; @Override
public void configure(Context context) {
String myProp = context.getString("myProp", "defaultValue"); // Process the myProp value (e.g. validation, convert to another type, ...) // Store myProp for later retrieval by process() method
this.myProp = myProp;
} @Override
public void start() {
// Initialize the connection to the external client
} @Override
public void stop () {
// Disconnect from external client and do any additional cleanup
// (e.g. releasing resources or nulling-out field values) ..
} @Override
public Status process() throws EventDeliveryException {
Status status = null; try {
// This try clause includes whatever Channel/Event operations you want to do // Receive new data
Event e = getSomeData(); // Store the Event into this Source's associated Channel(s)
getChannelProcessor().processEvent(e); status = Status.READY;
} catch (Throwable t) {
// Log exception, handle individual exceptions as needed status = Status.BACKOFF; // re-throw all Errors
if (t instanceof Error) {
throw (Error)t;
}
} finally {
txn.close();
}
return status;
}
}

Flume的Source、Sink总结,及常用使用场景的更多相关文章

  1. Flume:source和sink

    Flume – 初识flume.source和sink 目录基本概念常用源 Source常用sink 基本概念  什么叫flume? 分布式,可靠的大量日志收集.聚合和移动工具.  events ...

  2. FLUME KAFKA SOURCE 和 SINK 使用同一个 TOPIC

    FLUME KAFKA SOURCE 和 SINK 使用同一个 TOPIC 最近做了一个事情,过滤下kakfa中的数据后,做这个就用到了flume,直接使用flume source 和 flume s ...

  3. 一次flume exec source采集日志到kafka因为单条日志数据非常大同步失败的踩坑带来的思考

    本次遇到的问题描述,日志采集同步时,当单条日志(日志文件中一行日志)超过2M大小,数据无法采集同步到kafka,分析后,共踩到如下几个坑.1.flume采集时,通过shell+EXEC(tail -F ...

  4. 泛函编程(36)-泛函Stream IO:IO数据源-IO Source & Sink

    上期我们讨论了IO处理过程:Process[I,O].我们说Process就像电视信号盒子一样有输入端和输出端两头.Process之间可以用一个Process的输出端与另一个Process的输入端连接 ...

  5. 把Flume的Source设置为 Spooling directory source

    把Flume的Source设置为 Spooling directory source,在设定的目录下放置需要读取的文件,一些文件在读取过程中会报错. 文件格式和报错如下: 实验一 读取汉子和“:&qu ...

  6. flume http source示例讲解

    一.介绍 flume自带的Http Source可以通过Http Post接收事件. 场景:对于有些应用程序环境,它可能不能部署Flume SDK及其依赖项,或客户端代码倾向于通过HTTP而不是Flu ...

  7. Redis的Python实践,以及四中常用应用场景详解——学习董伟明老师的《Python Web开发实践》

    首先,简单介绍:Redis是一个基于内存的键值对存储系统,常用作数据库.缓存和消息代理. 支持:字符串,字典,列表,集合,有序集合,位图(bitmaps),地理位置,HyperLogLog等多种数据结 ...

  8. Flume组件source,channel,sink源码分析

    LifeCycleState: IDLE, START, STOP, ERROR [Source]: org.apache.flume.Source 继承LifeCycleAware{stop() + ...

  9. Flume笔记--source端监听目录,sink端上传到HDFS

    官方文档参数解释:http://flume.apache.org/FlumeUserGuide.html#hdfs-sink 需要注意:文件格式,fileType=DataStream 默认为Sequ ...

随机推荐

  1. Form之action提交不刷新不跳转

    <div class="file-box"> <form action="/File/fileUpLoad" id="form1&q ...

  2. ASP.NET MVC IOC 之 Autofac(三)-webform中应用

    在webform中应用autofac,只有global中的写法不一样,其他使用方式都一样 nuget上引用: global中的写法: private void AutoFacRegister() { ...

  3. Redis安装和基本使用

    目录 Redis安装和基本使用 安装 配置 启动服务端 启动客户端 Redis键(key) 与键相关的基本命令 Redis字符串 常用字符串命令: Redis哈希 常用Hash命令 Redis 列表( ...

  4. Robotframework ride ,运行后提示, [WinError 2] 系统找不到指定的文件。

    运行后提示, [WinError 2] 系统找不到指定的文件. command: pybot.bat --argumentfile C:\Users\123\AppData\Local\Temp\RI ...

  5. 分析mybatis中 #{} 和${}的区别

    分析方法: 在 GenericTokenParser这个类的parse方法的这一行下个断点调试一下就明白了 builder.append(handler.handleToken(content)); ...

  6. webpack打包js文件

    当输入 webpack 输入指令 npm run dev  后会自动启动一个浏览器 需要借鉴插件 open-browser-webpack-plugin 下载:npm install open-bro ...

  7. CRM产品主数据在行业解决方案industry solution中的应用

    AG3, choose this role: Create a new Acquisition Contracts: Here our product advances search will be ...

  8. 安装Python,输入pip命令报错———pip Fatal error in launcher: Unable to create process using

    今天把Python的安装位置也从C盘剪切到了D盘, 然后修改了Path环境变量中对应的盘符:D:\Python27\;D:\Python27\Scripts; 不管是在哪个目录,Python可以执行了 ...

  9. nginx无网络启动失败——proxy_pass域名DNS解析出错

    问题: nginx启动或者reload的时候,会对proxy_pass后面的域名进行DNS解析,如果解析失败,启动就会失败或者reload失败. 我们是to B的产品,客户的环境可能是不通公网的,因此 ...

  10. cmake设定boost python3

    在mac上操作的.python3是anaconda环境下装的,3.7.1. boost是用brew装的,1.71.0版本. 按照FindBoost.cmake官方写法的CMakeLists.txt: ...