Source

RPC exchange between heterogeneous systems

Avro Source
Thrift Source

Watching files or directories for changes

Exec Source
Spooling Directory Source
Taildir Source

Continuous consumption from message queues or subscriptions

JMS Source
SSL and JMS Source
Kafka Source

Network-based data exchange

NetCat TCP Source
NetCat UDP Source
HTTP Source
Syslog Sources
Syslog TCP Source
Multiport Syslog TCP Source
Syslog UDP Source

Custom sources

Custom Source

Sink

HDFS Sink
Hive Sink
Logger Sink
Avro Sink
Thrift Sink
IRC Sink
File Roll Sink
HBaseSinks
HBaseSink
HBase2Sink
AsyncHBaseSink
MorphlineSolrSink
ElasticSearchSink
Kite Dataset Sink
Kafka Sink
HTTP Sink
Custom Sink

Examples

1. Watching a file for changes

exec-memory-logger.properties

# Name the agent's sources, sinks, and channels

a1.sources = s1

a1.sinks = k1

a1.channels = c1

# Configure the source

a1.sources.s1.type = exec

a1.sources.s1.command = tail -F /tmp/log.txt

a1.sources.s1.shell = /bin/bash -c

a1.sources.s1.channels = c1

# Configure the sink (the logger sink writes each event to the agent log)

a1.sinks.k1.type = logger

a1.sinks.k1.channel = c1

# Configure the channel

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

Start

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-logger.properties --name a1 -Dflume.root.logger=INFO,console

Test

echo "asfsafsf" >> /tmp/log.txt
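Every configuration in this post follows the same wiring rule: a source lists its channels under `channels`, a sink names exactly one under `channel`, and those names must be declared in `<agent>.channels`. The sketch below checks that rule before an agent is started. It is only an illustration: `FlumeWiringCheck` is a made-up helper, not part of Flume, and it relies on nothing but `java.util.Properties`.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Properties;
import java.util.Set;

// Hypothetical helper, not part of Flume: checks that sources and sinks
// only reference channels that the agent declares.
final class FlumeWiringCheck {

  // Splits a whitespace-separated list such as "c1 c2" into a set of names.
  static Set<String> names(String value) {
    Set<String> out = new LinkedHashSet<>();
    if (value != null) {
      for (String s : value.trim().split("\\s+")) {
        if (!s.isEmpty()) {
          out.add(s);
        }
      }
    }
    return out;
  }

  // Returns a list of problems; an empty list means the wiring is consistent.
  static List<String> check(String agent, String config) {
    Properties p = new Properties();
    try {
      p.load(new StringReader(config));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    Set<String> channels = names(p.getProperty(agent + ".channels"));
    List<String> problems = new ArrayList<>();
    for (String src : names(p.getProperty(agent + ".sources"))) {
      Set<String> used = names(p.getProperty(agent + ".sources." + src + ".channels"));
      if (used.isEmpty()) {
        problems.add("source " + src + " is not bound to any channel");
      }
      for (String c : used) {
        if (!channels.contains(c)) {
          problems.add("source " + src + " uses undeclared channel " + c);
        }
      }
    }
    for (String sink : names(p.getProperty(agent + ".sinks"))) {
      String c = p.getProperty(agent + ".sinks." + sink + ".channel");
      if (c == null || !channels.contains(c.trim())) {
        problems.add("sink " + sink + " is not bound to a declared channel");
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    String config = String.join("\n",
        "a1.sources = s1",
        "a1.sinks = k1",
        "a1.channels = c1",
        "a1.sources.s1.channels = c1",
        "a1.sinks.k1.channel = c1");
    // A consistent configuration prints []
    System.out.println(check("a1", config));
  }
}
```

Running the same check against a file on disk only needs the file's contents passed in as the second argument.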

2. Listening on TCP with NetCat

netcat.properties

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

Start

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/netcat.properties --name a1 -Dflume.root.logger=INFO,console

Test

telnet localhost 44444
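The telnet check can also be scripted. The sketch below sends one line to the NetCat source and returns the first line of the reply. It assumes the agent from netcat.properties is already running on localhost:44444; the class name `NetcatSourceClient` and the 5-second timeout are choices made for this example. With the source's default `ack-every-event` setting, each accepted line should be answered with OK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical test client, not part of Flume.
final class NetcatSourceClient {

  // Sends one newline-terminated line and returns the first line sent back.
  static String sendLine(String host, int port, String line) {
    try (Socket socket = new Socket(host, port)) {
      socket.setSoTimeout(5000); // do not wait forever for a reply
      OutputStream out = socket.getOutputStream();
      out.write((line + "\n").getBytes(StandardCharsets.UTF_8));
      out.flush();
      BufferedReader in = new BufferedReader(
          new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
      return in.readLine();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    // Host and port are the bind and port values from netcat.properties.
    System.out.println(sendLine("localhost", 44444, "hello flume"));
  }
}
```

The event itself shows up in the console of the agent, because the sink in this example is the logger sink.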

3. Kafka read and write (read: from Kafka to the log; write: from a file to Kafka)

read-kafka.properties

# Name the agent's sources, sinks, and channels

a1.sources = s1

a1.sinks = k1

a1.channels = c1

# Configure the source

a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource

a1.sources.s1.batchSize = 5000

a1.sources.s1.batchDurationMillis = 2000

a1.sources.s1.kafka.bootstrap.servers = 192.168.1.103:9092

a1.sources.s1.kafka.topics = test1

a1.sources.s1.kafka.consumer.group.id = custom.g.id

# Bind the source to the channel

a1.sources.s1.channels = c1

# Configure the sink

a1.sinks.k1.type = logger

# Bind the sink to the channel

a1.sinks.k1.channel = c1

# Configure the channel; transactionCapacity must be at least the source batchSize (the memory channel default is 100)

a1.channels.c1.type = memory

a1.channels.c1.capacity = 10000

a1.channels.c1.transactionCapacity = 5000

write-kafka.properties

a1.sources = s1

a1.channels = c1

a1.sinks = k1

a1.sources.s1.type = exec

a1.sources.s1.command = tail -F /tmp/kafka.log

a1.sources.s1.channels = c1

# Configure the Kafka sink

a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink

# Kafka broker address (kafka.bootstrap.servers replaces the deprecated brokerList)

a1.sinks.k1.kafka.bootstrap.servers = 192.168.1.103:9092

# Topic to publish to (kafka.topic replaces the deprecated topic)

a1.sinks.k1.kafka.topic = test1

# The old serializer.class setting is omitted; by default the sink sends the event body as a byte array

a1.sinks.k1.channel = c1

a1.channels.c1.type = memory

a1.channels.c1.capacity = 10000

a1.channels.c1.transactionCapacity = 100

Start

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/read-kafka.properties --name a1 -Dflume.root.logger=INFO,console

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/write-kafka.properties --name a1 -Dflume.root.logger=INFO,console

Test

# Create a topic for testing

bin/kafka-topics.sh --create --bootstrap-server 192.168.1.103:9092 --replication-factor 1 --partitions 1 --topic test1

# Start a console producer to send test data

bin/kafka-console-producer.sh --broker-list 192.168.1.103:9092 --topic test1

4. Custom source

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = org.example.MySource

a1.sources.r1.channels = c1

5. HDFS Sink

spooling-memory-hdfs.properties: watches a directory and uploads newly created files to HDFS

# Name the agent's sources, sinks, and channels

a1.sources = s1

a1.sinks = k1

a1.channels = c1

# Configure the source

a1.sources.s1.type = spooldir

a1.sources.s1.spoolDir = /tmp/log2

a1.sources.s1.basenameHeader = true

a1.sources.s1.basenameHeaderKey = fileName

# Bind the source to the channel

a1.sources.s1.channels = c1

# Configure the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H/

a1.sinks.k1.hdfs.filePrefix = %{fileName}

# Output file type: the default is SequenceFile; DataStream writes plain text

a1.sinks.k1.hdfs.fileType = DataStream

a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Bind the sink to the channel

a1.sinks.k1.channel = c1

# Configure the channel

a1.channels.c1.type = memory

Test

hdfs dfs -ls /flume/events/19-11-21/15
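The directory in the listing above comes from the escape sequences in hdfs.path: %y is the two-digit year, %m the month, %d the day of the month, and %H the hour (00 to 23). Because hdfs.useLocalTimeStamp is true, the sink resolves them against the agent's local time. The sketch below previews that expansion for a given time. `HdfsPathPreview` is a made-up helper that handles only these four escapes; it does not call Flume.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical preview helper, not part of Flume.
final class HdfsPathPreview {

  // Expands only the four escapes used in the example path: %y, %m, %d, %H.
  static String expand(String pattern, LocalDateTime time) {
    return pattern
        .replace("%y", time.format(DateTimeFormatter.ofPattern("yy")))
        .replace("%m", time.format(DateTimeFormatter.ofPattern("MM")))
        .replace("%d", time.format(DateTimeFormatter.ofPattern("dd")))
        .replace("%H", time.format(DateTimeFormatter.ofPattern("HH")));
  }

  public static void main(String[] args) {
    LocalDateTime eventTime = LocalDateTime.of(2019, 11, 21, 15, 30);
    // prints /flume/events/19-11-21/15/
    System.out.println(expand("/flume/events/%y-%m-%d/%H/", eventTime));
  }
}
```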

6. Hive Sink

a1.channels = c1

a1.channels.c1.type = memory

a1.sinks = k1

a1.sinks.k1.type = hive

a1.sinks.k1.channel = c1

a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083

a1.sinks.k1.hive.database = logsdb

a1.sinks.k1.hive.table = weblogs

a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M

a1.sinks.k1.useLocalTimeStamp = false

a1.sinks.k1.round = true

a1.sinks.k1.roundValue = 10

a1.sinks.k1.roundUnit = minute

a1.sinks.k1.serializer = DELIMITED

a1.sinks.k1.serializer.delimiter = "\t"

a1.sinks.k1.serializer.serdeSeparator = '\t'

a1.sinks.k1.serializer.fieldnames =id,,msg
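With round = true, roundValue = 10 and roundUnit = minute, the event timestamp is rounded down to the previous multiple of ten minutes before the %y-%m-%d-%H-%M part of hive.partition is filled in. Since useLocalTimeStamp is false, that timestamp has to come from the event's timestamp header. The sketch below reproduces only the rounding arithmetic. `HivePartitionPreview` is a made-up name, and the sample time is an assumption chosen for illustration.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical preview helper, not part of Flume.
final class HivePartitionPreview {

  // Rounds down to the previous multiple of roundValue minutes and drops seconds.
  static LocalDateTime roundDownMinutes(LocalDateTime time, int roundValue) {
    int minute = (time.getMinute() / roundValue) * roundValue;
    return time.withMinute(minute).withSecond(0).withNano(0);
  }

  // Formats the rounded time the way %y-%m-%d-%H-%M would.
  static String timePartition(LocalDateTime time, int roundValue) {
    return roundDownMinutes(time, roundValue)
        .format(DateTimeFormatter.ofPattern("yy-MM-dd-HH-mm"));
  }

  public static void main(String[] args) {
    LocalDateTime eventTime = LocalDateTime.of(2012, 6, 12, 11, 54, 34);
    // prints 12-06-12-11-50
    System.out.println(timePartition(eventTime, 10));
  }
}
```

All events whose timestamps fall between 11:50:00 and 11:59:59 therefore land in the same partition.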

7. Avro Source and Avro Sink

Two agents are chained here: a1 (exec-memory-avro.properties) tails a file and forwards events through an Avro Sink, and a2 (avro-memory-log.properties) receives them on an Avro Source and logs them.

exec-memory-avro.properties

# Name the agent's sources, sinks, and channels

a1.sources = s1

a1.sinks = k1

a1.channels = c1

# Configure the source

a1.sources.s1.type = exec

a1.sources.s1.command = tail -F /tmp/log.txt

a1.sources.s1.shell = /bin/bash -c

a1.sources.s1.channels = c1

# Configure the sink

a1.sinks.k1.type = avro

a1.sinks.k1.hostname = 192.168.1.103

a1.sinks.k1.port = 8888

a1.sinks.k1.batch-size = 1

a1.sinks.k1.channel = c1

# Configure the channel

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

avro-memory-log.properties

# Name the agent's sources, sinks, and channels

a2.sources = s2

a2.sinks = k2

a2.channels = c2

# Configure the source

a2.sources.s2.type = avro

a2.sources.s2.bind = 192.168.1.103

a2.sources.s2.port = 8888

# Bind the source to the channel

a2.sources.s2.channels = c2

# Configure the sink

a2.sinks.k2.type = logger

# Bind the sink to the channel

a2.sinks.k2.channel = c2

# Configure the channel

a2.channels.c2.type = memory

a2.channels.c2.capacity = 1000

a2.channels.c2.transactionCapacity = 100

Start

Start the receiving agent a2 first:

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/avro-memory-log.properties --name a2 -Dflume.root.logger=INFO,console

Then start the sending agent a1:

flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-avro.properties --name a1 -Dflume.root.logger=INFO,console

Test: send data to the Avro Source of a2 with an Avro RPC client

import org.apache.flume.Event;

import org.apache.flume.EventDeliveryException;

import org.apache.flume.api.RpcClient;

import org.apache.flume.api.RpcClientFactory;

import org.apache.flume.event.EventBuilder;

import java.nio.charset.StandardCharsets;

public class MyApp {

  public static void main(String[] args) {

    MyRpcClientFacade client = new MyRpcClientFacade();

    // Initialize client with the host and port of the Avro Source of agent a2

    client.init("192.168.1.103", 8888);

    // Send 10 events to the remote Flume agent. That agent must be

    // listening with an Avro Source.

    String sampleData = "Hello Flume!";

    for (int i = 0; i < 10; i++) {

      client.sendDataToFlume(sampleData);

    }

    client.cleanUp();

  }

}

class MyRpcClientFacade {

  private RpcClient client;

  private String hostname;

  private int port;

  public void init(String hostname, int port) {

    // Set up the RPC connection; the default RpcClient speaks Avro

    this.hostname = hostname;

    this.port = port;

    this.client = RpcClientFactory.getDefaultInstance(hostname, port);

  }

  public void sendDataToFlume(String data) {

    // Create a Flume Event object that encapsulates the sample data

    Event event = EventBuilder.withBody(data, StandardCharsets.UTF_8);

    // Send the event

    try {

      client.append(event);

    } catch (EventDeliveryException e) {

      // Clean up and recreate the client

      client.close();

      client = null;

      client = RpcClientFactory.getDefaultInstance(hostname, port);

    }

  }

  public void cleanUp() {

    // Close the RPC connection

    client.close();

  }

}

8. Elasticsearch Sink

a1.channels = c1

a1.sinks = k1

a1.sinks.k1.type = elasticsearch

a1.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300

a1.sinks.k1.indexName = foo_index

a1.sinks.k1.indexType = bar_type

a1.sinks.k1.clusterName = foobar_cluster

a1.sinks.k1.batchSize = 500

a1.sinks.k1.ttl = 5d

a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer

a1.sinks.k1.channel = c1

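9. Developing a custom Source and Sink

import org.apache.flume.Channel;

import org.apache.flume.Context;

import org.apache.flume.Event;

import org.apache.flume.EventDeliveryException;

import org.apache.flume.Transaction;

import org.apache.flume.conf.Configurable;

import org.apache.flume.sink.AbstractSink;

public class MySink extends AbstractSink implements Configurable {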

  private String myProp;

  @Override

  public void configure(Context context) {

    String myProp = context.getString("myProp", "defaultValue");

    // Process the myProp value (e.g. validation)

    // Store myProp for later retrieval by process() method

    this.myProp = myProp;

  }

  @Override

  public void start() {

    // Initialize the connection to the external repository (e.g. HDFS) that

    // this Sink will forward Events to ..

  }

  @Override

  public void stop () {

    // Disconnect from the external repository and do any

    // additional cleanup (e.g. releasing resources or nulling-out

    // field values) ..

  }

  @Override

  public Status process() throws EventDeliveryException {

    Status status = null;

    // Start transaction

    Channel ch = getChannel();

    Transaction txn = ch.getTransaction();

    txn.begin();

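    try {

      // This try clause includes whatever Channel operations you want to do

      // take() returns null when the channel is currently empty

      Event event = ch.take();

      if (event == null) {

        // Nothing to deliver right now; ask the sink runner to back off

        status = Status.BACKOFF;

      } else {

        // Send the Event to the external repository.

        // storeSomeData(event);

        status = Status.READY;

      }

      txn.commit();

    } catch (Throwable t) {

      txn.rollback();

      // Log exception, handle individual exceptions as needed

      status = Status.BACKOFF;

      // re-throw all Errors

      if (t instanceof Error) {

        throw (Error)t;

      }

    } finally {

      // Always close the transaction, whether it was committed or rolled back

      txn.close();

    }

    return status;

  }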

}

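import org.apache.flume.Context;

import org.apache.flume.Event;

import org.apache.flume.EventDeliveryException;

import org.apache.flume.PollableSource;

import org.apache.flume.conf.Configurable;

import org.apache.flume.source.AbstractSource;

public class MySource extends AbstractSource implements Configurable, PollableSource {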

  private String myProp;

  @Override

  public void configure(Context context) {

    String myProp = context.getString("myProp", "defaultValue");

    // Process the myProp value (e.g. validation, convert to another type, ...)

    // Store myProp for later retrieval by process() method

    this.myProp = myProp;

  }

  @Override

  public void start() {

    // Initialize the connection to the external client

  }

  @Override

  public void stop () {

    // Disconnect from external client and do any additional cleanup

    // (e.g. releasing resources or nulling-out field values) ..

  }

  @Override

  public Status process() throws EventDeliveryException {

    Status status = null;

    try {

      // This try clause includes whatever Channel/Event operations you want to do

      // Receive new data

      Event e = getSomeData();

      // Store the Event into this Source's associated Channel(s)

      getChannelProcessor().processEvent(e);

      status = Status.READY;

    } catch (Throwable t) {

      // Log exception, handle individual exceptions as needed

      status = Status.BACKOFF;

      // re-throw all Errors

      if (t instanceof Error) {

        throw (Error)t;

      }

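    }

    return status;

  }

  // PollableSource also requires the two back-off methods below since Flume 1.7;

  // the values returned here are illustrative

  @Override

  public long getBackOffSleepIncrement() {

    return 1000L;

  }

  @Override

  public long getMaxBackOffSleepInterval() {

    return 5000L;

  }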

}
