A Summary of Flume Sources and Sinks, and Common Usage Scenarios
Data Sources (Source)
RPC-based exchange of heterogeneous stream data
- Avro Source
- Thrift Source
Monitoring file or directory changes
- Exec Source
- Spooling Directory Source
- Taildir Source
Continuous consumption from MQ / message-queue subscriptions
- JMS Source
- SSL and JMS Source
- Kafka Source
Network-based data exchange
- NetCat TCP Source
- NetCat UDP Source
- HTTP Source
- Syslog Sources
- Syslog TCP Source
- Multiport Syslog TCP Source
- Syslog UDP Source
Custom sources
- Custom Source
Sink
- HDFS Sink
- Hive Sink
- Logger Sink
- Avro Sink
- Thrift Sink
- IRC Sink
- File Roll Sink
- HBaseSinks
- HBaseSink
- HBase2Sink
- AsyncHBaseSink
- MorphlineSolrSink
- ElasticSearchSink
- Kite Dataset Sink
- Kafka Sink
- HTTP Sink
- Custom Sink
Examples
1. Monitoring file changes (exec source -> memory channel -> logger sink)
exec-memory-logger.properties
# Name the agent's sources, sinks and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1
# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1
# Configure the sink (a logger sink, matching the file name exec-memory-logger.properties;
# the Avro-sink variant of this config appears in example 7)
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
Start the agent:
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-logger.properties --name a1 -Dflume.root.logger=INFO,console
Test:
echo "asfsafsf" >> /tmp/log.txt
2. NetCat TCP listener
netcat.properties
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/netcat.properties --name a1 -Dflume.root.logger=INFO,console
Test:
telnet localhost 44444
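If telnet is not installed, the same test can be driven with nc (assuming netcat is available on the host); lines typed or piped in should appear in the logger sink's console output:
echo "hello flume" | nc localhost 44444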
3. Reading from and writing to Kafka (read: Kafka to logger; write: file to Kafka)
read-kafka.properties (Kafka source -> memory channel -> logger sink):
# Name the agent's sources, sinks and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1
# Configure the source
a1.sources.s1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.s1.batchSize = 5000
a1.sources.s1.batchDurationMillis = 2000
a1.sources.s1.kafka.bootstrap.servers = 192.168.1.103:9092
a1.sources.s1.kafka.topics = test1
a1.sources.s1.kafka.consumer.group.id = custom.g.id
# Bind the source to the channel
a1.sources.s1.channels = c1
# Configure the sink and bind it to the channel
a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
# Configure the channel
a1.channels.c1.type = memory

write-kafka.properties (exec source -> memory channel -> Kafka sink):
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/kafka.log
a1.sources.s1.channels = c1
# Configure the Kafka sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
# Kafka broker address
a1.sinks.k1.brokerList = 192.168.1.103:9092
# Topic to send events to
a1.sinks.k1.topic = test1
# Serializer
a1.sinks.k1.serializer.class = kafka.serializer.StringEncoder
a1.sinks.k1.channel = c1
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100
Start the two agents (in separate terminals):
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/read-kafka.properties --name a1 -Dflume.root.logger=INFO,console
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/write-kafka.properties --name a1 -Dflume.root.logger=INFO,console
Test:
# Create a topic for testing
bin/kafka-topics.sh --create \
--bootstrap-server 192.168.1.103:9092 \
--replication-factor 1 \
--partitions 1 \
--topic test1
# Start a console producer to send test data:
bin/kafka-console-producer.sh --broker-list 192.168.1.103:9092 --topic test1
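To verify the write path (file -> Kafka), append a line to the tailed file and read the topic back with a console consumer (same broker address assumed):
echo "hello kafka sink" >> /tmp/kafka.log
bin/kafka-console-consumer.sh --bootstrap-server 192.168.1.103:9092 --topic test1 --from-beginning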
4. Custom source (the org.example.MySource implementation is sketched in example 9)
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1
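This assumes org.example.MySource is already on the agent's classpath. One common way to do that is Flume's plugins.d layout (the jar name here is hypothetical, and the install path is taken from the other examples):
mkdir -p /usr/app/apache-flume-1.8.0-bin/plugins.d/my-source/lib
cp my-source.jar /usr/app/apache-flume-1.8.0-bin/plugins.d/my-source/lib/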
5. HDFS Sink
spooling-memory-hdfs.properties: watch a directory for new files and upload them to HDFS
# Name the agent's sources, sinks and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1
# Configure the source
a1.sources.s1.type = spooldir
a1.sources.s1.spoolDir = /tmp/log2
a1.sources.s1.basenameHeader = true
a1.sources.s1.basenameHeaderKey = fileName
# Bind the source to the channel
a1.sources.s1.channels = c1
# Configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H/
a1.sinks.k1.hdfs.filePrefix = %{fileName}
# Output file type; the default is SequenceFile, DataStream writes plain text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# Bind the sink to the channel
a1.sinks.k1.channel = c1
# Configure the channel
a1.channels.c1.type = memory
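Start the agent as in the earlier examples (the file path is assumed to match the other property files):
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/spooling-memory-hdfs.properties --name a1 -Dflume.root.logger=INFO,console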
Test: copy any file into the spool directory, then check HDFS
cp /tmp/log.txt /tmp/log2/
hdfs dfs -ls /flume/events/19-11-21/15
6. Hive Sink
a1.channels = c1
a1.channels.c1.type = memory
a1.sinks = k1
a1.sinks.k1.type = hive
a1.sinks.k1.channel = c1
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = logsdb
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.hive.partition = asia,%{country},%y-%m-%d-%H-%M
a1.sinks.k1.useLocalTimeStamp = false
a1.sinks.k1.round = true
a1.sinks.k1.roundValue = 10
a1.sinks.k1.roundUnit = minute
a1.sinks.k1.serializer = DELIMITED
a1.sinks.k1.serializer.delimiter = "\t"
a1.sinks.k1.serializer.serdeSeparator = '\t'
a1.sinks.k1.serializer.fieldnames =id,,msg
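In this configuration, hive.partition supplies values for three partition columns (a static continent, a country taken from the event header, and a timestamp), and serializer.fieldnames maps the delimited input to the id and msg columns while skipping the second input field. The Hive Sink streams through the Hive transaction API, so the target table must be bucketed, stored as ORC, and have transactions enabled; a matching table definition is sketched below (based on the example in the Flume user guide, so treat the exact DDL as an assumption):
create table weblogs ( id int, msg string )
  partitioned by (continent string, country string, time string)
  clustered by (id) into 5 buckets
  stored as orc;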
7. Avro Source and Avro Sink
exec-memory-avro.properties (agent a1: exec source -> memory channel -> Avro sink):
# Name the agent's sources, sinks and channels
a1.sources = s1
a1.sinks = k1
a1.channels = c1
# Configure the source
a1.sources.s1.type = exec
a1.sources.s1.command = tail -F /tmp/log.txt
a1.sources.s1.shell = /bin/bash -c
a1.sources.s1.channels = c1
# Configure the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.1.103
a1.sinks.k1.port = 8888
a1.sinks.k1.batch-size = 1
a1.sinks.k1.channel = c1
# Configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
avro-memory-log.properties (agent a2: Avro source -> memory channel -> logger sink):
# Name the agent's sources, sinks and channels
a2.sources = s2
a2.sinks = k2
a2.channels = c2
# Configure the source
a2.sources.s2.type = avro
a2.sources.s2.bind = 192.168.1.103
a2.sources.s2.port = 8888
# Bind the source to the channel
a2.sources.s2.channels = c2
# Configure the sink and bind it to the channel
a2.sinks.k2.type = logger
a2.sinks.k2.channel = c2
# Configure the channel
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100
Start the receiving agent (a2) first:
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/avro-memory-log.properties --name a2 -Dflume.root.logger=INFO,console
Then start the sending agent (a1):
flume-ng agent --conf conf --conf-file /usr/app/apache-flume-1.8.0-bin/exec-memory-avro.properties --name a1 -Dflume.root.logger=INFO,console
Test: send data with a Flume RPC client. Note that the code below is the secure (Kerberos) Thrift RPC client example from the Flume developer guide; a simpler client that matches the plain Avro source configured above is sketched after it.
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.api.SecureRpcClientFactory;
import org.apache.flume.api.RpcClientConfigurationConstants;
import org.apache.flume.api.RpcClient;
import java.nio.charset.Charset;
import java.util.Properties;

public class MyApp {
  public static void main(String[] args) {
    MySecureRpcClientFacade client = new MySecureRpcClientFacade();
    // Initialize client with the remote Flume agent's host and port
    Properties props = new Properties();
    props.setProperty(RpcClientConfigurationConstants.CONFIG_CLIENT_TYPE, "thrift");
    props.setProperty("hosts", "h1");
    props.setProperty("hosts.h1", "client.example.org" + ":" + String.valueOf(8888));
    // Initialize client with the Kerberos authentication related properties
    props.setProperty("kerberos", "true");
    props.setProperty("client-principal", "flumeclient/client.example.org@EXAMPLE.ORG");
    props.setProperty("client-keytab", "/tmp/flumeclient.keytab");
    props.setProperty("server-principal", "flume/server.example.org@EXAMPLE.ORG");
    client.init(props);

    // Send 10 events to the remote Flume agent. That agent should be
    // configured with a Thrift source that has Kerberos enabled.
    String sampleData = "Hello Flume!";
    for (int i = 0; i < 10; i++) {
      client.sendDataToFlume(sampleData);
    }
    client.cleanUp();
  }
}

class MySecureRpcClientFacade {
  private RpcClient client;
  private Properties properties;

  public void init(Properties properties) {
    // Set up the RPC connection
    this.properties = properties;
    // Create the secure Thrift RpcClient instance via SecureRpcClientFactory
    this.client = SecureRpcClientFactory.getThriftInstance(properties);
  }

  public void sendDataToFlume(String data) {
    // Create a Flume Event object that encapsulates the sample data
    Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
    // Send the event
    try {
      client.append(event);
    } catch (EventDeliveryException e) {
      // Clean up and recreate the client
      client.close();
      client = null;
      client = SecureRpcClientFactory.getThriftInstance(properties);
    }
  }

  public void cleanUp() {
    // Close the RPC connection
    client.close();
  }
}
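For the plain (non-Kerberos) Avro source in avro-memory-log.properties, a much smaller client is enough. A minimal sketch using the general-purpose RpcClientFactory, with the host and port taken from the example above:
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;
import java.nio.charset.Charset;

public class SimpleAvroClient {
  public static void main(String[] args) {
    // Connect to the Avro source from the example (192.168.1.103:8888)
    RpcClient client = RpcClientFactory.getDefaultInstance("192.168.1.103", 8888);
    try {
      // Build and send a single event; it should show up in agent a2's logger sink
      Event event = EventBuilder.withBody("Hello Flume!", Charset.forName("UTF-8"));
      client.append(event);
    } catch (EventDeliveryException e) {
      e.printStackTrace();
    } finally {
      client.close();
    }
  }
}
getDefaultInstance returns Flume's default Avro-based NettyAvroRpcClient, which is why it pairs with an Avro source.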
8. Elasticsearch Sink
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = elasticsearch
a1.sinks.k1.hostNames = 127.0.0.1:9200,127.0.0.2:9300
a1.sinks.k1.indexName = foo_index
a1.sinks.k1.indexType = bar_type
a1.sinks.k1.clusterName = foobar_cluster
a1.sinks.k1.batchSize = 500
a1.sinks.k1.ttl = 5d
a1.sinks.k1.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchDynamicSerializer
a1.sinks.k1.channel = c1
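This sink also needs the matching elasticsearch client and lucene-core jars on the Flume classpath. To check that events are being indexed, the cluster can be queried directly (a sketch, assuming the node above answers HTTP on 127.0.0.1:9200; the sink appends the date to indexName, hence the wildcard):
curl 'http://127.0.0.1:9200/_cat/indices?v'
curl 'http://127.0.0.1:9200/foo_index-*/_search?pretty'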
9. Developing a custom Source and Sink
import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class MySink extends AbstractSink implements Configurable {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation)
    // Store myProp for later retrieval by the process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external repository (e.g. HDFS) that
    // this Sink will forward Events to ..
  }

  @Override
  public void stop() {
    // Disconnect from the external repository and do any
    // additional cleanup (e.g. releasing resources or nulling-out
    // field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;
    // Start transaction
    Channel ch = getChannel();
    Transaction txn = ch.getTransaction();
    txn.begin();
    try {
      // This try clause includes whatever Channel operations you want to do
      Event event = ch.take();
      // Send the Event to the external repository.
      // storeSomeData(event);
      txn.commit();
      status = Status.READY;
    } catch (Throwable t) {
      txn.rollback();
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;
      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error) t;
      }
    } finally {
      // Close the transaction whether it was committed or rolled back
      txn.close();
    }
    return status;
  }
}
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.PollableSource;
import org.apache.flume.conf.Configurable;
import org.apache.flume.source.AbstractSource;

public class MySource extends AbstractSource implements Configurable, PollableSource {
  private String myProp;

  @Override
  public void configure(Context context) {
    String myProp = context.getString("myProp", "defaultValue");
    // Process the myProp value (e.g. validation, convert to another type, ...)
    // Store myProp for later retrieval by the process() method
    this.myProp = myProp;
  }

  @Override
  public void start() {
    // Initialize the connection to the external client
  }

  @Override
  public void stop() {
    // Disconnect from external client and do any additional cleanup
    // (e.g. releasing resources or nulling-out field values) ..
  }

  @Override
  public Status process() throws EventDeliveryException {
    Status status = null;
    try {
      // This try clause includes whatever Channel/Event operations you want to do
      // Receive new data (getSomeData() is a placeholder for your own client call)
      Event e = getSomeData();
      // Store the Event into this Source's associated Channel(s)
      getChannelProcessor().processEvent(e);
      status = Status.READY;
    } catch (Throwable t) {
      // Log exception, handle individual exceptions as needed
      status = Status.BACKOFF;
      // re-throw all Errors
      if (t instanceof Error) {
        throw (Error) t;
      }
    }
    return status;
  }
}
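To deploy these classes, package them into a jar, put it on the agent's classpath (for example under plugins.d as shown in example 4), and reference them by fully qualified class name in the agent configuration. The custom-source wiring appears in example 4; a matching sketch for the custom sink, assuming the same org.example naming:
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = org.example.MySink
a1.sinks.k1.channel = c1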