Flume-Replicating Channel Selector 单数据源多出口

使用 Flume-1 监控文件变动，Flume-1 使用 Replicating Channel Selector 将变动内容传递给 Flume-2，Flume-2 负责存储到 HDFS。同时 Flume-1 将变动内容传递给 Flume-3，Flume-3 负责输出到 Local FileSystem。

一、创建配置文件

1.flume-file-flume.conf

配置 1 个接收日志文件的 source 和两个 channel、两个 sink，分别输送给 flume-flume-hdfs 和 flume-flume-dir。

# Name the components on this agent

a1.sources = r1

a1.sinks = k1 k2

a1.channels = c1 c2

# 将数据流复制给所有 channel

a1.sources.r1.selector.type = replicating

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /tmp/tomcat.log

a1.sources.r1.shell = /bin/bash -c

# Describe the sink

# sink 端的 avro 是一个数据发送者

a1.sinks.k1.type = avro

a1.sinks.k1.hostname = h136

a1.sinks.k1.port = 4141

a1.sinks.k2.type = avro

a1.sinks.k2.hostname = h136

a1.sinks.k2.port = 4142

# Describe the channel

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory

a1.channels.c2.capacity = 1000

a1.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1 c2

a1.sinks.k1.channel = c1

a1.sinks.k2.channel = c2

2.flume-flume-hdfs.conf

配置上级 Flume 输出的 Source，输出是到 HDFS 的 Sink。

# Name the components on this agent

a2.sources = r1

a2.sinks = k1

a2.channels = c1

# Describe/configure the source

# source 端的 avro 是一个数据接收服务

a2.sources.r1.type = avro

a2.sources.r1.bind = h136

a2.sources.r1.port = 4141

# Describe the sink

a2.sinks.k1.type = hdfs

a2.sinks.k1.hdfs.path = hdfs://h136:9000/flume2/%Y%m%d/%H

#上传文件的前缀

a2.sinks.k1.hdfs.filePrefix = flume2-

#是否按照时间滚动文件夹

a2.sinks.k1.hdfs.round = true

#多少时间单位创建一个新的文件夹

a2.sinks.k1.hdfs.roundValue = 1

#重新定义时间单位

a2.sinks.k1.hdfs.roundUnit = hour

#是否使用本地时间戳

a2.sinks.k1.hdfs.useLocalTimeStamp = true

#积攒多少个 Event 才 flush 到 HDFS 一次

a2.sinks.k1.hdfs.batchSize = 100

#设置文件类型，可支持压缩

a2.sinks.k1.hdfs.fileType = DataStream

#多久生成一个新的文件

a2.sinks.k1.hdfs.rollInterval = 600

#设置每个文件的滚动大小大概是 128M

a2.sinks.k1.hdfs.rollSize = 134217700

#文件的滚动与 Event 数量无关

a2.sinks.k1.hdfs.rollCount = 0

# Describe the channel

a2.channels.c1.type = memory

a2.channels.c1.capacity = 1000

a2.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a2.sources.r1.channels = c1

a2.sinks.k1.channel = c1

3.flume-flume-dir.conf

配置上级 Flume 输出的 Source，输出是到本地目录的 Sink。

输出的本地目录必须是已经存在的目录，如果该目录不存在，并不会创建新的目录。

# Name the components on this agent

a3.sources = r1

a3.sinks = k1

a3.channels = c2

# Describe/configure the source

a3.sources.r1.type = avro

a3.sources.r1.bind = h136

a3.sources.r1.port = 4142

# Describe the sink

a3.sinks.k1.type = file_roll

a3.sinks.k1.sink.directory = /tmp/flumeData

# Describe the channel

a3.channels.c2.type = memory

a3.channels.c2.capacity = 1000

a3.channels.c2.transactionCapacity = 100

# Bind the source and sink to the channel

a3.sources.r1.channels = c2

a3.sinks.k1.channel = c2

二、测试

需要启动 HDFS，由于 flume-file-flume.conf 向另外两个发送数据，即 flume-flume-hdfs.conf 和 flume-flume-dir.conf 为服务端接收数据，需要在 flume-file-flume.conf 之前启动。

cd /opt/apache-flume-1.9.-bin

bin/flume-ng agent --conf conf/ --name a3 --conf-file /tmp/flume-job/group1/flume-flume-dir.conf -Dflume.root.logger=INFO,console

bin/flume-ng agent --conf conf/ --name a2 --conf-file /tmp/flume-job/group1/flume-flume-hdfs.conf -Dflume.root.logger=INFO,console

bin/flume-ng agent --conf conf/ --name a1 --conf-file /tmp/flume-job/group1/flume-file-flume.conf -Dflume.root.logger=INFO,console

向监听文件追加数据，查看变化。

echo '789qwewqe' >> /tmp/tomcat.log

echo '123cvbcvbcv' >> /tmp/tomcat.log

echo '456jkuikmjh' >> /tmp/tomcat.log

Flume-Replicating Channel Selector 单数据源多出口的更多相关文章

Flume配置Replicating Channel Selector
1 官网内容上面的配置是r1获取到的内容会同时复制到c1 c2 c3 三个channel里面 2 详细配置信息 # Name the components on this agent a1.sour ...
Flume Channel Selector
Flume 基于Channel Selector可以实现扇入.扇出. 同一个数据源分发到不同的目的,如下图. 在source上可以定义channel selector: 1 2 3 4 5 6 7 8 ...
SpringBoot 集成mongodb（1）单数据源配置
新项目要用到mongodb,于是在个人电脑上的虚拟环境linux上安装了下mongodb,练习熟悉下. 1.虚拟机上启动mongodb. 首先查看虚拟机ip地址,忘了哈~~ 命令行>ifconf ...
Flume配置Multiplexing Channel Selector
1 官网内容上面配置的是根据不同的heder当中state值走不同的channels,如果是CZ就走c1 如果是US就走c2 c3 其他默认走c4 2 我的详细配置信息一个监听http端口然后 ...
关于Flume中Chanel.Selector.header解释
flume内置的ChannelSelector有两种,分别是Replicating和Multiplexing. Replicating类型的ChannelSelector会针对每一个Event,拷贝到 ...
Flume 学习笔记之 Flume NG概述及单节点安装
Flume NG概述: Flume NG是一个分布式,高可用,可靠的系统,它能将不同的海量数据收集,移动并存储到一个数据存储系统中.轻量,配置简单,适用于各种日志收集,并支持 Failover和负载均 ...
flume file channel 异常解决
1. 错误提示 -- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$ ...
Flume的Channel
一.Memory Channel 事件将被存储在内存中(指定大小的队列里) 非常适合那些需要高吞吐量且允许数据丢失的场景下属性说明: 二.JDBC Channel 事件会被持久化(存储)到可靠的数据 ...
NIO的Buffer&Channel&Selector
java的NIO和AIO Buffer position.limit.capacity 初始化 Buffer 填充 Buffer 提取 Buffer 中的值 mark() & reset() ...

随机推荐

Redis分布式缓存安装和使用
独立缓存服务器: LinuxCentOS Redis版本: 3.0 下面我们针对于Redis安装做下详细的记录: 编译和安装所需的包: #yum install gcc tcl创建安装目录:贵州中医肝 ...
、M/C/U/简单加/密方法、
............................... 一．STM32Flash组织 STM32的Flash包括主存储器(HD版本,512KB)+信息块.信息块包括2KB的系统存储器(用于系统 ...
遍历二叉树 - 基于栈的DFS
之前已经学过二叉树的DFS的遍历算法[http://www.cnblogs.com/webor2006/p/7244499.html],当时是基于递归来实现的,这次利用栈不用递归也来实现DFS的遍历, ...
PAT乙级1037
题目链接 https://pintia.cn/problem-sets/994805260223102976/problems/994805284923359232 题解还算简单,就是模拟我们在生活 ...
RHEL8 创建本地YUM存储库
yum 的好处及本地yum的好处不在本文讨论范畴,本文针对rhel8中的新功能yum做简要介绍和配置,在 RHEL 8中分为两个存储库: BaseOS 应用程序流(AppStream) BaseOS中 ...
modbus-crc16——c语言
为确保消息数据的完整性,除了验证消息CRC之外,建议实现检查串行端口(UART)成帧错误的代码.如果接收消息中的CRC与接收设备计算的CRC不匹配,则应忽略该消息.下面的C语言代码片段显示了如何使用逐 ...
状压dp做题笔记
CodeChef Factorial to Square (分块决策) Description 给定一个n,要求在[1,n]中删除一些数,并使剩下的数的乘积是一个完全平方数,同时要求乘积最大,求删除方 ...
BZOJ 3744 Gty的妹子序列分块+树状数组
具体分析见搬来大佬博客时间复杂度 O(nnlogn)O(n\sqrt nlogn)O(nnlogn) CODE #include <cmath> #include <cctyp ...
mysql数据库系统学习（一）---一条SQL查询语句是如何执行的？
本文基于----MySQL实战45讲(极客时间----林晓斌 )整理----->https://time.geekbang.org/column/article/68319 一.第一节:一条sq ...
python-platform模块：平台相关属性
import platform x=platform.machine() #返回平台架构 #AMD64 x=platform.node() #网络名称(主机名) #DESKTOP-KIK668C x= ...