Flume:
=====================
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
It has a simple, flexible architecture based on streaming data flows.
It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms.
It uses a simple extensible data model that allows for online analytic applications.

source:
relative to the channel it acts as the producer: it accepts data in various formats and sends it to the channel for transport

channel:
acts as a data buffer: it receives data from the source and passes it on to the sink

sink:
relative to the channel it acts as the consumer: it takes data from the channel and delivers it, as the configured data type, to the configured destination

Event:
===============
the basic unit of transport in flume:
header + body
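
A minimal Java sketch of what an event looks like (org.apache.flume.event.EventBuilder is the Flume SDK API; the header key used here is just an illustration):

import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class EventDemo {
    public static void main(String[] args) {
        // header: a map of key/value strings
        Map<String, String> headers = new HashMap<>();
        headers.put("timestamp", String.valueOf(System.currentTimeMillis()));
        // body: a plain byte array
        Event event = EventBuilder.withBody("hello flume".getBytes(StandardCharsets.UTF_8), headers);
        System.out.println(event.getHeaders());
        System.out.println(new String(event.getBody(), StandardCharsets.UTF_8));
    }
}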

flume installation:
================
1. unpack the archive
2. create a symbolic link
3. configure the environment variables and make them take effect
4. edit the configuration files
1) rename flume-env.ps1.template to flume-env.ps1
2) rename flume-env.sh.template to flume-env.sh
3) edit flume-env.sh and point it at the jdk directory by adding
export JAVA_HOME=/soft/jdk

5. check the flume version
flume-ng version

flume usage:
=========================
//flume can also read its configuration from zookeeper

//command to run flume
flume-ng agent -n <agent-name> -f <config-file>, e.g. flume-ng agent -n a1 -f xxx.conf

naming used throughout the examples:
agent:   a1
source:  r1
channel: c1
sink:    k1

usage:
1. write the configuration file r_nc.conf
# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

# configure the sink
a1.sinks.k1.type = logger

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2. start flume with the configuration file
flume-ng agent -n a1 -f r_nc.conf

3. open another session and test
nc localhost 8888

//user guide
http://flume.apache.org/FlumeUserGuide.html

running a program in the background:
=============================================

ctrl + z : suspends the foreground program =====> [1]+ Stopped flume-ng agent -n a1 -f r_nc.conf

resume the suspended job in the background with bg %1

list background jobs with jobs

bring a job back to the foreground with fg %1
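
A typical session might look like this (standard bash job control; %1 refers to job 1 as listed by jobs):

flume-ng agent -n a1 -f r_nc.conf    # runs in the foreground
# press ctrl + z                     # prints: [1]+ Stopped flume-ng agent -n a1 -f r_nc.conf
bg %1                                # job 1 keeps running in the background
jobs                                 # prints: [1]+ Running flume-ng agent -n a1 -f r_nc.conf &
fg %1                                # job 1 returns to the foreground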

flume:
collection, aggregation, and movement of massive amounts of log data

flume-ng agent -n a1 -f xxx.conf

source
the producer relative to the channel //netcat
channel
acts like a buffer //memory
sink
the consumer relative to the channel //logger

Event:
header + body
(header: k/v pairs; body: the data)

source:
============================================
1. sequence (seq) source: mostly used for testing
# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = seq
# total number of events to send
a1.sources.r1.totalEvents = 1000

# configure the sink
a1.sinks.k1.type = logger

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2. stress source: mostly used for load testing
# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = org.apache.flume.source.StressSource
# size of a single event, in bytes
a1.sources.r1.size = 10240
# total number of events
a1.sources.r1.maxTotalEvents = 1000000

# configure the sink
a1.sinks.k1.type = logger

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

3. spooling-directory (spooldir) source: watches a directory for newly added files and sends their contents as events
# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = spooldir
# the directory to watch
a1.sources.r1.spoolDir = /home/centos/spooldir

# the following setting controls the suffix appended to files once they are fully consumed
#a1.sources.r1.fileSuffix = .COMPLETED

# configure the sink
a1.sinks.k1.type = logger

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
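
A quick test, assuming the agent is running with this config (the file name is just an example; .COMPLETED is the default suffix):

echo "hello spooldir" > /tmp/demo.txt
cp /tmp/demo.txt /home/centos/spooldir/    # flume consumes the file and logs its lines as events
ls /home/centos/spooldir                   # shows demo.txt.COMPLETED once consumed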

4. exec source //produces data by running a linux command
//typical use: tail -F (watch a file and emit the data appended as it grows)
//no guarantee of data integrity; data may well be lost

# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = exec
# the linux command to run
a1.sources.r1.command = tail -F /home/centos/readme.txt

# configure the sink
a1.sinks.k1.type = logger

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
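
To test, append to the watched file while the agent runs; tail -F emits the new lines:

echo "new log line" >> /home/centos/readme.txt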

5. taildir source //monitors files under the given directories
//the files to watch are selected with a regex
//has a recovery mechanism (read positions are persisted)

# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = TAILDIR
# define the file groups; several may be given
a1.sources.r1.filegroups = f1
# directory and file pattern for the group, written as a regex; only files can be monitored
a1.sources.r1.filegroups.f1 = /home/centos/taildir/.*

# location of the position file
# a1.sources.r1.positionFile = ~/.flume/taildir_position.json

# configure the sink
a1.sinks.k1.type = logger

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
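
To test, append to any matching file while the agent runs; the position file is what lets the source resume where it left off after a restart:

echo "line 1" >> /home/centos/taildir/a.log    # the agent logs the new line as an event
cat ~/.flume/taildir_position.json             # the per-file read offsets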

sink:
====================================
1. file_roll sink //often used to collect data into local files
# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

# configure the sink
a1.sinks.k1.type = file_roll
# target directory
a1.sinks.k1.sink.directory = /home/centos/file
# roll interval, default 30s; 0 disables rolling, so everything goes into a single file
a1.sinks.k1.sink.rollInterval = 0

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
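
A sketch of a test run (file_roll.conf is a made-up name for the config above; the target directory should exist before the agent starts):

mkdir -p /home/centos/file
flume-ng agent -n a1 -f file_roll.conf &
nc localhost 8888            # type a few lines
ls /home/centos/file         # the events land in the rolled file(s)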

2. hdfs sink //writes SequenceFile format by default
//key:   LongWritable
//value: BytesWritable

# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

# configure the sink
a1.sinks.k1.type = hdfs
# target directory (the escape sequences are filled in from the event timestamp)
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/
# file prefix
a1.sinks.k1.hdfs.filePrefix = events-
# roll interval in seconds; 0 disables time-based rolling
a1.sinks.k1.hdfs.rollInterval = 0
# file size, in bytes, that triggers a roll
a1.sinks.k1.hdfs.rollSize = 1024
# use the local timestamp instead of a timestamp header
a1.sinks.k1.hdfs.useLocalTimeStamp = true
# output file type, default SequenceFile
# DataStream: plain text, no compression codec may be set
# CompressedStream: compressed text, a codec must be set
a1.sinks.k1.hdfs.fileType = DataStream

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
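
A quick check, assuming HDFS is up and the agent is running:

nc localhost 8888                         # type a few lines
hdfs dfs -ls /flume/events/               # one directory per day, e.g. 24-01-01
hdfs dfs -cat "/flume/events/*/events-*"  # readable text because fileType = DataStream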

3. hive sink: //hiveserver help: hive --service help
//1. start hive's metastore service: hive --service metastore; metastore address: thrift://localhost:9083
//2. copy the hcatalog dependencies into /hive/lib: cp hive-hcatalog* /soft/hive/lib (they live in /soft/hive/hcatalog/share/hcatalog)
//3. create a transactional hive table:
SET hive.support.concurrency=true;
SET hive.enforce.bucketing=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.compactor.initiator.on=true;
SET hive.compactor.worker.threads=1;

create table myhive.weblogs(id int, name string, age int)
clustered by(id) into 2 buckets
row format delimited
fields terminated by '\t'
stored as orc
tblproperties('transactional'='true');

# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

# configure the sink
a1.sinks.k1.type = hive
a1.sinks.k1.hive.metastore = thrift://127.0.0.1:9083
a1.sinks.k1.hive.database = myhive
a1.sinks.k1.hive.table = weblogs
a1.sinks.k1.useLocalTimeStamp = true
# input format: DELIMITED or json
# DELIMITED: plain text
# json: json files
a1.sinks.k1.serializer = DELIMITED
# input field delimiter, written in double quotes
a1.sinks.k1.serializer.delimiter = ","
# output field separator, written in single quotes
a1.sinks.k1.serializer.serdeSeparator = '\t'
# field names, separated by "," without spaces
a1.sinks.k1.serializer.fieldnames = id,name,age

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
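
To test, send lines whose fields match the delimiter and fieldnames above, then query the table:

nc localhost 8888                 # then type lines such as:  1,tom,20
# in hive:  select * from myhive.weblogs;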

4. hbase sink //SimpleHbaseEventSerializer uses defaults for the rowKey and column; they cannot be customized
//RegexHbaseEventSerializer lets you specify the rowKey and the column names yourself

# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

# configure the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = flume_hbase
a1.sinks.k1.columnFamily = f1
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

# columns are specified by hand via a regex
# rowKeyIndex picks the rowKey column; the index starts at 0
a1.sinks.k1.serializer.colNames = ROW_KEY,name,age
a1.sinks.k1.serializer.regex = (.*),(.*),(.*)
a1.sinks.k1.serializer.rowKeyIndex = 0

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
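
The table has to exist before the agent starts; a sketch using the hbase shell, then sending a line that matches the regex:

echo "create 'flume_hbase', 'f1'" | hbase shell
nc localhost 8888                          # then type lines such as:  row1,tom,20
echo "scan 'flume_hbase'" | hbase shell    # row1 shows up with columns f1:name and f1:age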

5. asynchbase sink //asynchronous hbase sink
//writes asynchronously, so it is faster
# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

# configure the sink
a1.sinks.k1.type = asynchbase
a1.sinks.k1.table = flume_hbase
a1.sinks.k1.columnFamily = f1
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

channel: the buffer
=====================================
1. memory channel
a1.channels.c1.type = memory
# maximum number of events held in the buffer
a1.channels.c1.capacity = 1000
# maximum number of events the channel takes from the source per transaction,
# and the maximum number of events it hands to the sink per transaction
a1.channels.c1.transactionCapacity = 100

2. file channel: //if the checkpoint and data files of several channels sit in the default locations,
//the files clash and the other channels crash

# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

# configure the sink
a1.sinks.k1.type = logger

# configure the channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/centos/flume/checkpoint
a1.channels.c1.dataDirs = /home/centos/flume/data

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

memoryChannel: fast, but data is lost if the machine loses power

fileChannel: slower, but data survives even a power failure
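
When one agent runs several file channels, give each its own directories to avoid the clash described above (a sketch; the paths are made up):

a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/centos/flume/c1/checkpoint
a1.channels.c1.dataDirs = /home/centos/flume/c1/data
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /home/centos/flume/c2/checkpoint
a1.channels.c2.dataDirs = /home/centos/flume/c2/data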

Avro
===============================================
source
# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4444

# configure the sink
a1.sinks.k1.type = logger

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

***********************************************************************************************
* start an avro client and send data:                                                        *
*   flume-ng avro-client -H localhost -p 4444 -R ~/avro/header.txt -F ~/avro/user0.txt       *
*   (-H host, -p port, -R header file, -F data file)                                         *
***********************************************************************************************

sink
# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /home/centos/taildir/.*

# configure the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.23.101
a1.sinks.k1.port = 4444

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Flume hops (chaining agents):
=====================================
1. distribute the flume installation from s101 to the other nodes
xsync.sh /soft/flume
xsync.sh /soft/apache-flume-1.8.0-bin/

2. switch to root and distribute the environment file
su root
xsync.sh /etc/profile
exit

3. configuration files
1) on s101 //hop.conf
source: avro
sink:   hdfs

# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4444

# configure the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/hop/%y-%m-%d/
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 1024
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.hdfs.fileType = DataStream

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2) on s102-s104 //hop2.conf
source: taildir
sink:   avro

# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /home/centos/taildir/.*

# configure the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.23.101
a1.sinks.k1.port = 4444

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

4. create the ~/taildir directory on s102-s104
xcall.sh "mkdir ~/taildir"

5. start flume on s101
flume-ng agent -n a1 -f /soft/flume/conf/hop.conf

6. start flume on each of s102-s104, in the background
flume-ng agent -n a1 -f /soft/flume/conf/hop2.conf &

7. test: write data into taildir on each of s102-s104 and watch the data show up in hdfs
s102]$ echo 102 > taildir/1.txt
s103]$ echo 103 > taildir/1.txt
s104]$ echo 104 > taildir/1.txt

interceptor:
==================================
a source-side component: it modifies or drops events
each source may be given several interceptors ===> interceptorChain

1. Timestamp Interceptor //adds a timestamp to the event header

# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888
# name the interceptor
a1.sources.r1.interceptors = i1
# interceptor type
a1.sources.r1.interceptors.i1.type = timestamp

# configure the sink
a1.sinks.k1.type = logger

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

2. Static Interceptor //adds a fixed key/value pair to the event header

3. Host Interceptor //adds the agent's host to the event header

4. configuring an interceptor chain:

# name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

a1.sources.r1.interceptors = i1 i2 i3
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.interceptors.i2.type = host
a1.sources.r1.interceptors.i3.type = static
a1.sources.r1.interceptors.i3.key = location
a1.sources.r1.interceptors.i3.value = NEW_YORK

# configure the sink
a1.sinks.k1.type = logger

# configure the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# bind the channel to the source and the sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

channel selector:
====================================
a source-side component: it routes each event to specific channels, rather like partitioning

when a source is given several channels, the default is replication: a copy of every event is sent to each channel

1. replicating channel selector //the default; the source sends a copy of every event to all channels
//setup: 1 source, 3 channels, 3 sinks
//netcat -> memory -> file_roll

# name the agent's components
a1.sources = r1
a1.sinks = k1 k2 k3
a1.channels = c1 c2 c3

# configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888
a1.sources.r1.selector.type = replicating

# configure the channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

a1.channels.c3.type = memory
a1.channels.c3.capacity = 1000
a1.channels.c3.transactionCapacity = 100

# configure the sinks
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/centos/file1
a1.sinks.k1.sink.rollInterval = 0

a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /home/centos/file2
a1.sinks.k2.sink.rollInterval = 0

a1.sinks.k3.type = file_roll
a1.sinks.k3.sink.directory = /home/centos/file3
a1.sinks.k3.sink.rollInterval = 0

# bind the channels to the source and the sinks
a1.sources.r1.channels = c1 c2 c3
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3
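
To test (the directories must exist; selector_rep.conf is a made-up name for the config above):

mkdir ~/file1 ~/file2 ~/file3
flume-ng agent -n a1 -f selector_rep.conf &
nc localhost 8888                   # type a few lines
cat ~/file1/* ~/file2/* ~/file3/*   # identical copies appear in all three directories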

2. multiplexing channel selector //routes by header value; fed here by an avro source

# name the agent's components
a1.sources = r1
a1.sinks = k1 k2 k3
a1.channels = c1 c2 c3

# configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4444
# configure the channel selector
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = country
a1.sources.r1.selector.mapping.CN = c1
a1.sources.r1.selector.mapping.US = c2
a1.sources.r1.selector.default = c3

# configure the channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100

a1.channels.c3.type = memory
a1.channels.c3.capacity = 1000
a1.channels.c3.transactionCapacity = 100

# configure the sinks
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/centos/file1
a1.sinks.k1.sink.rollInterval = 0

a1.sinks.k2.type = file_roll
a1.sinks.k2.sink.directory = /home/centos/file2
a1.sinks.k2.sink.rollInterval = 0

a1.sinks.k3.type = file_roll
a1.sinks.k3.sink.directory = /home/centos/file3
a1.sinks.k3.sink.rollInterval = 0

# bind the channels to the source and the sinks
a1.sources.r1.channels = c1 c2 c3
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
a1.sinks.k3.channel = c3

1. create the folders file1 file2 file3 in the home directory
mkdir file1 file2 file3

2. create a folder country and put header and data files in it
create the header files CN.txt, US.txt, OTHER.txt
CN.txt    ===> country CN
US.txt    ===> country US
OTHER.txt ===> country OTHER

create the data file 1.txt
1.txt ====> helloworld

3. run flume
flume-ng agent -n a1 -f /soft/flume/selector_multi.conf

4. run the avro client
flume-ng avro-client -H localhost -p 4444 -R ~/country/US.txt -F ~/country/1.txt    ===> check file2
flume-ng avro-client -H localhost -p 4444 -R ~/country/CN.txt -F ~/country/1.txt    ===> check file1
flume-ng avro-client -H localhost -p 4444 -R ~/country/OTHER.txt -F ~/country/1.txt ===> check file3

sinkProcessor
=================================
a sink runner runs a sink group

a sink group consists of one or more sinks

the sink runner tells the sink group to process the next batch of events

the sink group holds a sink processor, which picks the sink that handles the batch

failover sink processor //every sink in the group is given a priority
//the larger the number, the higher the priority
//incoming data is handled by the highest-priority sink
//when that sink dies, the sink with the next-highest priority takes over
//channels and sinks must be paired one-to-one

a1.sources = r1
a1.sinks = s1 s2 s3
a1.channels = c1 c2 c3

# Describe/configure the source
a1.sources.r1.type = seq

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = s1 s2 s3
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.s1 = 5
a1.sinkgroups.g1.processor.priority.s2 = 10
a1.sinkgroups.g1.processor.priority.s3 = 15
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Describe the sink
a1.sinks.s1.type = file_roll
a1.sinks.s1.sink.directory = /home/centos/file1
a1.sinks.s2.type = file_roll
a1.sinks.s2.sink.directory = /home/centos/file2
a1.sinks.s3.type = file_roll
a1.sinks.s3.sink.directory = /home/centos/file3

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c2.type = memory
a1.channels.c3.type = memory

# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2 c3
a1.sinks.s1.channel = c1
a1.sinks.s2.channel = c2
a1.sinks.s3.channel = c3
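
With this config the events should land in /home/centos/file3, since s3 has the highest priority (15); if that sink fails, s2 (priority 10) takes over. A sketch of a test run (the directories must exist; failover.conf is a made-up name for the config above):

mkdir ~/file1 ~/file2 ~/file3
flume-ng agent -n a1 -f failover.conf
ls ~/file3        # the seq source's events arrive here while s3 is healthy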

an Event is built on the source side by wrapping the input data's byte array:
Event event = EventBuilder.withBody(body);

a Sink's process method returns one of two states:
1. READY   //one or more events were delivered successfully
2. BACKOFF //the channel had no data to give the sink

the life cycle of a flume transaction:

tx.begin()    //open the transaction, then do the work
tx.commit()   //commit once the work has succeeded
tx.rollback() //roll back if an exception occurs
tx.close()    //close the transaction; this must always happen last
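
A minimal Java sketch of a custom sink that ties these pieces together (DemoSink is a made-up class; Channel, Transaction, and AbstractSink are the Flume SDK APIs):

import org.apache.flume.Channel;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.sink.AbstractSink;

public class DemoSink extends AbstractSink {
    @Override
    public Status process() throws EventDeliveryException {
        Channel ch = getChannel();
        Transaction tx = ch.getTransaction();
        tx.begin();                          // open the transaction
        try {
            Event event = ch.take();         // pull one event from the channel
            if (event == null) {
                tx.commit();
                return Status.BACKOFF;       // the channel had nothing for us
            }
            // deliver event.getBody() to the destination here ...
            tx.commit();                     // the event was delivered successfully
            return Status.READY;
        } catch (Throwable t) {
            tx.rollback();                   // undo the take on failure
            throw new EventDeliveryException(t);
        } finally {
            tx.close();                      // always close the transaction
        }
    }
}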
