Flume (4): Configuration File Summary

1. Agent
2. Source
- taildir
- avro
- netcat
- exec
- spooldir
3. Sink
4. Channel
5. Binding components
6. Custom interceptors and channel selectors
7. Load balancing and failover
8. Starting Flume

It is best to write the agent configuration files by following the Flume topology, one node at a time.

1. Agent

Every configuration file begins by naming the agent and its sources, channels, and sinks.

# Name the components on this agent

a1.sources = r1

a1.sinks = k1 k2

a1.channels = c1 c2

2. Source

taildir

# Describe/configure the source

a1.sources.r3.type = TAILDIR

# JSON file that records the last read position of each tailed file

a1.sources.r3.positionFile = /opt/module/flume/tail_dir.json

# Multiple file groups (and therefore multiple directories) can be configured

a1.sources.r3.filegroups = f1 f2

# File names are matched with a regular expression

a1.sources.r3.filegroups.f1 = /opt/module/flume/files/.*file.*

a1.sources.r3.filegroups.f2 = /opt/module/flume/files/.*log.*

avro

# Describe/configure the source

# An avro source is a service that receives data

a1.sources.r1.type = avro

# Host name or IP address to listen on

a1.sources.r1.bind = hadoop102

# Must match the port of the upstream avro sink

a1.sources.r1.port = 4141

netcat

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

exec

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /opt/module/hive/logs/hive.log

a1.sources.r1.shell = /bin/bash -c

spooldir

# Describe/configure the source

a1.sources.r3.type = spooldir

# Directory to watch for new files

a1.sources.r3.spoolDir = /opt/module/flume/upload

# Suffix appended to a file once it has been uploaded

a1.sources.r3.fileSuffix = .COMPLETED

a1.sources.r3.fileHeader = true

# Ignore all files ending in .tmp; they are not uploaded

a1.sources.r3.ignorePattern = ([^ ]*\.tmp)

3. Sink

hdfs

See: HDFS Sink parameter reference

# Describe the sink

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://hadoop102:8020/flume/%Y%m%d/%H

# Prefix for uploaded files

a1.sinks.k1.hdfs.filePrefix = logs-

# Whether to round down the timestamp

a1.sinks.k1.hdfs.round = true

# Number of time units after which a new directory is created

a1.sinks.k1.hdfs.roundValue = 1

# Time unit used for rounding (second, minute, or hour)

a1.sinks.k1.hdfs.roundUnit = hour

# Whether to use the local timestamp. If set to false, the timestamp header of the Event is used as the time instead. See: Data Warehouse (Data Collection)

a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Number of Events to accumulate before each flush to HDFS

a1.sinks.k1.hdfs.batchSize = 100

# File type; compression is supported

a1.sinks.k1.hdfs.fileType = DataStream

# How often to roll to a new file, in seconds; a file that has not been rolled yet ends in .tmp

a1.sinks.k1.hdfs.rollInterval = 3600

# Roll size of each file in bytes, usually slightly less than 128 MB

a1.sinks.k1.hdfs.rollSize = 134217700

# 0 means rolling does not depend on the number of Events

a1.sinks.k1.hdfs.rollCount = 0

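## Alternative to the DataStream setting above: write LZO-compressed output instead of raw files. Set fileType only once in a real configuration.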

a1.sinks.k1.hdfs.fileType = CompressedStream

a1.sinks.k1.hdfs.codeC = lzop

kafka (to be continued)

hbase (to be continued)

avro

# Describe the sink

# An avro sink is a data sender

a1.sinks.k1.type = avro

# Host name or IP address of the destination

a1.sinks.k1.hostname = hadoop102

a1.sinks.k1.port = 4141

logger

# Describe the sink

a1.sinks.k1.type = logger

Local directory (file_roll)

# Describe the sink

a3.sinks.k1.type = file_roll

a3.sinks.k1.sink.directory = /opt/module/datas/flume3

Note: the local output directory must already exist. If it does not, Flume will not create it.

4. Channel

Three channel types are covered here: memory, file, and Kafka channel. A Kafka channel does not need a sink; the downstream agent connects to it with a Kafka source.

See: flume -> kafka -> flume
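A minimal sketch of a Kafka channel, to make the "no sink" point concrete. The broker address hadoop102:9092 and the topic name flume-channel are assumptions chosen for illustration, not values from this article. No sink is bound to c1, because the downstream agent would read the topic with a Kafka source.

```properties
# Kafka channel sketch (broker address and topic are hypothetical)
a1.sources = r1
a1.channels = c1

a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = hadoop102:9092
a1.channels.c1.kafka.topic = flume-channel
# Write plain event bodies so that non-Flume consumers can read the topic
a1.channels.c1.parseAsFlumeEvent = false

# Only the source is bound; there is no a1.sinks entry
a1.sources.r1.channels = c1
```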

# Describe the channel

# The channel type is memory or file

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100
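A file channel is configured the same way, with the addition of its on-disk locations. A minimal sketch, assuming hypothetical checkpoint and data directories under /opt/module/flume:

```properties
# File channel sketch (both directories are hypothetical paths)
a1.channels.c1.type = file
# Where the channel stores its checkpoint
a1.channels.c1.checkpointDir = /opt/module/flume/checkpoint
# Comma-separated list of directories for the event data files
a1.channels.c1.dataDirs = /opt/module/flume/data
```

Events in a file channel survive an agent restart, at the cost of lower throughput than the memory channel.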

5. Binding components

# Bind the source and sink to the channel

# One source feeds two channels; each sink reads from exactly one channel

a1.sources.r1.channels = c1 c2

a1.sinks.k1.channel = c1

a1.sinks.k2.channel = c2

6. Custom interceptors and channel selectors

There are two kinds of channel selector: replicating (the default) and multiplexing.

a1.sources.r1.interceptors = i1

# Fully qualified class name of the custom interceptor's Builder

a1.sources.r1.interceptors.i1.type = com.atguigu.interceptor.TypeInterceptor$Builder

# Use the multiplexing channel selector

a1.sources.r1.selector.type = multiplexing

a1.sources.r1.selector.header = type

a1.sources.r1.selector.mapping.hello = c1

a1.sources.r1.selector.mapping.nohello = c2
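For comparison, a sketch of the default replicating selector with the same source and channels. Marking c2 as optional is an illustrative choice, not something the configuration above requires:

```properties
# Replicating selector sketch: every event is copied to both c1 and c2
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating
# A failed write to an optional channel is ignored instead of failing the transaction
a1.sources.r1.selector.optional = c2
```

With multiplexing, events whose header value matches no mapping can be routed with a1.sources.r1.selector.default, which names the fallback channel.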

7. Load balancing and failover

# Name the components on this agent

a1.sources = r1

a1.channels = c1

# Add a sink group

a1.sinkgroups = g1

a1.sinks = k1 k2

# Describe/configure the source

a1.sources.r1.type = netcat

a1.sources.r1.bind = localhost

a1.sources.r1.port = 44444

# Configure the sink processor for failover; the sink with the higher priority value is used first

a1.sinkgroups.g1.processor.type = failover

a1.sinkgroups.g1.processor.priority.k1 = 5

a1.sinkgroups.g1.processor.priority.k2 = 10

a1.sinkgroups.g1.processor.maxpenalty = 10000

# Bind the source and sink to the channel

a1.sources.r1.channels = c1

# Bind the sinks to the sink group

a1.sinkgroups.g1.sinks = k1 k2

a1.sinks.k1.channel = c1

a1.sinks.k2.channel = c1
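The configuration above covers failover only. For load balancing, only the sink group processor settings change. A minimal sketch that reuses the same sinks k1 and k2; the backoff and selector values are illustrative choices:

```properties
# Load-balancing sketch: replaces the failover processor lines above
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
# Temporarily skip a sink that has failed
a1.sinkgroups.g1.processor.backoff = true
# round_robin or random
a1.sinkgroups.g1.processor.selector = round_robin
```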

8. Starting Flume

# launcher script, Flume conf directory (-c), agent name (-n), configuration file to run (-f)

bin/flume-ng agent -c conf/ -n a1 -f job/flume-netcat-logger.conf

To print logger output to the console:

bin/flume-ng agent --conf conf/ --name a1 --conf-file job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console

# Short form

bin/flume-ng agent -c conf/ -n a1 -f job/flume-netcat-logger.conf -Dflume.root.logger=INFO,console
