【Flume学习之二】Flume 使用场景
环境
apache-flume-1.6.0
一、多agent连接

1、node101配置 option2
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1 # Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = node101
a1.sources.r1.port = # Describe the sink
# a1.sinks.k1.type = logger
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node102
a1.sinks.k1.port = # Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity =
a1.channels.c1.transactionCapacity = # Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
2、node102配置 option1
############################################################
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1 # Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = node102
a1.sources.r1.port = # Describe the sink
a1.sinks.k1.type = logger # Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity =
a1.channels.c1.transactionCapacity = # Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################
3、启动顺序
先启动node102-flume,后启动node101-flume,看一下flume启动顺序就知道,要先创建sink,然后创建channel,最后创建source;然后channel连接sink和channel;最后启动channel、sink、source
[root@node102 conf]# flume-ng agent -c /usr/local/apache-flume-1.6.-bin/conf -f /usr/local/apache-flume-1.6.-bin/conf/option1 -n a1 -Dflume.root.logger=INFO,console
[root@node101 conf]# flume-ng agent -c /usr/local/apache-flume-1.6.-bin/conf -f /usr/local/apache-flume-1.6.-bin/conf/option2 -n a1 -Dflume.root.logger=INFO,console
4、测试:在node101 telnet测试,在node102查看输出日志
node101 telnet:
[root@node101 ~]# telnet node101
Trying 192.168.118.101...
Connected to node101.
Escape character is '^]'.
hello world
OK
haha wjy
OK
hi xiaoming
OK
^]
telnet> quit
Connection closed.
[root@node101 ~]#
node102 flume日志:
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body: 6C 6C 6F 6F 6C 0D hello world. }
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body: 6A 0D haha wjy. }
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body: 6F 6D 6E 0D hi xiaoming. }
二、Exec Source
Source类型选择Exec
1、配置 option3
############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1 # Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/flume.exec.log # Describe the sink
a1.sinks.k1.type = logger # Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity =
a1.channels.c1.transactionCapacity = # Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################
2、启动
[root@node101 conf]# flume-ng agent -c /usr/local/apache-flume-1.6.-bin/conf -f /usr/local/apache-flume-1.6.-bin/conf/option3 -n a1 -Dflume.root.logger=INFO,console
3、测试
[root@node101 home]# echo "wjy" >> flume.exec.log
[root@node101 home]# echo "hi" >> flume.exec.log
[root@node101 home]# echo "hello wjy" >> flume.exec.log
flume输出:
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body: 6A wjy }
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body: hi }
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body: 6C 6C 6F 6A hello wjy }
三、Spooling Directory Source
监测配置的目录下新增的文件,并将文件中的数据读取出来:
1)拷贝到spool目录下的文件不可以再打开编辑;
2) spool目录下不可包含相应的子目录;
1、配置
############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1 # Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
a1.sources.r1.fileHeader = true # Describe the sink
a1.sinks.k1.type = logger # Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity =
a1.channels.c1.transactionCapacity = # Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################
2、启动
[root@node101 conf]# flume-ng agent -c /usr/local/apache-flume-1.6.-bin/conf -f /usr/local/apache-flume-1.6.-bin/conf/option4 -n a1 -Dflume.root.logger=INFO,console
3、测试
日志目录:/home/logs
[root@node101 home]# cat flume.exec.log
hello
hello
hello
wjy
hi
hello wjy
[root@node101 home]# mkdir logs && mv flume.exec.log ./logs && cd logs && ls
flume.exec.log.COMPLETED
flume输出:
-- ::, (pool--thread-) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile(ReliableSpoolingFileEventReader.java:)] Preparing to move file /home/logs/flume.exec.log to /home/logs/flume.exec.log.COMPLETED
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body: 6C 6C 6F hello }
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body: 6C 6C 6F hello }
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body: 6C 6C 6F hello }
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body: 6A wjy }
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body: hi }
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body: 6C 6C 6F 6A hello wjy }
四、日志输出到HDFS
1、配置
############################################################
a1.sources = r1
a1.sinks = k1
a1.channels = c1 # Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
a1.sources.r1.fileHeader = true # Describe the sink
***只修改上一个spool sink的配置代码块 a1.sinks.k1.type = logger
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://node101:8020/flume/%Y-%m-%d/%H%M ##每隔60s或者文件大小超过10M的时候产生新文件
# hdfs有多少条消息时新建文件,0不基于消息个数
a1.sinks.k1.hdfs.rollCount=
# hdfs创建多长时间新建文件,0不基于时间
a1.sinks.k1.hdfs.rollInterval=
# hdfs多大时新建文件,0不基于文件大小
a1.sinks.k1.hdfs.rollSize=
# 当目前被打开的临时文件在该参数指定的时间(秒)内,没有任何数据写入,则将该临时文件关闭并重命名成目标文件
a1.sinks.k1.hdfs.idleTimeout= a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp=true ## 每五分钟生成一个目录:
# 是否启用时间上的”舍弃”,这里的”舍弃”,类似于”四舍五入”,后面再介绍。如果启用,则会影响除了%t的其他所有时间表达式
a1.sinks.k1.hdfs.round=true
# 时间上进行“舍弃”的值;
a1.sinks.k1.hdfs.roundValue=
# 时间上进行”舍弃”的单位,包含:second,minute,hour
a1.sinks.k1.hdfs.roundUnit=minute # Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity =
a1.channels.c1.transactionCapacity = # Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
############################################################
创建HDFS目录
[root@node101 conf]# hdfs dfs -mkdir /flume
2、启动
[root@node101 conf]# flume-ng agent -c /usr/local/apache-flume-1.6.-bin/conf -f /usr/local/apache-flume-1.6.-bin/conf/option5 -n a1 -Dflume.root.logger=INFO,console
3、测试
制造测试数据:
[root@node101 home]# echo "hello wjy" >> test.log
[root@node101 home]# echo "hello xiaoming" >> test.log
[root@node101 home]# echo "hi xiaowang" >> test.log
[root@node101 home]# cp test.log ./logs
flume执行日志:
-- ::, (pool--thread-) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:)] Last read took us just up to a file boundary. Rolling to the next file, if there is one.
-- ::, (pool--thread-) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile(ReliableSpoolingFileEventReader.java:)] Preparing to move file /home/logs/test.log to /home/logs/test.log.COMPLETED
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:)] Serializer = TEXT, UseRawLocalFileSystem = false
-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:)] Creating hdfs://node101:8020/flume/2019-07-01/1845/FlumeData.1561978100198.tmp
-- ::, (hdfs-k1-roll-timer-) [INFO - org.apache.flume.sink.hdfs.BucketWriter$.call(BucketWriter.java:)] Closing idle bucketWriter hdfs://node101:8020/flume/2019-07-01/1845/FlumeData.1561978100198.tmp at 1561978108285
-- ::, (hdfs-k1-roll-timer-) [INFO - org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:)] Closing hdfs://node101:8020/flume/2019-07-01/1845/FlumeData.1561978100198.tmp
-- ::, (hdfs-k1-call-runner-) [INFO - org.apache.flume.sink.hdfs.BucketWriter$.call(BucketWriter.java:)] Renaming hdfs://node101:8020/flume/2019-07-01/1845/FlumeData.1561978100198.tmp to hdfs://node101:8020/flume/2019-07-01/1845/FlumeData.1561978100198
-- ::, (hdfs-k1-roll-timer-) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink$.run(HDFSEventSink.java:)] Writer callback called.
hdfs文件:

五、其他:
多路日志合并

多路日志输出

【Flume学习之二】Flume 使用场景的更多相关文章
- 【Flume学习之一】Flume简介
环境 apache-flume-1.6.0 Flume是分布式日志收集系统.可以将应用产生的数据存储到任何集中存储器中,比如HDFS,HBase:同类工具:Facebook Scribe,Apache ...
- Flume 学习笔记之 Flume NG概述及单节点安装
Flume NG概述: Flume NG是一个分布式,高可用,可靠的系统,它能将不同的海量数据收集,移动并存储到一个数据存储系统中.轻量,配置简单,适用于各种日志收集,并支持 Failover和负载均 ...
- Flume 学习笔记之 Flume NG+Kafka整合
Flume NG集群+Kafka集群整合: 修改Flume配置文件(flume-kafka-server.conf),让Sink连上Kafka hadoop1: #set Agent name a1. ...
- Flume 学习笔记之 Flume NG高可用集群搭建
Flume NG高可用集群搭建: 架构总图: 架构分配: 角色 Host 端口 agent1 hadoop3 52020 collector1 hadoop1 52020 collector2 had ...
- 分布式实时日志系统(二) 环境搭建之 flume 集群搭建/flume ng资料
最近公司业务数据量越来越大,以前的基于消息队列的日志系统越来越难以满足目前的业务量,表现为消息积压,日志延迟,日志存储日期过短,所以,我们开始着手要重新设计这块,业界已经有了比较成熟的流程,即基于流式 ...
- Flume学习之路 (一)Flume的基础介绍
一.背景 Hadoop业务的整体开发流程: 从Hadoop的业务开发流程图中可以看出,在大数据的业务处理过程中,对于数据的采集是十分重要的一步,也是不可避免的一步. 许多公司的平台每天会产生大量的日志 ...
- Flume学习总结
Flume学习总结 flume是一个用来采集数据的软件,它可以从数据源采集数据到一个集中存放的地方. 最常用flume的数据采集场景是对日志的采集,不过,lume也可以用来采集其他的各种各样的数据,因 ...
- flume学习以及ganglia(若是要监控hive日志,hive存放在/tmp/hadoop/hive.log里,只要运行过hive就会有)
python3.6hdfs的使用 https://blog.csdn.net/qq_29863961/article/details/80291654 https://pypi.org/ 官网直接搜 ...
- flume学习安装
近期项目组有需求点击流日志须要自己收集,学习了一下flume而且成功安装了.相关信息记录一下. 1)下载flume1.5版本号 wget http://www.apache.org/dyn/clos ...
随机推荐
- IntelliJ IDEA 2019.2破解
IntelliJ IDEA 2019.2破解 我是参考这个激活的,使用的激活码的方式,需要在百度云盘下载压缩包 https://zhile.io/2018/08/25/jetbrains-licens ...
- FTP服务FileZilla Server上传提示550 Permission denied
原文地址:https://help.aliyun.com/knowledge_detail/5989224.html 相关文章 1.filezilla通过root账户远程连接管理ubuntu serv ...
- python的整数相除
在python2中: 10/4=2 在python3中: 10/4=2.5 10//4=2
- php正则表达示的定界符
在学习正则表达示前,我们先要来学习正则表达示的定界符. 定界符,就是定一个边界,边界已内的就是正则表达示. PHP的正则表达示定界符的规定如下: 定界符,不能用a-zA-Z0-9\ 其他的都可以用.必 ...
- 将idea中xml文件背景颜色去除(转)
原贴链接:https://blog.csdn.net/weixin_43215250/article/details/89403678 第一步:除去SQL代码块的背景颜色,步骤如下 设置后还是很影响视 ...
- proxysql 学习一 proxysql docker 运行试用
proxysql 是一个比较强大的mysql proxy 服务,支持动态mysql 实例调整,查询重写,查询cache,监控,数据镜像,读写分离 以及ha,最近已经发布了2.0 ,很值得试用下 环境准 ...
- nodejs+nvm历史版本
官网:http://nodejs.org/dist/ 淘宝镜像:https://npm.taobao.org/mirrors/node/ nvm历史版本:https://github.com/core ...
- 2016级移动应用开发在线测试14-MediaPlayer
有趣有内涵的文章第一时间送达! 喝酒I创作I分享 生活中总有些东西值得分享 @醉翁猫咪 1. MediaStore类是android系统提供的一个多媒体数据库,android中多媒体信息都可以从这里提 ...
- hdfs、yarn集成ranger
一.安装hdfs插件 从源码安装ranger的服务器上拷贝hdfs的插件到你需要安装的地方 1.解压安装 # tar zxvf ranger-2.1.0-hdfs-plugin.tar.gz -C / ...
- python3 Paramiko模块学习
简介 ssh是一个协议,OpenSSH是其中一个开源实现,paramiko是Python的一个库,实现了SSHv2协议(底层使用cryptography). 有了Paramiko以后,我们就可以在Py ...