[Flume Study Notes, Part 2] Flume Usage Scenarios

Environment
　　apache-flume-1.6.0

I. Connecting multiple agents

1. node101 configuration (file option2)

    # Name the components on this agent

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = netcat

    a1.sources.r1.bind = node101

    a1.sources.r1.port = 

    # Describe the sink

    # a1.sinks.k1.type = logger

    a1.sinks.k1.type = avro

    a1.sinks.k1.hostname = node102

    a1.sinks.k1.port = 

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity =

    a1.channels.c1.transactionCapacity = 

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

2. node102 configuration (file option1)

############################################################

    # Name the components on this agent

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = avro

    a1.sources.r1.bind = node102

    a1.sources.r1.port = 

    # Describe the sink

    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity =

    a1.channels.c1.transactionCapacity = 

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

    ############################################################
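Both files end by binding the source and the sink to the channel, and a typo there is easy to miss: sources use the plural key `channels`, sinks use the singular key `channel`. The following is a minimal sketch of a wiring check, not part of Flume; the helper names `parse_properties` and `check_wiring` are made up for this illustration.

```python
def parse_properties(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent="a1"):
    """Return a list of channel-binding problems for one agent (hypothetical helper)."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    for src in props.get(f"{agent}.sources", "").split():
        # A source may feed several channels, hence the plural key.
        bound = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not bound:
            problems.append(f"source {src} has no channels")
        for ch in bound:
            if ch not in channels:
                problems.append(f"source {src} uses undeclared channel {ch}")
    for sink in props.get(f"{agent}.sinks", "").split():
        # A sink drains exactly one channel, hence the singular key.
        ch = props.get(f"{agent}.sinks.{sink}.channel", "")
        if not ch:
            problems.append(f"sink {sink} has no channel")
        elif ch not in channels:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    return problems
```

Running `check_wiring(parse_properties(open("option1").read()))` should return an empty list for a correctly wired file.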

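3. Startup order
Start Flume on node102 first, then on node101: the avro sink on node101 connects to the avro source on node102, so the receiving end has to be listening first. Flume's own startup log follows the same idea inside a single agent: it creates the components, connects the channel to the source and the sink, and then starts the channel, the sink, and the source, in that order.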

[root@node102 conf]# flume-ng agent -c /usr/local/apache-flume-1.6.-bin/conf -f /usr/local/apache-flume-1.6.-bin/conf/option1 -n a1 -Dflume.root.logger=INFO,console

[root@node101 conf]# flume-ng agent -c /usr/local/apache-flume-1.6.-bin/conf -f /usr/local/apache-flume-1.6.-bin/conf/option2 -n a1 -Dflume.root.logger=INFO,console
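When the two agents are started from a script rather than by hand, the "node102 first" rule can be enforced by waiting until the avro source accepts TCP connections. This is a small sketch under that assumption; `wait_for_port` is a hypothetical helper, and the port must be whatever value is set for `a1.sources.r1.port` in option1.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A launcher on node101 could call `wait_for_port("node102", avro_port)` and start its own agent only when the call returns True (`avro_port` is a placeholder for the configured port).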

4. Test: send data with telnet on node101 and watch the log output on node102
telnet on node101:

[root@node101 ~]# telnet node101

Trying 192.168.118.101...

Connected to node101.

Escape character is '^]'.

hello world

OK

haha wjy

OK

hi xiaoming

OK

^]

telnet> quit

Connection closed.

[root@node101 ~]#
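The telnet session can also be scripted. The sketch below assumes the netcat source keeps its default behaviour of answering each received line with a one-line reply (the `OK` seen above); `send_lines` is a hypothetical helper, and the port is the one configured for the netcat source in option2.

```python
import socket


def send_lines(host, port, lines, timeout=5.0):
    """Send each line terminated by LF and collect the one-line reply to each."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        reader = sock.makefile("r", encoding="utf-8", newline="\n")
        for line in lines:
            sock.sendall(line.encode("utf-8") + b"\n")
            # Block until the source acknowledges this event.
            replies.append(reader.readline().strip())
    return replies
```

Unlike telnet, which terminates lines with CR LF, this client sends a bare LF, so the event bodies arrive without a trailing carriage return.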

Flume log on node102:

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body:   6C 6C 6F   6F  6C  0D hello world. }

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body:       6A  0D haha wjy. }

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body:       6F 6D  6E  0D hi xiaoming. }
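Each logger line shows the event body twice: as hex bytes and as text, with non-printable bytes drawn as a dot. The trailing `0D` is the carriage return that telnet sends before the line feed; the netcat source splits on the line feed and leaves the CR in the body. The sketch below imitates that rendering to make the bytes easy to read. It is an illustration, not LoggerSink's code, and the 16-byte limit is an assumption based on the sink's documented default.

```python
def hex_body(data: bytes, limit: int = 16) -> str:
    """Render up to `limit` bytes as upper-case hex pairs separated by spaces."""
    return " ".join(f"{b:02X}" for b in data[:limit])


def printable(data: bytes, limit: int = 16) -> str:
    """Show printable ASCII as-is and every other byte as '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data[:limit])


body = b"hello world\r"        # what the netcat source keeps from telnet's "hello world\r\n"
print(hex_body(body))          # 68 65 6C 6C 6F 20 77 6F 72 6C 64 0D
print(printable(body))         # hello world.
```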

II. Exec Source
Set the source type to exec.
1. Configuration (file option3)

############################################################

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = exec

    a1.sources.r1.command = tail -F /home/flume.exec.log

    # Describe the sink

    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity =

    a1.channels.c1.transactionCapacity = 

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

    ############################################################

2. Start

[root@node101 conf]# flume-ng agent -c /usr/local/apache-flume-1.6.-bin/conf -f /usr/local/apache-flume-1.6.-bin/conf/option3 -n a1 -Dflume.root.logger=INFO,console

3. Test

[root@node101 home]# echo "wjy" >> flume.exec.log

[root@node101 home]# echo "hi" >> flume.exec.log

[root@node101 home]# echo "hello wjy" >> flume.exec.log

Flume output:

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body:  6A  wjy }

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body:   hi }

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{} body:   6C 6C 6F   6A  hello wjy }

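III. Spooling Directory Source

This source watches the configured directory for new files and reads the data out of them:
1) a file must not be opened and edited again once it has been copied into the spool directory;
2) the spool directory must not contain subdirectories.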
1. Configuration

    ############################################################

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = spooldir

    a1.sources.r1.spoolDir = /home/logs

    a1.sources.r1.fileHeader = true

    # Describe the sink

    a1.sinks.k1.type = logger

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity =

    a1.channels.c1.transactionCapacity = 

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

    ############################################################

2. Start

[root@node101 conf]# flume-ng agent -c /usr/local/apache-flume-1.6.-bin/conf -f /usr/local/apache-flume-1.6.-bin/conf/option4 -n a1 -Dflume.root.logger=INFO,console

3. Test

Log directory: /home/logs

[root@node101 home]# cat flume.exec.log

hello

hello

hello

wjy

hi

hello wjy

[root@node101 home]# mkdir logs && mv flume.exec.log ./logs && cd logs && ls

flume.exec.log.COMPLETED

Flume output:

-- ::, (pool--thread-) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile(ReliableSpoolingFileEventReader.java:)] Preparing to move file /home/logs/flume.exec.log to /home/logs/flume.exec.log.COMPLETED

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body:   6C 6C 6F                                  hello }

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body:   6C 6C 6F                                  hello }

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body:   6C 6C 6F                                  hello }

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body:  6A                                         wjy }

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body:                                             hi }

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:)] Event: { headers:{file=/home/logs/flume.exec.log} body:   6C 6C 6F   6A                       hello wjy }

IV. Writing logs to HDFS

1. Configuration

############################################################

    a1.sources = r1

    a1.sinks = k1

    a1.channels = c1

    # Describe/configure the source

    a1.sources.r1.type = spooldir

    a1.sources.r1.spoolDir = /home/logs

    a1.sources.r1.fileHeader = true

    # Describe the sink

    # Only the sink block of the previous spooldir configuration changes; the lines below replace a1.sinks.k1.type = logger

    a1.sinks.k1.type=hdfs

    a1.sinks.k1.hdfs.path=hdfs://node101:8020/flume/%Y-%m-%d/%H%M

    ## Roll to a new file every 60 s or when the file exceeds 10 MB

    # Number of events written to a file before rolling; 0 disables rolling by event count

    a1.sinks.k1.hdfs.rollCount=

    # Seconds a file stays open before rolling; 0 disables rolling by time

    a1.sinks.k1.hdfs.rollInterval=

    # File size that triggers a roll; 0 disables rolling by size

    a1.sinks.k1.hdfs.rollSize=

    # If the open temporary file receives no writes for this many seconds, close it and rename it to its final name

    a1.sinks.k1.hdfs.idleTimeout=

    a1.sinks.k1.hdfs.fileType=DataStream

    a1.sinks.k1.hdfs.useLocalTimeStamp=true

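    ## Create a new directory every five minutes:

    # Whether to round the timestamp down (like rounding, but always downward). If enabled, it affects every time escape sequence except %t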

    a1.sinks.k1.hdfs.round=true

    # The timestamp is rounded down to a multiple of this value

    a1.sinks.k1.hdfs.roundValue=

    # Unit of the round-down value: second, minute, or hour

    a1.sinks.k1.hdfs.roundUnit=minute

    # Use a channel which buffers events in memory

    a1.channels.c1.type = memory

    a1.channels.c1.capacity =

    a1.channels.c1.transactionCapacity = 

    # Bind the source and sink to the channel

    a1.sources.r1.channels = c1

    a1.sinks.k1.channel = c1

    ############################################################
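The effect of `hdfs.round`, `hdfs.roundValue`, and `hdfs.roundUnit` on the directory name can be worked through by hand. The sketch below is a simplified model, not Flume's code: it rounds a timestamp down to a multiple of the configured value and then expands the path pattern. The value 5 is an assumption taken from the "every five minutes" comment in the configuration, and `bucket_path` is a hypothetical name.

```python
from datetime import datetime


def bucket_path(ts, pattern, round_value, round_unit):
    """Round ts down to a multiple of round_value units, then expand the pattern."""
    if round_unit == "second":
        ts = ts.replace(second=ts.second - ts.second % round_value, microsecond=0)
    elif round_unit == "minute":
        ts = ts.replace(minute=ts.minute - ts.minute % round_value,
                        second=0, microsecond=0)
    elif round_unit == "hour":
        ts = ts.replace(hour=ts.hour - ts.hour % round_value,
                        minute=0, second=0, microsecond=0)
    else:
        raise ValueError(f"unsupported unit: {round_unit}")
    # %Y, %m, %d, %H and %M mean the same in strftime as in the path above.
    return ts.strftime(pattern)


# An event stamped 18:48:20 lands in the 18:45 directory.
print(bucket_path(datetime(2019, 7, 1, 18, 48, 20),
                  "/flume/%Y-%m-%d/%H%M", 5, "minute"))  # /flume/2019-07-01/1845
```

Without rounding, the same pattern would create one directory per minute.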

Create the HDFS directory

[root@node101 conf]# hdfs dfs -mkdir /flume

2. Start

[root@node101 conf]# flume-ng agent -c /usr/local/apache-flume-1.6.-bin/conf -f /usr/local/apache-flume-1.6.-bin/conf/option5 -n a1 -Dflume.root.logger=INFO,console

3. Test

Create test data:

[root@node101 home]# echo "hello wjy" >> test.log

[root@node101 home]# echo "hello xiaoming" >> test.log

[root@node101 home]# echo "hi xiaowang" >> test.log

[root@node101 home]# cp test.log ./logs

Flume execution log:

-- ::, (pool--thread-) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.readEvents(ReliableSpoolingFileEventReader.java:)] Last read took us just up to a file boundary. Rolling to the next file, if there is one.

-- ::, (pool--thread-) [INFO - org.apache.flume.client.avro.ReliableSpoolingFileEventReader.rollCurrentFile(ReliableSpoolingFileEventReader.java:)] Preparing to move file /home/logs/test.log to /home/logs/test.log.COMPLETED

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:)] Serializer = TEXT, UseRawLocalFileSystem = false

-- ::, (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:)] Creating hdfs://node101:8020/flume/2019-07-01/1845/FlumeData.1561978100198.tmp

-- ::, (hdfs-k1-roll-timer-) [INFO - org.apache.flume.sink.hdfs.BucketWriter$.call(BucketWriter.java:)] Closing idle bucketWriter hdfs://node101:8020/flume/2019-07-01/1845/FlumeData.1561978100198.tmp at 1561978108285

-- ::, (hdfs-k1-roll-timer-) [INFO - org.apache.flume.sink.hdfs.BucketWriter.close(BucketWriter.java:)] Closing hdfs://node101:8020/flume/2019-07-01/1845/FlumeData.1561978100198.tmp

-- ::, (hdfs-k1-call-runner-) [INFO - org.apache.flume.sink.hdfs.BucketWriter$.call(BucketWriter.java:)] Renaming hdfs://node101:8020/flume/2019-07-01/1845/FlumeData.1561978100198.tmp to hdfs://node101:8020/flume/2019-07-01/1845/FlumeData.1561978100198

-- ::, (hdfs-k1-roll-timer-) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink$.run(HDFSEventSink.java:)] Writer callback called.
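In the log above the `.tmp` file is closed as idle roughly eight seconds after it was created (1561978100198 to 1561978108285 in epoch milliseconds), so it is `hdfs.idleTimeout` that ends the file here, not one of the three roll settings. How those three settings interact can be modelled roughly as follows. This is a simplified sketch, not the sink's implementation; the thresholds 60 s and 10 MB are assumptions taken from the comment in the configuration, and `should_roll` is a hypothetical name.

```python
def should_roll(events, age_seconds, size_bytes,
                roll_count, roll_interval, roll_size):
    """Return the name of the first roll trigger that fired, or None.

    A threshold of 0 disables the corresponding trigger.
    """
    if roll_count > 0 and events >= roll_count:
        return "rollCount"
    if roll_interval > 0 and age_seconds >= roll_interval:
        return "rollInterval"
    if roll_size > 0 and size_bytes >= roll_size:
        return "rollSize"
    return None


TEN_MB = 10 * 1024 * 1024

# Three small events after 8 s: no roll trigger fires yet.
print(should_roll(3, 8, 40, 0, 60, TEN_MB))    # None
# The same file after 61 s would roll on time.
print(should_roll(3, 61, 40, 0, 60, TEN_MB))   # rollInterval
```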

Files on HDFS:

V. Other scenarios:
Consolidating logs from multiple sources

Fanning out logs to multiple destinations
