Reading log files with Flume and storing them in HDFS
Set up the Hadoop environment
Set up the Flume environment
Configure the Flume agent file
D:\Soft\apache-flume-1.8.0-bin\conf
Rename flume-conf.properties.template to hdfs.properties
# Wire up the agent
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# Source: read files from a directory
a1.sources.s1.type = spooldir
a1.sources.s1.channels = c1
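# Use forward slashes: a single backslash is an escape character in a .properties file, so E:\log2s would be read as E:log2s
a1.sources.s1.spoolDir = E:/log2s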
# Include all log files
a1.sources.s1.includePattern=^.*$
# Ignore the log file that is currently being written (its name ends in "log")
a1.sources.s1.ignorePattern=^.*log$
a1.sources.s1.deletePolicy=never
a1.sources.s1.fileHeader = true
## Add a timestamp header
a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=timestamp
# Channel: buffer events in memory (the memory channel does not persist them to disk)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
# Sink: write to HDFS
a1.sinks.k1.channel=c1
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://127.0.0.1:9000/flume/accesslog/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix=logs
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.batchSize=100
a1.sinks.k1.hdfs.writeFormat=Text
a1.sinks.k1.hdfs.minBlockReplicas=1
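Windows paths in this file deserve a second look. The sketch below assumes the agent file is parsed with the standard java.util.Properties rules, which is consistent with the SpoolDirectorySource startup line in the Flume output reporting the directory as E:log2s. It loads the same key with three spellings of the path. The class name SpoolDirEscapeDemo and the helper load are made up for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class SpoolDirEscapeDemo {

    // Parses one "key = value" line with java.util.Properties and
    // returns the value stored under the given key.
    static String load(String line, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(line));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        String key = "a1.sources.s1.spoolDir";
        // In a Java string literal, "\\" stands for one backslash in the file.
        System.out.println(load(key + " = E:\\log2s", key));    // file contains E:\log2s
        System.out.println(load(key + " = E:\\\\log2s", key));  // file contains E:\\log2s
        System.out.println(load(key + " = E:/log2s", key));     // file contains E:/log2s
    }
}
```

The first line prints E:log2s because the backslash before "l" is silently dropped. The doubled backslash and the forward slash both keep the separator, so either spelling is a safer choice for spoolDir.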
Flume start command
flume-ng agent --conf conf --conf-file ../conf/hdfs.properties --name a1
Write a Java program that generates logs
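import java.util.Date;

import org.apache.log4j.Logger;

public class App
{
    protected static final Logger logger = Logger.getLogger(App.class);

    public static void main( String[] args )
    {
        // Emit one log line every 500 ms until the process is stopped.
        while (true) {
            logger.info("hello world:" + new Date().getTime());
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop instead of looping on.
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}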
log4j configuration
### set log levels ###
log4j.rootLogger=INFO, stdout, file
### stdout ###
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Threshold=INFO
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n
### file ###
log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
# Log file path
log4j.appender.file.file=E:/log2s/log.log
log4j.appender.file.Threshold=INFO
log4j.appender.file.Append=true
# Roll over to a new file every minute
log4j.appender.file.DatePattern='.'yyyy-MM-dd-HH-mm
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n
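The file names produced by this appender are what make the Flume includePattern and ignorePattern work together. The sketch below reproduces the naming with SimpleDateFormat (DailyRollingFileAppender appends the formatted DatePattern to the active file name) and then applies the two regular expressions from the Flume source configuration. It assumes the spooling source picks up a file when the whole name matches includePattern and does not match ignorePattern. RolledFileDemo, rolledName and pickedUp are hypothetical names for this illustration.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Pattern;

public class RolledFileDemo {

    // Patterns copied from the Flume source configuration.
    static final Pattern INCLUDE = Pattern.compile("^.*$");
    static final Pattern IGNORE = Pattern.compile("^.*log$");

    // Name a rolled file would get: active file name plus the formatted DatePattern.
    static String rolledName(String activeName, Date when) {
        return activeName + new SimpleDateFormat("'.'yyyy-MM-dd-HH-mm").format(when);
    }

    // Assumed selection rule: the full name is included and not ignored.
    static boolean pickedUp(String fileName) {
        return INCLUDE.matcher(fileName).matches()
                && !IGNORE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String active = "log.log";
        String rolled = rolledName(active, new Date());
        // The active file ends in "log", so it is skipped while log4j still writes to it.
        System.out.println(active + " picked up: " + pickedUp(active));
        // The rolled file ends in the minute stamp, so it becomes eligible.
        System.out.println(rolled + " picked up: " + pickedUp(rolled));
    }
}
```

This matters because the spooling directory source expects files to be complete and unchanged once they appear. Skipping log.log until log4j has rolled it keeps the source away from a file that is still growing.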
Run the Java program to generate logs

Flume output
07/24 17:19:27 INFO node.Application: Starting Channel c1
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
07/24 17:19:27 INFO node.Application: Starting Sink k1
07/24 17:19:27 INFO node.Application: Starting Source s1
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
07/24 17:19:27 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: E:log2s
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: s1: Successfully registered new MBean.
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: s1 started
07/24 17:19:28 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
07/24 17:19:28 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file E:\log2s\log.log.2018-07-24-16-46 to E:\log2s\log.log.2018-07-24-16-46.COMPLETED
07/24 17:19:28 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
07/24 17:19:28 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file E:\log2s\log.log.2018-07-24-16-47 to E:\log2s\log.log.2018-07-24-16-47.COMPLETED
07/24 17:19:28 INFO hdfs.HDFSSequenceFile: writeFormat = Text, UseRawLocalFileSystem = false
07/24 17:19:28 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532423968027.tmp
07/24 17:19:39 INFO hdfs.BucketWriter: Closing hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532423968027.tmp
07/24 17:19:39 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532423968027.tmp to hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532423968027
07/24 17:19:39 INFO hdfs.HDFSEventSink: Writer callback called.
07/24 17:19:59 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
07/24 17:19:59 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file E:\log2s\log.log.2018-07-24-16-48 to E:\log2s\log.log.2018-07-24-16-48.COMPLETED
07/24 17:20:00 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
07/24 17:20:00 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file E:\log2s\log.log.2018-07-24-17-19 to E:\log2s\log.log.2018-07-24-17-19.COMPLETED
07/24 17:20:02 INFO hdfs.HDFSSequenceFile: writeFormat = Text, UseRawLocalFileSystem = false
07/24 17:20:02 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424002903.tmp
07/24 17:20:13 INFO hdfs.BucketWriter: Closing hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424002903.tmp
07/24 17:20:13 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424002903.tmp to hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424002903
07/24 17:20:13 INFO hdfs.HDFSEventSink: Writer callback called.
07/24 17:21:00 INFO hdfs.HDFSSequenceFile: writeFormat = Text, UseRawLocalFileSystem = false
07/24 17:21:00 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
07/24 17:21:00 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file E:\log2s\log.log.2018-07-24-17-20 to E:\log2s\log.log.2018-07-24-17-20.COMPLETED
07/24 17:21:00 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424060382.tmp
07/24 17:21:10 INFO hdfs.BucketWriter: Closing hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424060382.tmp
07/24 17:21:10 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424060382.tmp to hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424060382
07/24 17:21:10 INFO hdfs.HDFSEventSink: Writer callback called.
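The numeric suffix in a name such as logs.1532423968027 looks like an epoch-millisecond value, and the date directory comes from the %Y-%m-%d escapes in hdfs.path, which are filled in from the timestamp header added by the interceptor. The sketch below converts the first suffix from the output into a wall-clock time and a directory name. The Asia/Shanghai zone is an assumption chosen because it makes the result line up with the 17:19:28 log line. BucketPathDemo and format are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class BucketPathDemo {

    // Formats an epoch-millisecond value with the given pattern in the given zone.
    static String format(long epochMillis, String pattern, String zone) {
        return DateTimeFormatter.ofPattern(pattern)
                .withZone(ZoneId.of(zone))
                .format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long suffix = 1532423968027L;   // taken from logs.1532423968027
        String zone = "Asia/Shanghai";  // assumption: the agent ran in UTC+8

        // Wall-clock time encoded in the suffix.
        System.out.println(format(suffix, "yyyy-MM-dd HH:mm:ss", zone)); // 2018-07-24 17:19:28
        // Directory that %Y-%m-%d resolves to for an event stamped at that instant.
        System.out.println("/flume/accesslog/" + format(suffix, "yyyy-MM-dd", zone));
    }
}
```

Note that the timestamp interceptor stamps events when Flume reads them, not when the application logged them, so files rolled at 16:46 still land in the directory for the time they were ingested.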
HDFS directory
