Reading log files with Flume and storing them in HDFS
Configure the Hadoop environment
Configure the Flume environment
Configure the Flume agent file
In D:\Soft\apache-flume-1.8.0-bin\conf, rename flume-conf.properties.template to hdfs.properties and give it the following contents:
# Assemble the agent
a1.sources = s1
a1.channels = c1
a1.sinks = k1
# Configure the source: read files from a directory
a1.sources.s1.type = spooldir
a1.sources.s1.channels = c1
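# Backslashes are escape characters in .properties files (E:\log2s is read as E:log2s), so use forward slashes or E:\\log2s
a1.sources.s1.spoolDir = E:/log2s
# Include all files
a1.sources.s1.includePattern=^.*$
# Ignore the file log4j is currently writing to (log.log)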
a1.sources.s1.ignorePattern=^.*log$
a1.sources.s1.deletePolicy=never
a1.sources.s1.fileHeader = true
## Add a timestamp header (needed by the %Y-%m-%d escapes in hdfs.path)
a1.sources.s1.interceptors=i1
a1.sources.s1.interceptors.i1.type=timestamp
# Configure the channel: buffer events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
# Configure the sink: write to HDFS
a1.sinks.k1.channel=c1
a1.sinks.k1.type=hdfs
a1.sinks.k1.hdfs.path=hdfs://127.0.0.1:9000/flume/accesslog/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix=logs
a1.sinks.k1.hdfs.rollInterval=10
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.batchSize=100
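a1.sinks.k1.hdfs.writeFormat=Text
# hdfs.fileType defaults to SequenceFile (hence hdfs.HDFSSequenceFile in the output below); set a1.sinks.k1.hdfs.fileType=DataStream for plain-text files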
a1.sinks.k1.hdfs.minBlockReplicas=1
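To check which files the includePattern/ignorePattern settings above let through, here is a minimal Java sketch. It assumes that the spooldir source matches both regular expressions against the whole file name and ingests a file only when it matches includePattern and does not match ignorePattern; SpoolPatternDemo and isPicked are illustrative names, not part of Flume.

```java
import java.util.regex.Pattern;

public class SpoolPatternDemo {
    // Same regular expressions as in hdfs.properties
    static final Pattern INCLUDE = Pattern.compile("^.*$");
    static final Pattern IGNORE = Pattern.compile("^.*log$");

    /** Returns true if a file with this name would be ingested (assumed whole-name matching). */
    static boolean isPicked(String fileName) {
        return INCLUDE.matcher(fileName).matches()
                && !IGNORE.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        String[] names = {
            "log.log",                  // the file log4j is still writing to
            "log.log.2018-07-24-16-46"  // a file log4j has already rolled
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isPicked(name) ? "ingest" : "ignore"));
        }
    }
}
```

It prints "log.log -> ignore" and "log.log.2018-07-24-16-46 -> ingest": only finished, rolled files reach Flume, which matters because the spooldir source expects files not to change after they are placed in the directory.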
Flume startup command (run from the bin directory, so that ../conf points at the conf directory)
flume-ng agent --conf ../conf --conf-file ../conf/hdfs.properties --name a1
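Write a Java program that generates logs
import java.util.Date;

import org.apache.log4j.Logger;

public class App
{
    protected static final Logger logger = Logger.getLogger(App.class);

    public static void main( String[] args )
    {
        // Write one log line every 500 ms; the log4j configuration rolls the file every minute
        while (true) {
            logger.info("hello world:" + String.valueOf(new Date().getTime()));
            try {
                Thread.sleep(500);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop generating logs
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}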
log4j configuration
### set log levels ###
log4j.rootLogger=INFO, stdout, file
### stdout ###
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.Threshold=INFO
log4j.appender.stdout.Target=System.out
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n
### file ###
log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
# Log file path (the spooldir source watches this directory)
log4j.appender.file.file=E:/log2s/log.log
log4j.appender.file.Threshold=INFO
log4j.appender.file.Append=true
# Roll to a new file every minute
log4j.appender.file.DatePattern='.'yyyy-MM-dd-HH-mm
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %c{1} [%p] %m%n
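The ignorePattern in hdfs.properties only works because rolled files get a date suffix that no longer ends in "log". The sketch below shows how that suffix is formed, assuming DailyRollingFileAppender formats the start of each period with SimpleDateFormat using DatePattern and appends the result to the file name, and assuming the UTC+8 (Asia/Shanghai) time zone that the timestamps in this post suggest. RolledFileNameDemo and rolledName are illustrative names.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

public class RolledFileNameDemo {
    // DatePattern from log4j.properties; the quoted '.' is a literal dot
    static final String DATE_PATTERN = "'.'yyyy-MM-dd-HH-mm";

    /** Name of a rolled file: base file name followed by the formatted period start. */
    static String rolledName(String baseFile, Date periodStart, TimeZone zone) {
        SimpleDateFormat sdf = new SimpleDateFormat(DATE_PATTERN);
        sdf.setTimeZone(zone);
        return baseFile + sdf.format(periodStart);
    }

    public static void main(String[] args) {
        TimeZone zone = TimeZone.getTimeZone("Asia/Shanghai"); // assumed time zone
        // 2018-07-24 16:46:00 in UTC+8 (08:46:00 UTC)
        Date minute = new Date(1532421960000L);
        System.out.println(rolledName("log.log", minute, zone));
    }
}
```

This prints log.log.2018-07-24-16-46, the same name that appears in the Flume output.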
Start the Java program to generate logs

Flume output
07/24 17:19:27 INFO node.Application: Starting Channel c1
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
07/24 17:19:27 INFO node.Application: Starting Sink k1
07/24 17:19:27 INFO node.Application: Starting Source s1
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
07/24 17:19:27 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: E:log2s
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: s1: Successfully registered new MBean.
07/24 17:19:27 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: s1 started
07/24 17:19:28 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
07/24 17:19:28 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file E:\log2s\log.log.2018-07-24-16-46 to E:\log2s\log.log.2018-07-24-16-46.COMPLETED
07/24 17:19:28 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
07/24 17:19:28 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file E:\log2s\log.log.2018-07-24-16-47 to E:\log2s\log.log.2018-07-24-16-47.COMPLETED
07/24 17:19:28 INFO hdfs.HDFSSequenceFile: writeFormat = Text, UseRawLocalFileSystem = false
07/24 17:19:28 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532423968027.tmp
07/24 17:19:39 INFO hdfs.BucketWriter: Closing hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532423968027.tmp
07/24 17:19:39 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532423968027.tmp to hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532423968027
07/24 17:19:39 INFO hdfs.HDFSEventSink: Writer callback called.
07/24 17:19:59 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
07/24 17:19:59 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file E:\log2s\log.log.2018-07-24-16-48 to E:\log2s\log.log.2018-07-24-16-48.COMPLETED
07/24 17:20:00 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
07/24 17:20:00 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file E:\log2s\log.log.2018-07-24-17-19 to E:\log2s\log.log.2018-07-24-17-19.COMPLETED
07/24 17:20:02 INFO hdfs.HDFSSequenceFile: writeFormat = Text, UseRawLocalFileSystem = false
07/24 17:20:02 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424002903.tmp
07/24 17:20:13 INFO hdfs.BucketWriter: Closing hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424002903.tmp
07/24 17:20:13 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424002903.tmp to hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424002903
07/24 17:20:13 INFO hdfs.HDFSEventSink: Writer callback called.
07/24 17:21:00 INFO hdfs.HDFSSequenceFile: writeFormat = Text, UseRawLocalFileSystem = false
07/24 17:21:00 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
07/24 17:21:00 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file E:\log2s\log.log.2018-07-24-17-20 to E:\log2s\log.log.2018-07-24-17-20.COMPLETED
07/24 17:21:00 INFO hdfs.BucketWriter: Creating hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424060382.tmp
07/24 17:21:10 INFO hdfs.BucketWriter: Closing hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424060382.tmp
07/24 17:21:10 INFO hdfs.BucketWriter: Renaming hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424060382.tmp to hdfs://127.0.0.1:9000/flume/accesslog/2018-07-24/logs.1532424060382
07/24 17:21:10 INFO hdfs.HDFSEventSink: Writer callback called.
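The date directory in hdfs.path (2018-07-24 above) comes from the timestamp header added by the timestamp interceptor. A minimal sketch of that mapping, assuming the sink formats %Y-%m-%d in the agent's local time zone (taken here as Asia/Shanghai, UTC+8); bucketDir is a hypothetical helper, not a Flume API. The sample value 1532423968027 is the number in the first file name above and corresponds to 17:19:28 on 2018-07-24 in UTC+8.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public class BucketPathDemo {
    // Equivalent of the %Y-%m-%d escapes in hdfs.path
    static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    /** Resolves the daily directory for an event timestamp (epoch milliseconds). */
    static String bucketDir(String basePath, long timestampMillis, ZoneId zone) {
        return basePath + "/" + DAY.format(Instant.ofEpochMilli(timestampMillis).atZone(zone));
    }

    public static void main(String[] args) {
        ZoneId zone = ZoneId.of("Asia/Shanghai"); // assumed agent time zone
        long ts = 1532423968027L;                 // 2018-07-24 17:19:28 in UTC+8
        System.out.println(bucketDir("hdfs://127.0.0.1:9000/flume/accesslog", ts, zone));
    }
}
```

Because the timestamp interceptor stamps the time an event passes through Flume rather than the time written in the log line, a line logged at 23:59 but ingested after midnight would land in the next day's directory.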
HDFS directory
