Flume: Log Processing in Enterprise Environments

Enterprise log storage, scenario 1

201611/20161112.log.tmp

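　　The next day, the file is renamed to 20161112.log and a new 20161113.log.tmp is started.

Copy flume-conf.properties.template and rename the copy to dir-mem-hdfs.properties.

Goal: monitor a directory and upload every new file to HDFS, while filtering out the .tmp files that are still being written.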

dir-mem-hdfs.properties

　　a1.sources = s1

　　a1.channels = c1

　　a1.sinks = k1

　　# defined the source

　　a1.sources.s1.type = spooldir

　　a1.sources.s1.spoolDir = /opt/data/log_hive/20161109

　　# collect only files whose names match this pattern

　　a1.sources.s1.includePattern = ([^ ]*\.log$)

　　# skip files whose names match this pattern

　　a1.sources.s1.ignorePattern = ([^ ]*\.tmp$)

　　# defined the channel

　　a1.channels.c1.type = memory

　　a1.channels.c1.capacity = 1000

　　a1.channels.c1.transactionCapacity = 1000

　　# defined the sink

　　a1.sinks.k1.type = hdfs

　　a1.sinks.k1.hdfs.useLocalTimeStamp = true

　　a1.sinks.k1.hdfs.path = /flume/spdir

　　a1.sinks.k1.hdfs.fileType = DataStream

　　# roll files by size only (rollSize is in bytes); 0 disables time-based and count-based rolling

　　a1.sinks.k1.hdfs.rollInterval = 0

　　a1.sinks.k1.hdfs.rollSize = 20480

　　a1.sinks.k1.hdfs.rollCount = 0

　　# The channel can be defined as follows.

　　a1.sources.s1.channels = c1

　　a1.sinks.k1.channel = c1
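The include and ignore patterns in the configuration above can be sanity-checked outside Flume before starting the agent. The following is a minimal Python sketch, not Flume code: the function name `is_collected` is hypothetical, and it assumes that each pattern has to match the whole file name.

```python
import re

# Patterns copied from dir-mem-hdfs.properties above.
INCLUDE_PATTERN = re.compile(r"([^ ]*\.log$)")
IGNORE_PATTERN = re.compile(r"([^ ]*\.tmp$)")


def is_collected(file_name):
    """Return True if the name passes the include pattern and is not ignored.

    Assumption: a pattern must match the entire file name, so fullmatch is used.
    """
    included = INCLUDE_PATTERN.fullmatch(file_name) is not None
    ignored = IGNORE_PATTERN.fullmatch(file_name) is not None
    return included and not ignored


if __name__ == "__main__":
    for name in ["20161112.log", "20161113.log.tmp", "notes.txt"]:
        print(name, is_collected(name))
```

With these patterns a finished file such as 20161112.log is picked up, while 20161113.log.tmp is skipped until it is renamed.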

Run from the Flume directory:

　　bin/flume-ng agent -c conf/ -n a1 -f conf/dir-mem-hdfs.properties -Dflume.root.logger=INFO,console

　　This example uses a memory channel. A file channel is safer, because events are persisted to disk and survive an agent restart.

Enterprise log storage, scenario 2

201611/20161112.log

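　　The next day, new entries keep being appended to 20161112.log.

This case needs the behavior of both the exec source and the spooling directory source. How can it be handled?

Build the taildir source from Flume 1.7 and integrate it into the existing Flume environment.

　　1. Download and install Git on Windows

　　2. Create an empty folder somewhere (preferably with no Chinese characters in the path), for example GitHub

　　3. Run the usual Git commands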

　　　　$ pwd

　　　　$ ls

　　　　$ cd /C/Users/Administrator/Desktop/GitHub

　　　　$ git clone https://github.com/apache/flume.git

　　　　$ cd flume

　　　　$ git branch -r # list the remote branches

　　　　$ git branch # show the current branch

　　　　$ git checkout origin/flume-1.7 # switch to the flume-1.7 branch

　　Copy flume\flume-ng-sources\flume-taildir-source

　　Import the flume-taildir-source project into Eclipse

　　Edit pom.xml as follows

　　<repositories>

　　　　<repository>

　　　　　　<id>cloudera</id>

　　　　　　<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>

　　　　</repository>

　　</repositories>

　　<modelVersion>4.0.0</modelVersion>

　　<groupId>org.apache.flume.flume-ng-sources</groupId>

　　<artifactId>flume-taildir-source</artifactId>

　　<version>1.5.0-cdh5.3.6</version>

　　<name>Flume Taildir Source</name>

　　<build>

　　　　<plugins>

　　　　　　<plugin>

　　　　　　　　<groupId>org.apache.maven.plugins</groupId>

　　　　　　　　<artifactId>maven-compiler-plugin</artifactId>

　　　　　　　　<version>2.3.2</version>

　　　　　　　　<configuration>

　　　　　　　　　　<source>1.7</source>

　　　　　　　　　　<target>1.7</target>

　　　　　　　　</configuration>

　　　　　　</plugin>

　　　　</plugins>

　　</build>

　　<dependencies>

　　　　<dependency>

　　　　　　<groupId>org.apache.flume</groupId>

　　　　　　<artifactId>flume-ng-core</artifactId>

　　　　　　<version>1.5.0-cdh5.3.6</version>

　　　　</dependency>

　　　　<dependency>

　　　　　　<groupId>junit</groupId>

　　　　　　<artifactId>junit</artifactId>

　　　　　　<version>4.10</version>

　　　　　　<scope>test</scope>

　　　　</dependency>

　　</dependencies>

　　4. Build the project with Maven, then copy the resulting jar into the lib directory of the existing Flume installation

　　5. Create the directories and files

　　　　$ mkdir -p /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/position

　　　　$ mkdir -p /opt/data/tail/hadoop-dir/

　　　　$ echo "" > /opt/data/tail/hadoop.log

　　　　Copy flume-conf.properties.template and rename the copy to tail-mem-hdfs.properties.

　　　　The required parameters can be read from the source code:

　　　　　　a1.sources = s1

　　　　　　a1.channels = c1

　　　　　　a1.sinks = k1

　　　　　　# defined the source

　　　　　　a1.sources.s1.type = org.apache.flume.source.taildir.TaildirSource

　　　　　　a1.sources.s1.positionFile = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/position/taildir_position.json

　　　　　　a1.sources.s1.filegroups = f1 f2

　　　　　　a1.sources.s1.filegroups.f1 = /opt/data/tail/hadoop.log

　　　　　　a1.sources.s1.filegroups.f2 = /opt/data/tail/hadoop-dir/.*

　　　　　　a1.sources.s1.headers.f1.headerKey1 = value1

　　　　　　a1.sources.s1.headers.f2.headerKey1 = value2-1

　　　　　　a1.sources.s1.headers.f2.headerKey2 = value2-2

　　　　　　a1.sources.s1.fileHeader = true

　　　　　　# defined the channel

　　　　　　a1.channels.c1.type = memory

　　　　　　a1.channels.c1.capacity = 1000

　　　　　　a1.channels.c1.transactionCapacity = 1000

　　　　　　# defined the sink

　　　　　　a1.sinks.k1.type = hdfs

　　　　　　a1.sinks.k1.hdfs.useLocalTimeStamp = true

　　　　　　a1.sinks.k1.hdfs.path = /flume/spdir

　　　　　　a1.sinks.k1.hdfs.fileType = DataStream

　　　　　　a1.sinks.k1.hdfs.rollInterval = 0

　　　　　　a1.sinks.k1.hdfs.rollSize = 20480

　　　　　　a1.sinks.k1.hdfs.rollCount = 0

　　　　　　# The channel can be defined as follows.

　　　　　　a1.sources.s1.channels = c1

　　　　　　a1.sinks.k1.channel = c1

　　Run from the Flume directory:

　　　　bin/flume-ng agent -c conf/ -n a1 -f conf/tail-mem-hdfs.properties -Dflume.root.logger=INFO,console

　　　　Test by adding files or appending new data.
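The positionFile setting in the configuration above is what lets the taildir source continue where it stopped instead of re-reading a file from the beginning. The Python sketch below illustrates that idea only. The function name `read_new_lines` and the simplified JSON layout (file path mapped to byte offset) are assumptions made for this example and do not reproduce the format Flume actually writes.

```python
import json
import os
import tempfile


def read_new_lines(log_path, position_path):
    """Return the lines appended to log_path since the last call.

    The byte offset reached is saved in position_path (a JSON file),
    so a later call, even from a new process, resumes from there.
    """
    positions = {}
    if os.path.exists(position_path):
        with open(position_path, encoding="utf-8") as f:
            positions = json.load(f)

    with open(log_path, "rb") as log:
        log.seek(positions.get(log_path, 0))  # skip what was already read
        data = log.read()
        positions[log_path] = log.tell()      # remember the new offset

    with open(position_path, "w", encoding="utf-8") as f:
        json.dump(positions, f)

    return data.decode("utf-8").splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log_file = os.path.join(tmp, "hadoop.log")
        pos_file = os.path.join(tmp, "position.json")
        with open(log_file, "w", encoding="utf-8") as f:
            f.write("first\n")
        print(read_new_lines(log_file, pos_file))
        with open(log_file, "a", encoding="utf-8") as f:
            f.write("second\n")
        print(read_new_lines(log_file, pos_file))
```

Each call returns only what was appended since the previous call, which is the behavior to look for when appending test data to hadoop.log.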

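Common enterprise architecture: Flume with multiple sinks

The same data is collected once and delivered to different frameworks for processing.

Source: one copy of the data

Channels: several

Sinks: several

If several sinks read from a single channel, each sink receives only part of the data. With several channels, the source sends every event to each channel separately.

Design: source = hive.log, channel = file, sink = hdfs (different paths)

Copy flume-conf.properties.template and rename the copy to hive-file-sinks.properties.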
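The difference between sharing one channel and giving each sink its own channel can be illustrated with plain queues. This Python sketch is not Flume code: `drain_round_robin` and `replicate` are hypothetical helpers, and the strict round-robin order is an assumption for illustration, since in a real agent which sink takes which event is not predictable.

```python
from collections import deque


def drain_round_robin(channel, sink_count):
    """Several sinks taking from ONE channel: each event reaches exactly one sink."""
    received = [[] for _ in range(sink_count)]
    turn = 0
    while channel:
        received[turn % sink_count].append(channel.popleft())
        turn += 1
    return received


def replicate(events, channel_count):
    """One source copying every event into EACH channel."""
    channels = [deque() for _ in range(channel_count)]
    for event in events:
        for channel in channels:
            channel.append(event)
    return channels


if __name__ == "__main__":
    events = ["e1", "e2", "e3", "e4"]
    # Shared channel: the two sinks split the events between them.
    print(drain_round_robin(deque(events), 2))
    # One channel per sink: both sinks can read the complete stream.
    print([list(channel) for channel in replicate(events, 2)])
```

This is why the design uses two channels for two sinks: Flume's default channel selector replicates each event to every channel attached to the source.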

hive-file-sinks.properties

　　a1.sources = s1

　　a1.channels = c1 c2

　　a1.sinks = k1 k2

　　# defined the source

　　a1.sources.s1.type = exec

　　a1.sources.s1.command = tail -F /opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs/hive.log

　　a1.sources.s1.shell = /bin/sh -c

　　# defined the channel 1

　　a1.channels.c1.type = file

　　a1.channels.c1.checkpointDir = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/checkp1

　　a1.channels.c1.dataDirs = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/data1

　　# defined the channel 2

　　a1.channels.c2.type = file

　　a1.channels.c2.checkpointDir = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/checkp2

　　a1.channels.c2.dataDirs = /opt/cdh-5.6.3/apache-flume-1.5.0-cdh5.3.6-bin/datas/data2

　　# defined the sink 1

　　a1.sinks.k1.type = hdfs

　　a1.sinks.k1.hdfs.path = /flume/hdfs/sink1

　　a1.sinks.k1.hdfs.fileType = DataStream

　　# defined the sink 2

　　a1.sinks.k2.type = hdfs

　　a1.sinks.k2.hdfs.path = /flume/hdfs/sink2

　　a1.sinks.k2.hdfs.fileType = DataStream

　　# The channel can be defined as follows.

　　a1.sources.s1.channels = c1 c2

　　a1.sinks.k1.channel = c1

　　a1.sinks.k2.channel = c2

Run from the Flume directory:

　　bin/flume-ng agent -c conf/ -n a1 -f conf/hive-file-sinks.properties -Dflume.root.logger=INFO,console

Run from the Hive directory (this writes new entries to hive.log):

　　bin/hive -e "show databases"
