flume 1.7 安装与使用
Flume安装
系统要求:
需安装JDK 1.7及以上版本
1、 下载二进制包
下载页面:http://flume.apache.org/download.html
1.7.0下载地址:http://www.apache.org/dyn/closer.lua/flume/1.7.0/apache-flume-1.7.0-bin.tar.gz
2、解压
$ cp ~/Downloads/apache-flume-1.7.0-bin.tar.gz ~
$ cd
$ tar -zxvf apache-flume-1.7.0-bin.tar.gz
$ cd apache-flume-1.7.0-bin
3、创建flume-env.sh文件
$ cp conf/flume-env.sh.template conf/flume-env.sh
简单实例-传输指定文件
场景:两台机器,一台为client,一台为agent,在client上将指定文件传输到agent机器上。
1、创建配置文件
根据flume自身提供的模板,创建flume.conf配置文件。
$ cp conf/flume-conf.properties.template conf/flume.conf
编辑文件flume.conf:
$ vi conf/flume.conf
在文件末尾加入以下配置:
# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory # Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.avro-source1.channels = ch1
agent1.sources.avro-source1.type = avro
agent1.sources.avro-source1.bind = 0.0.0.0
agent1.sources.avro-source1.port = 41414 # Define a logger sink that simply logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.log-sink1.channel = ch1
agent1.sinks.log-sink1.type = logger # Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = avro-source1
agent1.sinks = log-sink1
保存,并且退出:
2、启动flume server
在作为agent的机器上执行以下:
bin/flume-ng agent --conf ./conf/ -f conf/flume.conf -Dflume.root.logger=DEBUG,console -n agent1
3、在新的窗口开启client
在作为client的机器上执行以下:
(由于当前环境是在单机上模拟两台机器,所以,直接在新的终端中输入以下命令)
$ bin/flume-ng avro-client --conf conf -H localhost -p 41414 -F /etc/passwd -Dflume.root.logger=DEBUG,console
4、结果
这个时候,你可以看到以下消息:
2012-03-16 16:39:17,124 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:175)] Finished
2012-03-16 16:39:17,127 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:178)] Closing reader
2012-03-16 16:39:17,127 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.run(AvroCLIClient.java:183)] Closing transceiver
2012-03-16 16:39:17,129 (main) [DEBUG - org.apache.flume.client.avro.AvroCLIClient.main(AvroCLIClient.java:73)] Exiting
在前面那个开启flume server的窗口,可以看到如下消息:
2012-03-16 16:39:16,738 (New I/O server boss #1 ([id: 0x49e808ca, /0:0:0:0:0:0:0:0:41414])) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /1
27.0.0.1:39577 => /127.0.0.1:41414] OPEN
2012-03-16 16:39:16,742 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] BOU
ND: /127.0.0.1:41414
2012-03-16 16:39:16,742 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 => /127.0.0.1:41414] CON
NECTED: /127.0.0.1:39577
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] DISCONNECTED
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] UNBOUND
2012-03-16 16:39:17,129 (New I/O server worker #1-1) [INFO - org.apache.avro.ipc.NettyServer$NettyServerAvroHandler.handleUpstream(NettyServer.java:123)] [id: 0x0b92a848, /127.0.0.1:39577 :> /127.0.0.1:41414] CLOSED
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@5c1ae90c }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@6aba4211 }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@6a47a0d4 }
2012-03-16 16:39:17,302 (Thread-1) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:68)] Event: { headers:{} body:[B@48ff4cf }
...
简单实例-将目录文件上传到HDFS
场景:将机器上的某个文件夹下的文件上传到HDFS上。
1、配置conf/flume.conf
# Define a memory channel called ch1 on agent1
agent1.channels.ch1.type = memory # Define an Avro source called avro-source1 on agent1 and tell it
# to bind to 0.0.0.0:41414. Connect it to channel ch1.
agent1.sources.spooldir-source1.channels = ch1
agent1.sources.spooldir-source1.type = spooldir
agent1.sources.spooldir-source1.spoolDir=/home/hadoop/flume-1.7.0/tmpData
agent1.sources.spooldir-source1.bind = 0.0.0.0
agent1.sources.spooldir-source1.port = 41414 # Define a logger sink that simply logs all events it receives
# and connect it to the other end of the same channel.
agent1.sinks.hdfs-sink1.channel = ch1
agent1.sinks.hdfs-sink1.type = hdfs
agent1.sinks.hdfs-sink1.hdfs.path = hdfs://master:9000/test
agent1.sinks.hdfs-sink1.hdfs.filePrefix = events-
agent1.sinks.hdfs-sink1.hdfs.useLocalTimeStamp = true
agent1.sinks.hdfs-sink1.hdfs.round = true
agent1.sinks.hdfs-sink1.hdfs.roundValue = 10 # Finally, now that we've defined all of our components, tell
# agent1 which ones we want to activate.
agent1.channels = ch1
agent1.sources = spooldir-source1
agent1.sinks = hdfs-sink1
其中,/home/hadoop/flume-1.7.0/tmpData是我要上传的文件所在目录,也就是,我要将此文件夹下的文件都上传到HDFS上的hdfs://master:9000/test目录。
注意:
- 这样的配置会产生许多小文件,因为默认情况下,一个文件存储10个event,这个配置由rollCount控制,默认为10,此外还有一个参数为rollSize,这个是控制一个文件的大小,如果文件大于这个数值,就是另起一文件。
- 此时的文件名都是以event开头,如果想保留原来文件的名字,可以使用以下配置(其中,basenameHeader是相对source而言,filePrefix是相对sink而言,分别这样设置之后,上传到hdfs上的文件名就会变成“原始文件名.时间戳”):
agent1.sources.spooldir-source1.basenameHeader = true
agent1.sinks.hdfs-sink1.hdfs.filePrefix = %{basename}
2、启动agent
使用以下命令启动agent:
bin/flume-ng agent --conf ./conf/ -f ./conf/flume.conf --name agent1 -Dflume.root.logger=DEBUG,console
3、查看结果
到Hadoop提供的WEB GUI界面可以看到刚刚上传的文件是否成功。
GUI界面地址为:http://master:50070/explorer.html#/test
其中,master为Hadoop的Namenode所在的机器名。
4、总结
在这个场景,需要将文件上传到HDFS上,会使用到几个Hadoop的jar包,分别是:
${HADOOP_HOME}share/hadoop/common/hadoop-common-2.4.0.jar
${HADOOP_HOME}share/hadoop/common/lib/commons-configuration-1.6.jar
${HADOOP_HOME}share/hadoop/common/lib/hadoop-auth-2.4.0.jar
${HADOOP_HOME}share/hadoop/hdfs/hadoop-hdfs-2.4.0.jar
异常
Failed to start agent because dependencies were not found in classpath. Error follows. java.lang.NoClassDefFoundError org/apache/hadoop/io/SequenceFile$CompressionType
2016-11-03 14:49:35,278 (conf-file-poller-0) [ERROR - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:146)] Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hadoop/io/SequenceFile$CompressionType
问题原因:缺少依赖包,这个依赖包是以下jar文件:
${HADOOP_HOME}share/hadoop/common/hadoop-common-2.4.0.jar
解决方法:找到这个jar文件,copy到flume安装目录下的lib目录下就ok了。
java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
2016-11-03 16:32:06,741 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:256)
at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:465)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:368)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.lang.Thread.run(Thread.java:745)
解决方法:
编辑conf/flume.conf文件,其中agent1,sink1替换成你自己的agent和sink
agent1.sinks.sink1.hdfs.useLocalTimeStamp = true
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
2016-11-03 16:32:55,594 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<init>(DefaultMetricsSystem.java:38)
at org.apache.hadoop.metrics2.lib.DefaultMetricsSystem.<clinit>(DefaultMetricsSystem.java:36)
at org.apache.hadoop.security.UserGroupInformation$UgiMetrics.create(UserGroupInformation.java:106)
at org.apache.hadoop.security.UserGroupInformation.<clinit>(UserGroupInformation.java:208)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2554)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:2546)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2412)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:240)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:232)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:668)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:665)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 18 more
解决方法:
缺少的依赖在commons-configuration-1.6.jar包里,这个包在${HADOOP_HOME}share/hadoop/common/lib/下,将其拷贝到flume的lib目录下。
cp ${HADOOP_HOME}share/hadoop/common/lib/commons-configuration-1.6.jar ${FLUME_HOME}/lib/
java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
2016-11-03 16:41:54,629 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:447)] process failed
java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
解决方法:
缺少hadoop-auth-2.4.0.jar依赖,同样将其拷贝到flume的lib目录下:
cp ${HADOOP_HOME}share/hadoop/common/lib/hadoop-auth-2.4.0.jar ${FLUME_HOME}/lib/
HDFS IO error java.io.IOException: No FileSystem for scheme: hdfs
2016-11-03 16:49:26,638 (SinkRunner-PollingRunner-DefaultSinkProcessor) [WARN - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:443)] HDFS IO error
java.io.IOException: No FileSystem for scheme: hdfs
缺少依赖:hadoop-hdfs-2.4.0.jar
cp ${HADOOP_HOME}share/hadoop/hdfs/hadoop-hdfs-2.4.0.jar ${FLUME_HOME}/lib/
flume 1.7 安装与使用的更多相关文章
- flume伪分布式安装
flume伪分布式安装: 1.导包:apache-flume-1.7.0-bin.tar.gz 2.配置环境变量:/etc/profile export FLUME_HOME=/yang/apache ...
- 具体说明 Flume介绍、安装和配置
社论: 本文总结"Hadoop生态系统"中的当中一员--Apache Flume 写在前面二: 所用软件说明: 一.什么是Apache Flume 官网:Flume is a di ...
- Flume简介及安装
Hadoop业务的大致开发流程以及Flume在业务中的地位: 从Hadoop的业务开发流程图中可以看出,在大数据的业务处理过程中,对于数据的采集是十分重要的一步,也是不可避免的一步,从而引出我们本文的 ...
- flume 1.8 安装部署
环境 centos:7.2 JDK:1.8 Flume:1.8 一.Flume 安装 1) 下载 wget http://mirrors.tuna.tsinghua.edu.cn/apa ...
- Flume 案例 Telnet安装及采集Telnet发送信息到控制台
Telnet安装 一.查看本机是否安装telnet #rpm -qa | grep telnet 如果什么都不显示.说明你没有安装telnet 二.开始安装 yum install xinetd yu ...
- Flume入门:安装、部署
一.什么是Flume? flume 作为 cloudera 开发的实时日志收集系统,受到了业界的认可与广泛应用.Flume 初始的发行版本目前被统称为 Flume OG(original genera ...
- Apache Flume的介绍安装及简单案例
概述 Flume 是 一个高可用的,高可靠的,分布式的海量日志采集.聚合和传输的软件.Flume 的核心是把数据从数据源(source)收集过来,再将收集到的数据送到指定的目的地(sink).为了保证 ...
- Apache Flume简介及安装部署
概述 Flume 是 Cloudera 提供的一个高可用的,高可靠的,分布式的海量日志采集.聚合和传输的软件. Flume 的核心是把数据从数据源(source)收集过来,再将收集到的数据送到指定的目 ...
- 具体图解 Flume介绍、安装配置
写在前面一: 本文总结"Hadoop生态系统"中的当中一员--Apache Flume 写在前面二: 所用软件说明: 一.什么是Apache Flume 官网:Flume is a ...
随机推荐
- util 常用方法
C:\Program Files\Java\jdk1.8.0_171\src.zip!\java\lang\System.java /** * Returns the current time in ...
- 多线程入门-第五章-线程的调度与控制之yield
yield与sleep类似,只是不能指定暂停多长时间,并且只能让同优先级的线程有执行的机会,让位时间不固定. /* yield使用 */ public class ThreadTest04 { pub ...
- requests设置Authorization
headers = {"Authorization", "Bearer {}".format(token_string)} r = requests.get(& ...
- crontab下设置ntpdate的问题
1.在crontab里设置了ntpdate 同步时间,一段时间发现没有起作用 原来的写法是 20 00 × × × ntpdate cn.pool.ntp.org 单独拿出来执行也是没问题的,最近好好 ...
- beego
https://www.kancloud.cn/hello123/beego/126087
- 12.Project Fields to Return from Query-官方文档摘录
1 插入例句 db.inventory.insertMany( [ { item: "journal", status: "A", size: { h: 14, ...
- Python 集合(set)的使用总结
集合的特点:去重.无序,因此无法通过下标取值. 1. 定义集合 s = set() #定义空的集合 s2 = {'} # 不是key -value形式的话就是集合,不是字典 s3 ={'} print ...
- Linux系统——awk命令
awk命令不仅仅是Linux系统的命令,也是一种编程语言,用来处理数据和生成报告(Exel),处理的数据可以是一个或多个文件(标准输入和管道获取标准输入).可在命令行上编辑操作,也可以写成awk程序运 ...
- 关于js的异常
遇到异常,通常会有两种处理办法1.处理异常 try{ //可能出现异常的代码 }catch(e){ //处理异常 } 2.抛出异常 public void getName throws Excepti ...
- javascript与jQuery的each,map回调函数参数顺序问题
<script> var arr = [2,3,6,7,9]; //javascript中的forEach 和 map方法 arr.forEach(function(value,index ...