现在的需求是在一台Flume采集机器上,往Hadoop集群上写HDFS,该机器没有安装Hadoop。

这里的Flume版本是1.6.0,Hadoop版本是2.7.1.

把Hadoop集群的hdfs-site.xml、core-site.xml两个配置文件复制到 flume安装目录的conf目录去,把hadoop-hdfs-2.7.1.jar复制到 Flume  lib目录。

一、Flume配置文件:

a1.sources = r1
a1.channels = c1
a1.sinks = k1 a1.sources.r1.type = syslogtcp
a1.sources.r1.bind = 192.168.110.160 # 本机ip
a1.sources.r1.port = 23003
a1.sources.r1.workerThreads = 10 a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 100000
a1.channels.c1.keep-alive = 6
a1.channels.c1.byteCapacityBufferPercentage = 20 a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://clusterpc/test/flume/%y-%m-%d
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp=true a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

  启动: bin/flume-ng agent --conf conf --conf-file conf/flume-tcp-memory-hdfs.conf --name a1 -Dflume.root.logger=info,console

二、错误集:

1、 找不到主机名

2016-09-19 16:15:48,518 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:459)] process failed
java.lang.IllegalArgumentException: java.net.UnknownHostException: cluster
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:378)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:310)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:176)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:678)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.net.UnknownHostException: cluster

  cluster是公司Hadoop集群NameService的名字,这个错误是由于找不到Hadoop集群NameService造成的,所以需要把hdfs-site.xml复制到flume/conf目录。

2、

java.io.IOException: Mkdirs failed to create /test/flume/16-09-19 (exists=false, cwd=file:/data/apache-flume-1.6.0-bin)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:450)
at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:776)
at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:96)
at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:78)
at org.apache.flume.sink.hdfs.HDFSSequenceFile.open(HDFSSequenceFile.java:69)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:246)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

  把 core-site.xml复制到flume/conf目录

3、

java.io.IOException: No FileSystem for scheme: hdfs
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:170)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:355)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:243)
at org.apache.flume.sink.hdfs.BucketWriter$1.call(BucketWriter.java:235)
at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
at org.apache.flume.sink.hdfs.BucketWriter$9.call(BucketWriter.java:676)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

  把hadoop-hdfs-2.7.1.jar复制到flume/lib目录下

4、HDFS权限不足,这里往HDFS写文件的用户是登录Flume采集机器的用户。

org.apache.hadoop.security.AccessControlException: Permission denied: user=kafka, access=WRITE, inode="/test/flume/16-09-19/events-.1474268726127.tmp":hadoop:supergroup:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:213)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1698)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1682)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1665)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2517)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2452)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2335)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:623)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:397)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)

  HDFS 权限不足,要授权。hadoop fs -chmod -R 777 /test/

5、时间戳

java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:228)
at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:432)
at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:380)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
at java.lang.Thread.run(Thread.java:744)

  原因是Event对象headers没有设置timestamp造成的,解决办法:设置a1.sinks.k1.hdfs.useLocalTimeStamp=true,使用本地时间戳。

Flume 远程写HDFS的更多相关文章

  1. Flume中的HDFS Sink配置参数说明【转】

    转:http://lxw1234.com/archives/2015/10/527.htm 关键字:flume.hdfs.sink.配置参数 Flume中的HDFS Sink应该是非常常用的,其中的配 ...

  2. shell脚本监控Flume输出到HDFS上文件合法性

    在使用flume中发现由于网络.HDFS等其它原因,使得经过Flume收集到HDFS上得日志有一些异常,表现为: 1.有未关闭的文件:以tmp(默认)结尾的文件.加入存到HDFS上得文件应该是gz压缩 ...

  3. 客户端用java api 远程操作HDFS以及远程提交MR任务(源码和异常处理)

    两个类,一个HDFS文件操作类,一个是wordcount 词数统计类,都是从网上看来的.上代码: package mapreduce; import java.io.IOException; impo ...

  4. prometheus远程写参数优化

    一.概述 prometheus可以通过远程存储来解决自身存储的瓶颈,所以其提供了远程存储接口,并可以通过过配置文件进行配置(prometheus.yml).一般情况下我们使用其默认的配置参数,但是为了 ...

  5. flume远程调试

    项目开发的时候,出现问题的时候,通常在IDE里面直接进行调试,但有时候我们可能用的是另外的一些开源框架,甚至运行程序里面没有一行代码是我们自己写的,如果出现一些较复杂的问题,那么我们可能就会用到远程调 ...

  6. webhdfs追加写HDFS异常

    问题 {:timestamp=>"2015-03-04T00:02:47.224000+0800", :message=>"Retrying webhdfs ...

  7. flume data to hdfs

    flume 开发梳理 flume 数据到hadoop conf/hdfsAgent.conf #配置sources.channels.sinks a1.sources=r1 a1.channels=c ...

  8. flume 中的 hdfs sink round 和roll

    http://blog.csdn.net/kntao/article/details/49278239 http://flume.apache.org/FlumeUserGuide.html#exec ...

  9. Nginx日志通过Flume导入到HDFS中

    关注公众号:分享电脑学习回复"百度云盘" 可以免费获取所有学习文档的代码(不定期更新) flume上传到hdfs: 当我们的数据量比较大时,比如每天的日志文件达到5G以上 使用ha ...

随机推荐

  1. iOS类实现里面怎么用属性

    属性(properity)是一个很好用的东西,简单而直接.Objective-C还创建了一个点语法来帮助大家使用.根据以前C++的习惯,类外访问实例变量时,最好通过getter/setter方法,也就 ...

  2. 看啦这么就别人的博客 我也来写一篇! Object转换其他类型

    package com.sinitek.framework.util; import java.math.BigDecimal;import java.sql.Timestamp;import jav ...

  3. cocos2d项目 打包apk 项目名称相关设置

    修改android项目名称(打包生成的默认apk名称),直接找到proj.android目录下.project文件夹里面比较靠前的xml配置,修改<name>项目名称</name&g ...

  4. windows Apache+cgi的配置方法

    1.  配置config line 119 :打开#LoadModule rewrite_module modules/mod_rewrite.so line 192 :<Directory / ...

  5. 数迹学——Asp.Net MVC4入门指南(4):添加一个模型

    一.添加模型类 二.添加MovieDBContext类,连接数据库 DbContext类继承自 System.Data.Entity; 负责在数据库中获取,存储,更新,处理实例 MovieDBCont ...

  6. ajax请求后弹开新页面被浏览器拦截

    window.open()我想应该很多人都不陌生吧,它可以实现除用a标签以外来实现打开新窗口! 最近开发项目用到时,却遇到了麻烦,本来好好的弹出窗口,结果被浏览器无情的给拦截了! 代码如下: $.ge ...

  7. spring mvc 4数据校验 validator

    注解式控制器的数据验证.类型转换及格式化——跟着开涛学SpringMVC http://jinnianshilongnian.iteye.com/blog/1733708Spring4新特性——集成B ...

  8. Win10外包公司——长年承接Win10App外包、Win10通用应用外包

    在几天前的WinHEC大会中,微软特意在大会中展示了其对通用应用的称呼规范,现在,适用于Windows通用平台的应用的正式名称为“Windows应用”(Windows apps),简洁明了. 总而言之 ...

  9. 将WeX5部署到自己的Tomcat服务器上

    页面服务UIServer布署 WeX5自带页面服务UIServer的是标准Web应用,可以部署在Java Web应用服务器上.下面介绍如何在Tomcat和WebLogic中部署WeX5的UIServe ...

  10. 测试工具之Charles视频教程(更新中。。。)

    应群里小伙伴学习需求,录制新版 Charles V4 系列教程,后续内容抽空更新,测试工具系列带你上王者...(ノ°ο°)ノ前方高能预警 链接:http://pan.baidu.com/s/1c16P ...