Symptom: MapReduce jobs fail.

Inspecting the directory with the hadoop fsck -openforwrite command reveals files that were never closed:

[root@com ~]# hadoop fsck -openforwrite /data/rc/click/mpp/15-08-05/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://com.hunantv.hadoopnamenode:50070
FSCK started by root (auth:SIMPLE) from /10.100.1.46 for path /data/rc/click/mpp/15-08-05/ at Thu Aug 06 14:05:03 CST 2015
....................................................................................................
....................................................................................................
........./data/rc/click/mpp/15-08-05/FlumeData.1438758322864 42888 bytes, 1 block(s), OPENFORWRITE:
/data/rc/click/mpp/15-08-05/FlumeData.1438758322864: Under replicated BP-1672356070-10.100.1.36-1412072991411:blk_1120646538_47162789{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-f4fff5f3-f3fd-4054-a75c-1d7da53a73af:NORMAL|FINALIZED], ReplicaUnderConstruction[[DISK]DS-26f54bc5-5026-4e6a-94ec-8435224e4aa9:NORMAL|RWR], ReplicaUnderConstruction[[DISK]DS-4ab3fffc-6468-47df-8023-79f23a330371:NORMAL|FINALIZED]]}. Target Replicas is 3 but found 2 replica(s).
..........................................................................................
............................Status: HEALTHY
Total size: 99186583 B
Total dirs: 1
Total files: 328
Total symlinks: 0
Total blocks (validated): 328 (avg. block size 302398 B)
Minimally replicated blocks: 328 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (0.30487806 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.996951
Corrupt blocks: 0
Missing replicas: 1 (0.101626016 %)
Number of data-nodes: 59
Number of racks: 6
FSCK ended at Thu Aug 06 14:05:03 CST 2015 in 36 milliseconds

The filesystem under path '/data/rc/click/mpp/15-08-05/' is HEALTHY

The fsck report shows a file still open for write, its last block stuck in UNDER_CONSTRUCTION. Checking the Flume logs:

[root@10.100.1.117] out: 05 Aug 2015 11:15:19,322 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:234) - Creating hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp
[root@10.100.1.117] out: 05 Aug 2015 11:16:20,493 INFO [hdfs-sin_hdfs_201-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$5.call:429) - Closing idle bucketWriter hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp at 1438744580493
[root@10.100.1.117] out: 05 Aug 2015 11:16:20,497 INFO [hdfs-sin_hdfs_201-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:363) - Closing hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp
[root@10.100.1.117] out: 05 Aug 2015 11:16:30,501 WARN [hdfs-sin_hdfs_201-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:370) - failed to close() HDFSWriter for file (hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp). Exception follows.
[root@10.100.1.117] out: java.io.IOException: Callable timed out after 10000 ms on file: hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp
[root@10.100.1.117] out: 05 Aug 2015 11:16:30,503 INFO [hdfs-sin_hdfs_201-call-runner-7] (org.apache.flume.sink.hdfs.BucketWriter$8.call:629) - Renaming hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp to hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293

So the close() of the HDFS file failed because it timed out. Note that the sink then renamed the .tmp file anyway (the Renaming line above), leaving the file open for write on the NameNode, which is exactly what fsck reported.

Looking at the source of org.apache.flume.sink.hdfs.BucketWriter#close():

public synchronized void close(boolean callCloseCallback)
    throws IOException, InterruptedException {
  checkAndThrowInterruptedException();
  try {
    flush();
  } catch (IOException e) {
    LOG.warn("pre-close flush failed", e);
  }
  boolean failedToClose = false;
  LOG.info("Closing {}", bucketPath);
  CallRunner<Void> closeCallRunner = createCloseCallRunner();
  if (isOpen) {
    try {
      callWithTimeout(closeCallRunner);
      sinkCounter.incrementConnectionClosedCount();
    } catch (IOException e) {
      LOG.warn("failed to close() HDFSWriter for file (" + bucketPath +
          "). Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      failedToClose = true;
    }
    isOpen = false;
  } else {
    LOG.info("HDFSWriter is already closed: {}", bucketPath);
  }

  // NOTE: timed rolls go through this codepath as well as other roll types
  if (timedRollFuture != null && !timedRollFuture.isDone()) {
    timedRollFuture.cancel(false); // do not cancel myself if running!
    timedRollFuture = null;
  }
  if (idleFuture != null && !idleFuture.isDone()) {
    idleFuture.cancel(false); // do not cancel myself if running!
    idleFuture = null;
  }

  if (bucketPath != null && fileSystem != null) {
    // could block or throw IOException
    try {
      renameBucket(bucketPath, targetPath, fileSystem);
    } catch (Exception e) {
      LOG.warn("failed to rename() file (" + bucketPath +
          "). Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      final Callable<Void> scheduledRename = createScheduledRenameCallable();
      timedRollerPool.schedule(scheduledRename, retryInterval, TimeUnit.SECONDS);
    }
  }

  if (callCloseCallback) {
    runCloseAction();
    closed = true;
  }
}

The default call timeout is 10000 ms (matching the 10-second gap between the Closing and the warning lines in the log above), and a failed close is not retried. The code does set a failedToClose flag but never reads it; presumably the developers forgot to finish handling it...
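
To see where that log message comes from, here is a minimal, self-contained sketch of the call-with-timeout pattern (an illustration assuming the usual ExecutorService approach, not the verbatim Flume source): the HDFS operation runs on a worker thread while the caller waits at most callTimeout milliseconds and converts a TimeoutException into the IOException logged above.

import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Illustrative sketch only: the worker-thread-plus-timeout pattern that
// produces "Callable timed out after 10000 ms" when HDFS is slow.
public class CallWithTimeoutSketch {

  private static final ExecutorService callTimeoutPool =
      Executors.newSingleThreadExecutor();
  private static final long callTimeout = 10000; // ms, the Flume default

  static <T> T callWithTimeout(Callable<T> callable, String path)
      throws IOException, InterruptedException {
    Future<T> future = callTimeoutPool.submit(callable);
    try {
      return future.get(callTimeout, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // best effort; the HDFS call may keep running
      throw new IOException("Callable timed out after " + callTimeout
          + " ms on file: " + path, e);
    } catch (ExecutionException e) {
      throw new IOException("Call failed on file: " + path, e.getCause());
    }
  }

  public static void main(String[] args) throws Exception {
    try {
      // Simulate an HDFS close() that hangs longer than callTimeout.
      callWithTimeout(() -> { Thread.sleep(15000); return null; }, "/tmp/demo.tmp");
    } catch (IOException e) {
      System.out.println(e.getMessage());
    } finally {
      callTimeoutPool.shutdownNow();
    }
  }
}

Note that the timeout only stops the caller from waiting; it neither guarantees that the underlying close() completed nor schedules a retry, which is exactly the gap the unused failedToClose flag was presumably meant to cover.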

Solutions:

1. Increase the call timeout for HDFS operations, e.g. to 5 minutes. The Flume HDFS sink configuration:

agent12.sinks.sin_hdfs_201.type=hdfs
agent12.sinks.sin_hdfs_201.channel=ch_hdfs_201
agent12.sinks.sin_hdfs_201.hdfs.path=hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-%{month}-%{day}
agent12.sinks.sin_hdfs_201.hdfs.round=true
agent12.sinks.sin_hdfs_201.hdfs.roundValue=10
agent12.sinks.sin_hdfs_201.hdfs.roundUnit=minute
agent12.sinks.sin_hdfs_201.hdfs.fileType=DataStream
agent12.sinks.sin_hdfs_201.hdfs.writeFormat=Text
agent12.sinks.sin_hdfs_201.hdfs.rollInterval=0
agent12.sinks.sin_hdfs_201.hdfs.rollSize=209715200
agent12.sinks.sin_hdfs_201.hdfs.rollCount=0
agent12.sinks.sin_hdfs_201.hdfs.idleTimeout=300
agent12.sinks.sin_hdfs_201.hdfs.batchSize=100
agent12.sinks.sin_hdfs_201.hdfs.minBlockReplicas=1
agent12.sinks.sin_hdfs_201.hdfs.callTimeout=300000
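
The decisive line is hdfs.callTimeout=300000 (5 minutes; the default of 10000 ms is exactly the timeout seen in the warning above). Raising the timeout does not repair files that are already stuck open, though. On Hadoop 2.7 and later (an assumption; check that your release ships the hdfs debug subcommand), the lease on such a file can be recovered manually, e.g.:

hdfs debug recoverLease -path /data/rc/click/mpp/15-08-05/FlumeData.1438758322864 -retries 3

Once the lease is recovered, the NameNode finalizes the last block and the file can be read normally.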

2. Modify the source to add a retry loop, as follows:

public synchronized void close(boolean callCloseCallback)
    throws IOException, InterruptedException {
  checkAndThrowInterruptedException();
  try {
    flush();
  } catch (IOException e) {
    LOG.warn("pre-close flush failed", e);
  }
  boolean failedToClose = false;
  LOG.info("Closing {}", bucketPath);
  CallRunner<Void> closeCallRunner = createCloseCallRunner();
  int tryTime = 1;
  while (isOpen && tryTime <= 5) {
    failedToClose = false; // reset per attempt so a successful retry exits the loop
    try {
      callWithTimeout(closeCallRunner);
      sinkCounter.incrementConnectionClosedCount();
    } catch (IOException e) {
      LOG.warn("failed to close() HDFSWriter for file (try times:" + tryTime +
          "): " + bucketPath + ". Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      failedToClose = true;
    }
    if (failedToClose) {
      isOpen = true;
      tryTime++;
      Thread.sleep(this.callTimeout); // back off before the next attempt
    } else {
      isOpen = false;
    }
  }
  // still open after all retries: give up with an error
  if (isOpen) {
    LOG.error("failed to close file: " + bucketPath + " after " +
        (tryTime - 1) + " tries.");
  } else {
    LOG.info("HDFSWriter is closed: {}", bucketPath);
  }

  // NOTE: timed rolls go through this codepath as well as other roll types
  if (timedRollFuture != null && !timedRollFuture.isDone()) {
    timedRollFuture.cancel(false); // do not cancel myself if running!
    timedRollFuture = null;
  }
  if (idleFuture != null && !idleFuture.isDone()) {
    idleFuture.cancel(false); // do not cancel myself if running!
    idleFuture = null;
  }

  if (bucketPath != null && fileSystem != null) {
    // could block or throw IOException
    try {
      renameBucket(bucketPath, targetPath, fileSystem);
    } catch (Exception e) {
      LOG.warn("failed to rename() file (" + bucketPath +
          "). Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      final Callable<Void> scheduledRename = createScheduledRenameCallable();
      timedRollerPool.schedule(scheduledRename, retryInterval, TimeUnit.SECONDS);
    }
  }

  if (callCloseCallback) {
    runCloseAction();
    closed = true;
  }
}
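
Two details of this patch are worth noting. First, failedToClose is reset at the top of each iteration (see the listing above); without the reset, one failed attempt would keep isOpen true even when a later retry succeeds, forcing all five rounds. Second, each failed round costs up to two callTimeout intervals (one waiting for the call, one sleeping before the retry), so in the worst case close() now blocks for roughly ten times callTimeout before giving up and falling through to the rename; the retry count and hdfs.callTimeout should therefore be tuned together.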
