Symptom: a MapReduce job fails when it runs.

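Running hadoop fsck -openforwrite shows that a file was never closed: it is flagged OPENFORWRITE and its last block is still UNDER_CONSTRUCTION.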

[root@com ~]# hadoop fsck -openforwrite /data/rc/click/mpp/15-08-05/
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Connecting to namenode via http://com.hunantv.hadoopnamenode:50070
FSCK started by root (auth:SIMPLE) from /10.100.1.46 for path /data/rc/click/mpp/15-08-05/ at Thu Aug 06 14:05:03 CST 2015
....................................................................................................
....................................................................................................
........./data/rc/click/mpp/15-08-05/FlumeData.1438758322864 42888 bytes, 1 block(s), OPENFORWRITE:
/data/rc/click/mpp/15-08-05/FlumeData.1438758322864: Under replicated BP-1672356070-10.100.1.36-1412072991411:blk_1120646538_47162789{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[[DISK]DS-f4fff5f3-f3fd-4054-a75c-1d7da53a73af:NORMAL|FINALIZED], ReplicaUnderConstruction[[DISK]DS-26f54bc5-5026-4e6a-94ec-8435224e4aa9:NORMAL|RWR], ReplicaUnderConstruction[[DISK]DS-4ab3fffc-6468-47df-8023-79f23a330371:NORMAL|FINALIZED]]}. Target Replicas is 3 but found 2 replica(s).
..........................................................................................
............................Status: HEALTHY
Total size: 99186583 B
Total dirs: 1
Total files: 328
Total symlinks: 0
Total blocks (validated): 328 (avg. block size 302398 B)
Minimally replicated blocks: 328 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 1 (0.30487806 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 2.996951
Corrupt blocks: 0
Missing replicas: 1 (0.101626016 %)
Number of data-nodes: 59
Number of racks: 6
FSCK ended at Thu Aug 06 14:05:03 CST 2015 in 36 milliseconds

The filesystem under path '/data/rc/click/mpp/15-08-05/' is HEALTHY
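When many directories have to be checked, the OPENFORWRITE entries can be pulled out of the fsck report programmatically. The following is a minimal sketch that assumes the line layout shown above (progress dots, then the path, size, block count and OPENFORWRITE). FsckOpenFiles is a hypothetical helper, not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the paths flagged OPENFORWRITE from the text output of
// "hadoop fsck <path> -openforwrite". Assumes the line layout seen in the report above.
class FsckOpenFiles {
    // fsck prints progress dots directly before the path; they are skipped by "\.*".
    private static final Pattern OPEN_FILE = Pattern.compile(
            "\\.*(/\\S+) \\d+ bytes, \\d+ block\\(s\\), OPENFORWRITE");

    static List<String> parse(String fsckOutput) {
        List<String> paths = new ArrayList<>();
        for (String line : fsckOutput.split("\\R")) {
            Matcher m = OPEN_FILE.matcher(line);
            if (m.find()) {
                paths.add(m.group(1));
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "........./data/rc/click/mpp/15-08-05/FlumeData.1438758322864 42888 bytes, 1 block(s), OPENFORWRITE:",
                "Total files: 328");
        System.out.println(parse(sample)); // [/data/rc/click/mpp/15-08-05/FlumeData.1438758322864]
    }
}
```

The "Under replicated" detail line for the same file does not match, so each open file is reported once.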

Checking the Flume log:

[root@10.100.1.117] out: 05 Aug 2015 11:15:19,322 INFO [SinkRunner-PollingRunner-DefaultSinkProcessor] (org.apache.flume.sink.hdfs.BucketWriter.open:234) - Creating hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp
[root@10.100.1.117] out: 05 Aug 2015 11:16:20,493 INFO [hdfs-sin_hdfs_201-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter$5.call:429) - Closing idle bucketWriter hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp at 1438744580493
[root@10.100.1.117] out: 05 Aug 2015 11:16:20,497 INFO [hdfs-sin_hdfs_201-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:363) - Closing hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp
[root@10.100.1.117] out: 05 Aug 2015 11:16:30,501 WARN [hdfs-sin_hdfs_201-roll-timer-0] (org.apache.flume.sink.hdfs.BucketWriter.close:370) - failed to close() HDFSWriter for file (hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp). Exception follows.
[root@10.100.1.117] out: java.io.IOException: Callable timed out after 10000 ms on file: hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp
[root@10.100.1.117] out: 05 Aug 2015 11:16:30,503 INFO [hdfs-sin_hdfs_201-call-runner-7] (org.apache.flume.sink.hdfs.BucketWriter$8.call:629) - Renaming hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293.tmp to hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-08-05/FlumeData.1438744519293

The HDFS file close failed because the call timed out.
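To list every file hit by this problem, the WARN lines can be scanned for the path inside the parentheses. This is a sketch that assumes the exact message format in the log above; FailedCloseScanner is a hypothetical helper. It returns the path as logged (with .tmp); going by the Renaming line, the final file has the same path without that suffix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the files whose close() failed, based on the
// "failed to close() HDFSWriter for file (...)" WARN line format shown in the log.
class FailedCloseScanner {
    // Capture everything inside the parentheses up to the closing ')'.
    private static final Pattern FAILED_CLOSE = Pattern.compile(
            "failed to close\\(\\) HDFSWriter for file \\(([^)\\s]+)\\)");

    static List<String> scan(List<String> logLines) {
        List<String> files = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = FAILED_CLOSE.matcher(line);
            if (m.find()) {
                files.add(m.group(1)); // path as logged, still ending in .tmp
            }
        }
        return files;
    }
}
```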

Looking at the source of BucketWriter.close():

public synchronized void close(boolean callCloseCallback)
    throws IOException, InterruptedException {
  checkAndThrowInterruptedException();
  try {
    flush();
  } catch (IOException e) {
    LOG.warn("pre-close flush failed", e);
  }
  boolean failedToClose = false; // set below, but never read afterwards
  LOG.info("Closing {}", bucketPath);
  CallRunner<Void> closeCallRunner = createCloseCallRunner();
  if (isOpen) {
    try {
      callWithTimeout(closeCallRunner);
      sinkCounter.incrementConnectionClosedCount();
    } catch (IOException e) {
      LOG.warn(
          "failed to close() HDFSWriter for file (" + bucketPath +
          "). Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      failedToClose = true;
    }
    isOpen = false; // marked closed even when close() failed; no retry
  } else {
    LOG.info("HDFSWriter is already closed: {}", bucketPath);
  }
  // NOTE: timed rolls go through this codepath as well as other roll types
  if (timedRollFuture != null && !timedRollFuture.isDone()) {
    timedRollFuture.cancel(false); // do not cancel myself if running!
    timedRollFuture = null;
  }
  if (idleFuture != null && !idleFuture.isDone()) {
    idleFuture.cancel(false); // do not cancel myself if running!
    idleFuture = null;
  }
  if (bucketPath != null && fileSystem != null) {
    // could block or throw IOException
    try {
      renameBucket(bucketPath, targetPath, fileSystem);
    } catch (Exception e) {
      LOG.warn(
          "failed to rename() file (" + bucketPath +
          "). Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      final Callable<Void> scheduledRename =
          createScheduledRenameCallable();
      timedRollerPool.schedule(scheduledRename, retryInterval,
          TimeUnit.SECONDS);
    }
  }
  if (callCloseCallback) {
    runCloseAction();
    closed = true;
  }
}

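The default call timeout is 10000 ms, and a failed close is never retried. The code does set a failedToClose variable but never reads it; the developers probably forgot to handle this case. Because isOpen is set to false anyway, the sink goes on to rename the .tmp file (see the Renaming line in the log), so the renamed file stays open for write on HDFS, which is consistent with the OPENFORWRITE file reported by fsck.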
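The "Callable timed out after 10000 ms" message comes from running the close on a separate thread and waiting on it with a deadline. The simplified sketch below shows that pattern with the standard ExecutorService and Future.get(timeout, unit); TimedCall is a hypothetical name modeled on the log message, not Flume's actual code.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Simplified "call with timeout" wrapper (an illustration, not Flume's implementation):
// a call that does not finish in time becomes an IOException and is not retried.
class TimedCall {
    static <T> T callWithTimeout(ExecutorService pool, Callable<T> call, long timeoutMs)
            throws IOException, InterruptedException {
        Future<T> future = pool.submit(call);
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck call; the caller only sees the IOException
            throw new IOException("Callable timed out after " + timeoutMs + " ms", e);
        } catch (ExecutionException e) {
            Throwable cause = e.getCause();
            if (cause instanceof IOException) {
                throw (IOException) cause;
            }
            throw new IOException(cause);
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // A "close" that takes longer than the timeout, like a slow HDFS close.
            callWithTimeout(pool, () -> { Thread.sleep(500); return null; }, 100);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // Callable timed out after 100 ms
        } finally {
            pool.shutdownNow();
        }
    }
}
```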

Solutions:

1. Raise the call timeout (hdfs.callTimeout), for example to 5 minutes (300000 ms). The Flume HDFS sink configuration is as follows:

agent12.sinks.sin_hdfs_201.type=hdfs
agent12.sinks.sin_hdfs_201.channel=ch_hdfs_201
agent12.sinks.sin_hdfs_201.hdfs.path=hdfs://com.hunantv.hadoopnamenode:8020/data/logs/amobile/vod/15-%{month}-%{day}
agent12.sinks.sin_hdfs_201.hdfs.round=true
agent12.sinks.sin_hdfs_201.hdfs.roundValue=10
agent12.sinks.sin_hdfs_201.hdfs.roundUnit=minute
agent12.sinks.sin_hdfs_201.hdfs.fileType=DataStream
agent12.sinks.sin_hdfs_201.hdfs.writeFormat=Text
agent12.sinks.sin_hdfs_201.hdfs.rollInterval=0
agent12.sinks.sin_hdfs_201.hdfs.rollSize=209715200
agent12.sinks.sin_hdfs_201.hdfs.rollCount=0
agent12.sinks.sin_hdfs_201.hdfs.idleTimeout=300
agent12.sinks.sin_hdfs_201.hdfs.batchSize=100
agent12.sinks.sin_hdfs_201.hdfs.minBlockReplicas=1
agent12.sinks.sin_hdfs_201.hdfs.callTimeout=300000
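Since the agent configuration is a Java properties file, a quick sanity check can confirm which timeout a sink will actually use. A sketch, assuming the agent12/sin_hdfs_201 naming above and the 10000 ms default; CallTimeoutCheck is a hypothetical helper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical check: reports the effective hdfs.callTimeout of one HDFS sink,
// falling back to the 10000 ms default when the key is not set.
class CallTimeoutCheck {
    static final long DEFAULT_CALL_TIMEOUT_MS = 10000L;

    static long effectiveCallTimeout(Properties conf, String agent, String sink) {
        String key = agent + ".sinks." + sink + ".hdfs.callTimeout";
        String value = conf.getProperty(key);
        return value == null ? DEFAULT_CALL_TIMEOUT_MS : Long.parseLong(value.trim());
    }

    public static void main(String[] args) throws IOException {
        Properties conf = new Properties();
        conf.load(new StringReader(
                "agent12.sinks.sin_hdfs_201.type=hdfs\n"
                + "agent12.sinks.sin_hdfs_201.hdfs.callTimeout=300000\n"));
        System.out.println(effectiveCallTimeout(conf, "agent12", "sin_hdfs_201")); // 300000
    }
}
```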

2. Modify the source code to retry the close, as follows:

public synchronized void close(boolean callCloseCallback)
    throws IOException, InterruptedException {
  checkAndThrowInterruptedException();
  try {
    flush();
  } catch (IOException e) {
    LOG.warn("pre-close flush failed", e);
  }
  LOG.info("Closing {}", bucketPath);
  CallRunner<Void> closeCallRunner = createCloseCallRunner();
  final int maxTries = 5;
  int tryTime = 0;
  while (isOpen && tryTime < maxTries) {
    tryTime++;
    boolean failedToClose = false; // reset on every attempt
    try {
      callWithTimeout(closeCallRunner);
      sinkCounter.incrementConnectionClosedCount();
    } catch (IOException e) {
      LOG.warn(
          "failed to close() HDFSWriter for file (try " + tryTime + "): " + bucketPath +
          ". Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      failedToClose = true;
    }
    if (!failedToClose) {
      isOpen = false; // closed successfully, stop retrying
    } else if (tryTime < maxTries) {
      Thread.sleep(this.callTimeout); // wait before the next attempt
    }
  }
  // If isOpen is still true, every attempt failed
  if (isOpen) {
    LOG.error("failed to close file: " + bucketPath + " after " + tryTime + " tries.");
  } else {
    LOG.info("HDFSWriter is closed: {}", bucketPath);
  }
  // NOTE: timed rolls go through this codepath as well as other roll types
  if (timedRollFuture != null && !timedRollFuture.isDone()) {
    timedRollFuture.cancel(false); // do not cancel myself if running!
    timedRollFuture = null;
  }
  if (idleFuture != null && !idleFuture.isDone()) {
    idleFuture.cancel(false); // do not cancel myself if running!
    idleFuture = null;
  }
  if (bucketPath != null && fileSystem != null) {
    // could block or throw IOException
    try {
      renameBucket(bucketPath, targetPath, fileSystem);
    } catch (Exception e) {
      LOG.warn(
          "failed to rename() file (" + bucketPath +
          "). Exception follows.", e);
      sinkCounter.incrementConnectionFailedCount();
      final Callable<Void> scheduledRename =
          createScheduledRenameCallable();
      timedRollerPool.schedule(scheduledRename, retryInterval,
          TimeUnit.SECONDS);
    }
  }
  if (callCloseCallback) {
    runCloseAction();
    closed = true;
  }
}
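The retry logic is easier to check outside Flume. Below is a standalone sketch of the same bounded-retry idea with a fake close call; Closer and closeWithRetry are hypothetical names, and unlike BucketWriter this version returns -1 instead of logging an error when every attempt fails.

```java
import java.io.IOException;

// Standalone sketch of a bounded close-retry loop. Closer is a hypothetical
// stand-in for the timed HDFS close call.
class CloseRetry {
    interface Closer {
        void close() throws IOException;
    }

    /** Returns the number of attempts used, or -1 if every attempt failed. */
    static int closeWithRetry(Closer closer, int maxTries, long backoffMs)
            throws InterruptedException {
        for (int attempt = 1; attempt <= maxTries; attempt++) {
            try {
                closer.close();
                return attempt; // closed successfully, stop retrying
            } catch (IOException e) {
                System.err.println("close failed (try " + attempt + "): " + e.getMessage());
                if (attempt < maxTries) {
                    Thread.sleep(backoffMs); // wait only if another attempt follows
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] calls = {0};
        // Fails twice, then succeeds on the third attempt.
        Closer flaky = () -> {
            if (++calls[0] < 3) {
                throw new IOException("Callable timed out");
            }
        };
        System.out.println(closeWithRetry(flaky, 5, 10)); // 3
    }
}
```

Note that the patched close() sleeps a full callTimeout between attempts. With callTimeout=300000, five attempts of up to 300 s each plus four 300 s waits can hold the sink for up to 2700 s (45 minutes), so a shorter backoff may be worth considering.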