Hadoop map-side timeout parameters

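A machine in our cluster recently hung, which caused a large number of map tasks to fail. When we tracked the failures down to that machine, we either could not ssh into it or, once logged in, the terminal stopped responding. The session exited with the following message: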

[hadoop@xxx ~]$ Received disconnect from xxx: Timeout, your session not responding.

The error log of the failed task:

AttemptID:attempt_1413206225298_24177_m_000001_0 Timed out after 1200 secsContainer killed by the ApplicationMaster. Container killed on request. Exit code is 143

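After some digging, the 1200 s turned out to be the configured value of the mapreduce.task.timeout parameter.

The documentation describes mapreduce.task.timeout as follows: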

The number of milliseconds before a task will be terminated if it neither reads an input, writes an output, nor updates its status string. A value of 0 disables the timeout.

Reading the Hadoop 2.2.0 source shows that the TaskHeartbeatHandler class runs as a separate thread. It periodically checks every running TaskAttempt, at an interval set by mapreduce.task.timeout.check-interval-ms (30 s by default):

          Map.Entry<TaskAttemptId, ReportTime> entry = iterator.next();
          boolean taskTimedOut = (taskTimeOut > 0) &&
              (currentTime > (entry.getValue().getLastProgress() + taskTimeOut));
          if(taskTimedOut) {
            // task is lost, remove from the list and raise lost event
            iterator.remove();
            eventHandler.handle(new TaskAttemptDiagnosticsUpdateEvent(entry
                .getKey(), "AttemptID:" + entry.getKey().toString()
                + " Timed out after " + taskTimeOut / 1000 + " secs"));
            eventHandler.handle(new TaskAttemptEvent(entry.getKey(),
                TaskAttemptEventType.TA_TIMED_OUT));
          }

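If a task attempt has not reported progress within the allowed interval (mapreduce.task.timeout), the attempt is considered failed and a TA_TIMED_OUT event is sent, telling the ApplicationMaster to kill that attempt.

Attempts report their progress to this handler periodically by calling the progressing method: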

  public void progressing(TaskAttemptId attemptID) {
    //only put for the registered attempts
    //TODO throw an exception if the task isn't registered.
    ReportTime time = runningAttempts.get(attemptID);
    if(time != null) {
      time.setLastProgress(clock.getTime());
    }
  }

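The TaskAttemptListenerImpl class calls this progress-reporting method; at each stage of its life a task reports progress information to the ApplicationMaster. I will not dig into those methods in more detail here.

In most cases our tasks hit this error while they are running, so we need to check which resource limit is preventing the task from making progress.

A Cloudera blog post explains how to avoid the problem: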
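The two excerpts above can be condensed into a small self-contained sketch. This is not the Hadoop class: HeartbeatSketch, its String attempt ids and the explicit nowMs arguments are simplifications I made up so the logic can be run and tested without a cluster.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, hypothetical stand-in for the timeout check; not the Hadoop class.
class HeartbeatSketch {
    // attempt id -> time (ms) of the last reported progress
    private final Map<String, Long> lastProgress = new ConcurrentHashMap<>();
    private final long taskTimeoutMs;

    HeartbeatSketch(long taskTimeoutMs) {
        this.taskTimeoutMs = taskTimeoutMs;
    }

    // Start tracking an attempt.
    void register(String attemptId, long nowMs) {
        lastProgress.put(attemptId, nowMs);
    }

    // Record progress, but only for attempts that are registered.
    void progressing(String attemptId, long nowMs) {
        lastProgress.computeIfPresent(attemptId, (id, old) -> nowMs);
    }

    // One pass of the periodic check: drop and return the attempts that timed out.
    List<String> checkOnce(long nowMs) {
        List<String> timedOut = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastProgress.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            // A timeout of 0 disables the check, as in the parameter description.
            boolean expired = taskTimeoutMs > 0 && nowMs > e.getValue() + taskTimeoutMs;
            if (expired) {
                it.remove();
                timedOut.add(e.getKey());
            }
        }
        return timedOut;
    }
}
```

With a timeout of 1200 s, an attempt that last reported at time 0 is dropped by the first check that runs after 1,200,000 ms, which matches the "Timed out after 1200 secs" message in the log.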

http://blog.cloudera.com/blog/2009/05/10-mapreduce-tips/

Report progress

If your task reports no progress for 10 minutes (see the mapred.task.timeout property) then it will be killed by Hadoop. Most tasks don’t encounter this situation since they report progress implicitly by reading input and writing output. However, some jobs which don’t process records in this way may fall foul of this behavior and have their tasks killed. Simulations are a good example, since they do a lot of CPU-intensive processing in each map and typically only write the result at the end of the computation. They should be written in such a way as to report progress on a regular basis (more frequently than every 10 minutes). This may be achieved in a number of ways:

Call setStatus() on Reporter to set a human-readable description of the task’s progress
Call incrCounter() on Reporter to increment a user counter
Call progress() on Reporter to tell Hadoop that your task is still there (and making progress)
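As a rough illustration of the third option, here is a minimal sketch of a CPU-bound loop that pings the framework at a fixed step interval. ProgressSink and LongComputation are hypothetical names of mine; in a real job the sink would be whatever progress() call the Reporter (or the task context) provides.

```java
// Hypothetical stand-in for the progress() call offered by Hadoop's Reporter.
interface ProgressSink {
    void progress();
}

class LongComputation {
    // Runs `steps` CPU-bound iterations and pings the sink every `reportEvery` steps.
    static long simulate(int steps, int reportEvery, ProgressSink sink) {
        long acc = 0;
        for (int i = 1; i <= steps; i++) {
            acc += (long) i * i; // placeholder for the real per-step work
            if (i % reportEvery == 0) {
                sink.progress(); // tells the framework the task is still alive
            }
        }
        return acc;
    }
}
```

The step interval should be chosen so that the time between two pings stays well below mapreduce.task.timeout. Reporting on every step is usually unnecessary.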

But that is not the end of the story. From time to time a task in the cluster hangs at a particular point and cannot proceed:

"main" prio=10 tid=0x000000000293f000 nid=0x1e06 runnable [0x0000000041b20000]

   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
    at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:228)
    at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:81)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
    - locked <0x00000006e243c3f0> (a sun.nio.ch.Util$2)
    - locked <0x00000006e243c3e0> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000006e243c1a0> (a sun.nio.ch.EPollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)

To analyze this, I read the source and looked at the doIO method of the SocketIOWithTimeout class, around line 157:

      //now wait for socket to be ready.
      int count = 0;
      try {
        count = selector.select(channel, ops, timeout);
      } catch (IOException e) { //unexpected IOException.
        closed = true;
        throw e;
      }
      if (count == 0) {
        throw new SocketTimeoutException(timeoutExceptionString(channel,
                                                                timeout, ops));
      }

When the timeout elapses without any data having been read, the following error is thrown:

Error: java.net.SocketTimeoutException: 70000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=xxx remote=/xxx]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:1490)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:962)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:930)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1031)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:823)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:475)

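The timeout is derived from dfs.client.socket-timeout. If no data is read within that period, this exception is thrown.

Stepping into the SelectorPool.select method of SocketIOWithTimeout, we find that it runs a polling loop: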
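One plausible reading of the 70000 ms in the message: if I recall the Hadoop 2.x client code correctly, the read timeout is the socket timeout plus a fixed extension per DataNode in the pipeline. The sketch below only reproduces that arithmetic. The 5 s extension, the 60 s default and the node count of 2 are assumptions from memory, not values shown in the excerpts above, so check them against your Hadoop version.

```java
// Arithmetic sketch only; the constants are assumed, not taken from the excerpts above.
class ReadTimeoutSketch {
    static final int READ_TIMEOUT_EXTENSION_MS = 5 * 1000; // assumed per-node extension

    // socketTimeoutMs stands for the value of dfs.client.socket-timeout.
    static int datanodeReadTimeout(int socketTimeoutMs, int numNodes) {
        // A non-positive socket timeout is treated as "no timeout".
        return socketTimeoutMs > 0
                ? READ_TIMEOUT_EXTENSION_MS * numNodes + socketTimeoutMs
                : 0;
    }
}
```

Under those assumptions, datanodeReadTimeout(60000, 2) gives 70000, which would match the value in the exception message.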

        while (true) {
          long start = (timeout == 0) ? 0 : Time.now();
          key = channel.register(info.selector, ops);
          ret = info.selector.select(timeout);
          if (ret != 0) {
            return ret;
          }
          /* Sometimes select() returns 0 much before timeout for
           * unknown reasons. So select again if required.
           */
          if (timeout > 0) {
            timeout -= Time.now() - start;
            if (timeout <= 0) {
              return 0;
            }
          }
          if (Thread.currentThread().isInterrupted()) {
            throw new InterruptedIOException("Interruped while waiting for " +
                                             "IO on channel " + channel +
                                             ". " + timeout +
                                             " millis timeout left.");
          }
        }

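Since we are reading data here, ops is normally SelectionKey.OP_READ, and the timeout we set is non-zero, so the loop keeps selecting until a total of timeout milliseconds has elapsed. The comment "Sometimes select() returns 0 much before timeout for unknown reasons. So select again if required." is rather vague; apparently some aspects of NIO's behavior were still not fully understood when it was written.

Here is the documentation for the method: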
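To see the "select again with the remaining budget" pattern in isolation, here is a minimal runnable sketch using plain java.nio. SelectLoopSketch and its method names are mine, and it uses a Pipe instead of a socket so that it runs without any network. Unlike the Hadoop code it requires a positive timeout, because select(0) would block indefinitely.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

class SelectLoopSketch {
    // Waits up to timeoutMs for a registered channel to become ready; returns 0 on timeout.
    static int selectWithBudget(Selector selector, long timeoutMs) throws IOException {
        if (timeoutMs <= 0) {
            throw new IllegalArgumentException("this sketch needs a positive timeout");
        }
        long remaining = timeoutMs;
        while (true) {
            long start = System.nanoTime();
            int ready = selector.select(remaining);
            if (ready != 0) {
                return ready;
            }
            // select() may return 0 early; subtract the time spent and try again.
            remaining -= (System.nanoTime() - start) / 1_000_000L;
            if (remaining <= 0) {
                return 0;
            }
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedIOException("interrupted with " + remaining + " ms left");
            }
        }
    }

    // Demo: the read end of a pipe is ready only if something was written to it.
    static int demo(long timeoutMs, boolean writeFirst) throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);
            if (writeFirst) {
                sink.write(ByteBuffer.wrap(new byte[] {1}));
            }
            return selectWithBudget(selector, timeoutMs);
        }
    }
}
```

When nothing is written to the pipe, demo returns 0 only after the whole budget has been used up, which is the situation in which doIO throws the SocketTimeoutException.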

java.nio.channels.Selector

public abstract int select(long timeout) throws java.io.IOException

Selects a set of keys whose corresponding channels are ready for I/O operations.

This method performs a blocking selection operation. It returns only after at least one channel is selected, this selector's wakeup method is invoked, the current thread is interrupted, or the given timeout period expires, whichever comes first.

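In other words, the Selector's select method returns only when one of the following happens:

At least one registered Channel is selected; the return value is then the number of selected keys
The Selector is woken up (its wakeup method is invoked) or the current thread is interrupted
The given timeout expires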

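Had the thread been interrupted, the polling loop would have thrown an InterruptedIOException, so the only remaining possibility is that the timeout expired and ret was 0, which leads to the exception shown earlier.

But that still leaves a question: is there no retry after a timeout? And how many times is the read retried?

Continuing the analysis, further down the stack DFSInputStream calls its readBuffer method. There, the IOException from the first failure is caught and, because retryCurrentNode is still true, the same node is retried once. If the read fails again, the DataNode is added to the dead-node list (perhaps so that it is not tried again next time?) and the read moves to another DataNode via seekToNewSource. Only after several such attempts, when no further source can be found, is the IOException actually thrown.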

    while (true) {
      try {
        return reader.doRead(blockReader, off, len, readStatistics);
      } catch ( ChecksumException ce ) {
        DFSClient.LOG.warn("Found Checksum error for "
            + getCurrentBlock() + " from " + currentNode
            + " at " + ce.getPos());
        ioe = ce;
        retryCurrentNode = false;
        // we want to remember which block replicas we have tried
        addIntoCorruptedBlockMap(getCurrentBlock(), currentNode,
            corruptedBlockMap);
      } catch ( IOException e ) {
        if (!retryCurrentNode) {
          DFSClient.LOG.warn("Exception while reading from "
              + getCurrentBlock() + " of " + src + " from "
              + currentNode, e);
        }
        ioe = e;
      }
      boolean sourceFound = false;
      if (retryCurrentNode) {
        /* possibly retry the same node so that transient errors don't
         * result in application level failures (e.g. Datanode could have
         * closed the connection because the client is idle for too long).
         */
        sourceFound = seekToBlockSource(pos);
      } else {
        addToDeadNodes(currentNode);
        sourceFound = seekToNewSource(pos);
      }
      if (!sourceFound) {
        throw ioe;
      }
      retryCurrentNode = false;
    }
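The control flow of that loop is easier to follow in a stripped-down form. The sketch below is my own simplification, not the DFSInputStream code: Replica, readWithFailover and nextLive are hypothetical names, the replica list is assumed to be non-empty, and checksum handling is left out.

```java
import java.io.IOException;
import java.util.List;
import java.util.Set;

class ReadRetrySketch {
    // Hypothetical replica reader; stands in for a block reader bound to one DataNode.
    interface Replica {
        String name();
        int read() throws IOException;
    }

    // Retries the first replica once, then marks failed replicas dead and moves on.
    static int readWithFailover(List<Replica> replicas, Set<String> deadNodes) throws IOException {
        int current = 0;
        boolean retryCurrentNode = true; // set to true only here, so only one same-node retry
        while (true) {
            IOException ioe;
            try {
                return replicas.get(current).read();
            } catch (IOException e) {
                ioe = e;
            }
            if (retryCurrentNode) {
                // Transient errors get one more chance on the same node.
                retryCurrentNode = false;
                continue;
            }
            deadNodes.add(replicas.get(current).name());
            current = nextLive(replicas, deadNodes);
            if (current < 0) {
                throw ioe; // no source left: surface the last error
            }
        }
    }

    // Index of the first replica that is not marked dead, or -1 if none is left.
    private static int nextLive(List<Replica> replicas, Set<String> deadNodes) {
        for (int i = 0; i < replicas.size(); i++) {
            if (!deadNodes.contains(replicas.get(i).name())) {
                return i;
            }
        }
        return -1;
    }
}
```

In this simplified model the number of read attempts before the exception escapes is at most the number of replicas plus one, since only the first node is retried in place.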

In short, there are still many open questions in this area; the investigation continues...
