来自:http://blog.csdn.net/macyang/article/details/7880671

所谓的推测执行,就是当所有task都开始运行之后,Job Tracker会统计所有任务的平均进度,如果某个task所在的task node机器配置比较低或者CPU load很高(原因很多),导致任务执行比总体任务的平均执行要慢,此时Job Tracker会启动一个新的任务(duplicate task),原有任务和新任务(一个task会有多个attempt同时执行)哪个先执行完就把另外一个kill掉,这也是我们经常在Job Tracker页面看到任务执行成功,但是总有些任务被kill,就是这个原因。另外,根据mapreduce job的特点,同一个task执行多次的结果是一样的,所以task只要有一次执行成功,job就是成功的,被kill的task对job的结果没有影响。


配置参数:

mapred.map.tasks.speculative.execution=true

mapred.reduce.tasks.speculative.execution=true

这两个是推测执行的配置项,当然如果你从来不关心这两个选项也没关系,它们默认值是true

而Hadoop 会根据task progress score决定是否killed一个task:

For a map, the progress score is the fraction of input data read.

For a reduce task, the execution is divided into three phases, each of which accounts for 1/3 of the score:
• The copy phase, when the task fetches map outputs.
• The sort phase, when map outputs are sorted by key.
• The reduce phase, when a user-defined function is applied to the list of map outputs with each key.
In each phase, the score is the fraction of data processed.
For example,
• a task halfway through the copy phase has a progress score of 1 / 2 * 1 / 3 = 1 / 6
• a task halfway through the reduce phase has a progress score of 1 / 3 + 1 / 3 + 1 / 2 * 1 / 3 = 5 / 6

Hadoop looks at the
average progress score of each category of tasks (maps and reduces) to
define a threshold for speculative execution. When a task’s progress
score is less than the average for its category by
a threshold, and the task has run for a certain amount of time, it is
considered slow. The scheduler also ensures that at most one speculative
copy of each task is running at a time. When running multiple jobs,
Hadoop uses a FIFO discipline where the earliest
submitted job is asked for a task to run, then the second, etc. There
is also a priority system for putting jobs into higher-priority queues.

(来源:http://adhoop.wordpress.com/2012/02/24/speculative-execution-in-hadoop/

Speculative Execution in Hadoop的更多相关文章

  1. Method and apparatus for speculative execution of uncontended lock instructions

    A method and apparatus for executing lock instructions speculatively in an out-of-order processor ar ...

  2. Hadoop就业面试题

    ----------------------------------------------------------------------------- [申明:资料来源于互联网] 本文链接:htt ...

  3. Hadoop(4)MapReduce 任务的推测(speculative)执行

    Straggle(掉队者)是指那些跑的很慢但最终会成功完成的任务.一个掉队的Map任务会阻止Reduce任务开始执行. Hadoop不能自动纠正掉队任务,但是可以识别那些跑的比较慢的任务,然后它会产生 ...

  4. [Hadoop in Action] 第6章 编程实践

    Hadoop程序开发的独门绝技 在本地,伪分布和全分布模式下调试程序 程序输出的完整性检查和回归测试 日志和监控 性能调优   1.开发MapReduce程序   [本地模式]        本地模式 ...

  5. Hadoop笔记HDFS(2)

    高级Hadoop MapReduce管理 1 调试部署好的Hadoop的配置 2 运行基准测试检验Hadoop的安装 3 重新利用JVM提升性能 4 容错性 5 调试脚本-分析失败任务原因 6 设置失 ...

  6. Hadoop随笔(一):工作流程的源码

    一.几个可能会用到的属性值 1.mapred.map.tasks.speculative.execution和mapred.reduce.tasks.speculative.execution 这两个 ...

  7. hadoop MapReduce 笔记

    1.        MapReduce程序开发步骤 编写map 和 reduce 程序–> 单元测试 -> 编写驱动程序进行验证-> 本地数据集调试 ->  部署到集群运行 用 ...

  8. 从wordcount 开始 mapreduce (C++\hadoop streaming模式)

    序:终于开始接触hadoop了,从wordcount开始 1. 采用hadoop streamming模式 优点:支持C++ pathon shell 等多种语言,学习成本较低,不需要了解hadoop ...

  9. hadoop可能遇到的问题

    1.hadoop运行的原理? 2.mapreduce的原理? 3.HDFS存储的机制? 4.举一个简单的例子说明mapreduce是怎么来运行的 ? 5.面试的人给你出一些问题,让你用mapreduc ...

随机推荐

  1. Any way to start Google Chrome in headless mode?

    Any way to start Google Chrome in headless mode? - Stack Overflow Any way to start Google Chrome in ...

  2. How to exit the entire application from a Python thread?

    If all your threads except the main ones are daemons, the best approach is generally thread.interrup ...

  3. Latest SQLite binary for January 2015

    Latest SQLite binary for January 2015 Well I went through quite a few threads to find an updated, de ...

  4. ORA-00918:未明确定义列

    <script type="text/javascript"><!-- google_ad_client = "pub-9528830580198364 ...

  5. 33.NET对加密和解密的支持

      散列运算 mscorlib.dll下的System.Security.Cryptography下: 抽象类HashAlgorithm     抽象类MD5         MD5CryptoSer ...

  6. [翻译] AsyncImageView 异步下载图片

    AsyncImageView  https://github.com/nicklockwood/AsyncImageView AsyncImageView is a simple extension ...

  7. 程序员眼中的RSA算法

    RSA算法是数学应用于实际的一项伟大发明,起数学过程相对而言还是比较专业的,有兴趣可以看看. RSA算法的证明过程,详见:http://www.ruanyifeng.com/blog/2013/06/ ...

  8. django的单元测试框架unittest、覆盖率

    django的单元测试 指定测试范围: 指定运行某些测试文件./manage.py test --pattern="tests_*.py" -v 2 运行所有测试文件./manag ...

  9. poj 2284 That Nice Euler Circuit 解题报告

    That Nice Euler Circuit Time Limit: 3000MS   Memory Limit: 65536K Total Submissions: 1975   Accepted ...

  10. 混沌数学之Standard模型

    相关软件混沌数学之离散点集图形DEMO 相关代码: class StandardEquation : public DiscreteEquation { public: StandardEquatio ...