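In Hadoop, a single task may be executed on several nodes at the same time (for example, under speculative execution), so when each node finishes, the same task can end up with more than one result. Hadoop uses OutputCommitter to prevent this: its job is to make sure that output is committed correctly when a task or a job completes. Specifically, OutputCommitter is responsible for: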

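1. Setting up the job during initialization; for example, creating the job's temporary output directory when the job is initialized.

2. Cleaning up after the job completes; for example, deleting the temporary output directory once the job has finished.

3. Setting up the task's temporary output, by creating a side-effect file under the job's temporary directory.

4. Checking whether a task needs to be committed, which avoids committing a result that has already been committed.

5. Committing the task's output.

6. Aborting the task, that is, discarding its output.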
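
The six responsibilities above can be sketched with a toy model built on plain `java.nio.file`. This is not the Hadoop API: the class `CommitProtocolSketch`, the `_temporary/<attemptId>` layout, the attempt names, and the choice of `attempt_0` as the winner are all assumptions made for illustration. Two attempts of the same task write the same file name into separate temporary directories; only one is promoted to the final output, and the other is discarded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy model of the commit protocol. Hypothetical names; NOT the Hadoop classes. */
final class CommitProtocolSketch {
    private final Path outputDir; // final job output directory
    private final Path tempDir;   // job-level temporary directory (assumed layout)

    CommitProtocolSketch(Path outputDir) {
        this.outputDir = outputDir;
        this.tempDir = outputDir.resolve("_temporary");
    }

    // 1. Job setup: create the temporary output directory.
    void setupJob() throws IOException {
        Files.createDirectories(tempDir);
    }

    // 3. Task setup: give each attempt its own private working directory.
    Path setupTask(String attemptId) throws IOException {
        return Files.createDirectories(tempDir.resolve(attemptId));
    }

    // 4. A commit is only needed if the attempt still has uncommitted output.
    boolean needsTaskCommit(String attemptId) {
        return Files.isDirectory(tempDir.resolve(attemptId));
    }

    // 5. Promote the attempt's files into the final output directory.
    void commitTask(String attemptId) throws IOException {
        Path attemptDir = tempDir.resolve(attemptId);
        try (Stream<Path> files = Files.list(attemptDir)) {
            for (Path file : files.collect(Collectors.toList())) {
                Files.move(file, outputDir.resolve(file.getFileName()));
            }
        }
        Files.delete(attemptDir);
    }

    // 6. Discard everything a losing attempt wrote.
    void abortTask(String attemptId) throws IOException {
        deleteRecursively(tempDir.resolve(attemptId));
    }

    // 2. Job completion: remove the temporary directory.
    void commitJob() throws IOException {
        deleteRecursively(tempDir);
    }

    private static void deleteRecursively(Path root) throws IOException {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order deletes children before their parent directories.
            for (Path p : paths.sorted(Comparator.reverseOrder()).collect(Collectors.toList())) {
                Files.delete(p);
            }
        }
    }

    /** Runs two attempts of one task and returns the file names left in the output directory. */
    static List<String> runDemo() {
        try {
            Path out = Files.createTempDirectory("job-output");
            CommitProtocolSketch committer = new CommitProtocolSketch(out);
            committer.setupJob();
            for (String attempt : new String[] {"attempt_0", "attempt_1"}) {
                Path work = committer.setupTask(attempt);
                Files.write(work.resolve("part-00000"),
                        ("written by " + attempt).getBytes(StandardCharsets.UTF_8));
            }
            // Assumption: the framework has picked attempt_0 as the one allowed to commit.
            if (committer.needsTaskCommit("attempt_0")) {
                committer.commitTask("attempt_0");
            }
            committer.abortTask("attempt_1");
            committer.commitJob();
            try (Stream<Path> left = Files.list(out)) {
                return left.map(p -> p.getFileName().toString()).sorted().collect(Collectors.toList());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints [part-00000]
    }
}
```

Although both attempts produced a `part-00000`, only one copy reaches the output directory, and the temporary directory is gone after the job-level commit.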

Now let's look at the source code of OutputCommitter:

    package org.apache.hadoop.mapreduce;

    import java.io.IOException;

    public abstract class OutputCommitter {
      /**
       * For the framework to setup the job output during initialization. This is
       * called from the application master process for the entire job. This will be
       * called multiple times, once per job attempt.
       * Note: sets up the job's output at initialization time; invoked by the
       * job's application master, once per job attempt.
       * @param jobContext Context of the job whose output is being written.
       * @throws IOException if temporary output could not be created
       */
      public abstract void setupJob(JobContext jobContext) throws IOException;

      /**
       * For cleaning up the job's output after job completion. This is called
       * from the application master process for the entire job. This may be called
       * multiple times.
       * Note: cleans up the job's output after the job finishes; invoked by the
       * job's application master, possibly several times. This method is no longer
       * used; it has been replaced by commitJob and abortJob.
       * @param jobContext Context of the job whose output is being written.
       * @throws IOException
       * @deprecated Use {@link #commitJob(JobContext)} and
       * {@link #abortJob(JobContext, JobStatus.State)} instead.
       */
      @Deprecated
      public void cleanupJob(JobContext jobContext) throws IOException { }

      /**
       * For committing job's output after successful job completion. Note that this
       * is invoked for jobs with final runstate as SUCCESSFUL. This is called
       * from the application master process for the entire job. This is guaranteed
       * to only be called once. If it throws an exception the entire job will
       * fail.
       * Note: commits the job's output once the job has completed successfully,
       * i.e. when its final run state is SUCCESSFUL. Invoked only by the job's
       * application master, and only once.
       * @param jobContext Context of the job whose output is being written.
       * @throws IOException
       */
      public void commitJob(JobContext jobContext) throws IOException {
        cleanupJob(jobContext);
      }

      /**
       * For aborting an unsuccessful job's output. Note that this is invoked for
       * jobs with final runstate as {@link JobStatus.State#FAILED} or
       * {@link JobStatus.State#KILLED}. This is called from the application
       * master process for the entire job. This may be called multiple times.
       * Note: aborts the output of an unsuccessful job, i.e. one whose final run
       * state is FAILED or KILLED. May be invoked by the application master
       * several times.
       * @param jobContext Context of the job whose output is being written.
       * @param state final runstate of the job
       * @throws IOException
       */
      public void abortJob(JobContext jobContext, JobStatus.State state)
          throws IOException {
        cleanupJob(jobContext);
      }

      /**
       * Sets up output for the task. This is called from each individual task's
       * process that will output to HDFS, and it is called just for that task. This
       * may be called multiple times for the same task, but for different task
       * attempts.
       * Note: sets up the task's output. Each task that writes to HDFS calls this
       * from its own process; it may be called several times for the same task,
       * once per task attempt.
       * @param taskContext Context of the task whose output is being written.
       * @throws IOException
       */
      public abstract void setupTask(TaskAttemptContext taskContext)
          throws IOException;

      /**
       * Check whether task needs a commit. This is called from each individual
       * task's process that will output to HDFS, and it is called just for that
       * task.
       * Note: checks whether the task needs to be committed.
       * @param taskContext
       * @return true/false
       * @throws IOException
       */
      public abstract boolean needsTaskCommit(TaskAttemptContext taskContext)
          throws IOException;

      /**
       * To promote the task's temporary output to final output location.
       * If {@link #needsTaskCommit(TaskAttemptContext)} returns true and this
       * task is the task that the AM determines finished first, this method
       * is called to commit an individual task's output. This is to mark
       * that tasks output as complete, as {@link #commitJob(JobContext)} will
       * also be called later on if the entire job finished successfully. This
       * is called from a task's process. This may be called multiple times for the
       * same task, but different task attempts. It should be very rare for this to
       * be called multiple times and requires odd networking failures to make this
       * happen. In the future the Hadoop framework may eliminate this race.
       *
       * @param taskContext Context of the task whose output is being written.
       * @throws IOException if commit is not successful.
       */
      public abstract void commitTask(TaskAttemptContext taskContext)
          throws IOException;

      /**
       * Discard the task output. This is called from a task's process to clean
       * up a single task's output that can not yet been committed. This may be
       * called multiple times for the same task, but for different task attempts.
       * Note: discards the task's output.
       * @param taskContext
       * @throws IOException
       */
      public abstract void abortTask(TaskAttemptContext taskContext)
          throws IOException;

      /**
       * Is task output recovery supported for restarting jobs?
       *
       * If task output recovery is supported, job restart can be done more
       * efficiently.
       *
       * @return <code>true</code> if task output recovery is supported,
       * <code>false</code> otherwise
       * @see #recoverTask(TaskAttemptContext)
       */
      public boolean isRecoverySupported() {
        return false;
      }

      /**
       * Recover the task output.
       *
       * The retry-count for the job will be passed via the
       * {@link MRJobConfig#APPLICATION_ATTEMPT_ID} key in
       * {@link TaskAttemptContext#getConfiguration()} for the
       * <code>OutputCommitter</code>. This is called from the application master
       * process, but it is called individually for each task.
       *
       * If an exception is thrown the task will be attempted again.
       *
       * This may be called multiple times for the same task. But from different
       * application attempts.
       *
       * @param taskContext Context of the task whose output is being recovered
       * @throws IOException
       */
      public void recoverTask(TaskAttemptContext taskContext)
          throws IOException {
      }
    }
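
One detail of the listing is easy to miss: the default `commitJob` and `abortJob` both delegate to the deprecated `cleanupJob`. The sketch below illustrates why that matters, using a stripped-down stand-in instead of the real Hadoop class. `MiniCommitter`, `LegacyCommitter`, the `String` job ids, and the two-value `State` enum are assumptions made so the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in types only; these are NOT the Hadoop classes. */
final class DelegationSketch {
    enum State { FAILED, KILLED }

    /** Mirrors just the three job-level methods and their default delegation. */
    static class MiniCommitter {
        @Deprecated
        public void cleanupJob(String jobId) { }

        public void commitJob(String jobId) {
            cleanupJob(jobId); // success path falls back to the old hook
        }

        public void abortJob(String jobId, State state) {
            cleanupJob(jobId); // failure path falls back to the same hook
        }
    }

    /** A legacy-style committer that overrides only the deprecated hook. */
    static class LegacyCommitter extends MiniCommitter {
        final List<String> log = new ArrayList<>();

        @Override
        public void cleanupJob(String jobId) {
            log.add("cleanup " + jobId);
        }
    }

    /** Drives one successful and one killed job through the legacy committer. */
    static List<String> runDemo() {
        LegacyCommitter committer = new LegacyCommitter();
        committer.commitJob("job_1");
        committer.abortJob("job_2", State.KILLED);
        return committer.log;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // prints [cleanup job_1, cleanup job_2]
    }
}
```

Under this pattern, a committer written against the older `cleanupJob` hook still has its cleanup run on both the success and the failure path, while newer committers can override `commitJob` and `abortJob` separately.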
