A Hadoop job may consist of many map tasks and reduce tasks. Therefore, debugging a
Hadoop job is often a complicated process. It is a good practice to first test a Hadoop job
using unit tests by running it with a subset of the data.
However, sometimes it is necessary to debug a Hadoop job in a distributed mode. To support
such cases, Hadoop provides a mechanism called debug scripts. This recipe explains how to
use debug scripts.

A debug script is a shell script, and Hadoop executes the script whenever a task encounters
an error. The script will have access to the $script, $stdout, $stderr, $syslog, and
$jobconfproperties, as environment variables populated by Hadoop. You can find a
sample script from resources/chapter3/debugscript. We can use the debug scripts
to copy all the logfiles to a single location, e-mail them to a single e-mail account, or perform
some analysis.
LOG_FILE=HADOOP_HOME/error.log
echo "Run the script" >> $LOG_FILE
echo $script >> $LOG_FILE
echo $stdout>> $LOG_FILE
echo $stderr>> $LOG_FILE
echo $syslog >> $LOG_FILE
echo $jobconf>> $LOG_FILE

when you execute this, you should pay attention to the execute path, or else it will not found debug script.

package chapter3;

import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; public class WordcountWithDebugScript {
private static final String scriptFileLocation = "resources/chapter3/debugscript";
private static final String HDFS_ROOT = "/debug"; public static void setupFailedTaskScript(JobConf conf) throws Exception { // create a directory on HDFS where we'll upload the fail scripts
FileSystem fs = FileSystem.get(conf);
// Path debugDir = new Path("/debug");
Path debugDir = new Path(HDFS_ROOT); // who knows what's already in this directory; let's just clear it.
if (fs.exists(debugDir)) {
fs.delete(debugDir, true);
} // ...and then make sure it exists again
fs.mkdirs(debugDir); // upload the local scripts into HDFS
fs.copyFromLocalFile(new Path(scriptFileLocation), new Path(HDFS_ROOT
+ "/fail-script")); FileStatus[] list = fs.listStatus(new Path(HDFS_ROOT));
if (list == null || list.length == 0) {
System.out.println("No File found");
} else {
for (FileStatus f : list) {
System.out.println("File found " + f.getPath());
}
} conf.setMapDebugScript("./fail-script");
conf.setReduceDebugScript("./fail-script");
// this create a simlink from the job directory to cache directory of
// the mapper node
DistributedCache.createSymlink(conf); URI fsUri = fs.getUri(); String mapUriStr = fsUri.toString() + HDFS_ROOT
+ "/fail-script#fail-script";
System.out.println("added " + mapUriStr + "to distributed cache 1");
URI mapUri = new URI(mapUriStr);
// Following copy the map uri to the cache directory of the job node
DistributedCache.addCacheFile(mapUri, conf);
} public static void main(String[] args) throws Exception {
JobConf conf = new JobConf();
setupFailedTaskScript(conf);
Job job = new Job(conf, "word count"); job.setJarByClass(FaultyWordCount.class);
job.setMapperClass(FaultyWordCount.TokenizerMapper.class);
job.setReducerClass(FaultyWordCount.IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileSystem.get(conf).delete(new Path(args[1]), true);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.waitForCompletion(true);
} }

digest from mapreduce cookbook

hadoop debug script的更多相关文章

  1. Hadoop官方文档翻译——MapReduce Tutorial

    MapReduce Tutorial(个人指导) Purpose(目的) Prerequisites(必备条件) Overview(综述) Inputs and Outputs(输入输出) MapRe ...

  2. hadoop mapreduce核心功能描述

    核心功能描述 应用程序通常会通过提供map和reduce来实现 Mapper和Reducer接口,它们组成作业的核心. Mapper Mapper将输入键值对(key/value pair)映射到一组 ...

  3. hadoop之 hadoop 2.2.X 弃用的配置属性名称及其替换名称对照表

    Deprecated Properties  弃用属性 The following table lists the configuration property names that are depr ...

  4. Hadoop专业解决方案-第5章 开发可靠的MapReduce应用

    本章主要内容: 1.利用MRUnit创建MapReduce的单元测试. 2.MapReduce应用的本地实例. 3.理解MapReduce的调试. 4.利用MapReduce防御式程序设计. 在WOX ...

  5. Hadoop Map/Reduce教程

    原文地址:http://hadoop.apache.org/docs/r1.0.4/cn/mapred_tutorial.html 目的 先决条件 概述 输入与输出 例子:WordCount v1.0 ...

  6. 一步一步跟我学习hadoop(5)----hadoop Map/Reduce教程(2)

    Map/Reduce用户界面 本节为用户採用框架要面对的各个环节提供了具体的描写叙述,旨在与帮助用户对实现.配置和调优进行具体的设置.然而,开发时候还是要相应着API进行相关操作. 首先我们须要了解M ...

  7. VS2015/2013/2012 IIS Express Debug Classic ASP

    参考资料: https://msdn.microsoft.com/en-us/library/ms241740(v=vs.100).aspx When you attach to an ASP Web ...

  8. cdh版本的hue安装配置部署以及集成hadoop hbase hive mysql等权威指南

    hue下载地址:https://github.com/cloudera/hue hue学习文档地址:http://archive.cloudera.com/cdh5/cdh/5/hue-3.7.0-c ...

  9. Hadoop出现错误:WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable,解决方案

    安装Hadoop的时候直接用的bin版本,根据教程安装好之后运行的时候发现出现了:WARN util.NativeCodeLoader: Unable to load native-hadoop li ...

随机推荐

  1. 单例(C#版)

    单例: 一个类只有一个实例.巧妙利用了编程语言的一些语法规则:构造函数private, 然后提供一个public的方法返回类的一个实例:又方法和返回的类的实例都是static类型,所以只能被类所拥有, ...

  2. 你可以做一个更好的Coder为了自己的将来

    小小的星辰 工作已经一年多了,时间真的好快啊!发现自己还是终于走出了当初的阴影!我可以快乐的做我自己了.这两年发现自己改变了很多!很庆幸,我可以不想你了!伤感的心情总会过去的.还记得曾经说过一句:“离 ...

  3. Web基础开发最核心要解决的问题

    Web基础开发要解决的问题,往往也就是那些框架出现的目的 - 要解决问题. 1. 便捷的Db操作: 2. 高效的表单处理: 3. 灵活的Url路由: 4. 合理的代码组织结构: 5. 架构延伸 缓存. ...

  4. 【洛谷 p3382】模板-三分法(算法效率)

    题目:给出一个N次函数,保证在范围[l,r]内存在一点x,使得[l,x]上单调增,[x,r]上单调减.试求出x的值. 解法:与二分法枚举中点使区间分成2份不一样,三分法是枚举三分点,再根据题目的情况修 ...

  5. Web前端小白入门指迷

    前注:这篇文章首发于我自己创办的服务于校园的技术分享 [西邮 Upper -- 004]Web前端小白入门指迷,写得很用心也就发在这里. 大前端之旅 大前端有很多种,Shell 前端,客户端前端,Ap ...

  6. 个人开源作品,即时通讯App支持文本、语音、图片聊天

    开源一个即时通讯类App,支持纯文本.语音.地理位置.图片聊天,同时还加入了好友圈功能,支持分享动态和发送图片,支持搜索附近的人,使用的百度地图定位功能:由Bmob后端云提供服务器支持,欢迎喜欢的伙伴 ...

  7. swift基础二

    import Foundation // MARK: - ?和!的区别 // ?代表可选类型,实质上是枚举类型,里面有None和Some两种类型,其实nil相当于OPtional.None,如果非ni ...

  8. android 进程/线程管理(四)----消息机制的思考(自定义消息机制)

    关于android消息机制 已经写了3篇文章了,想要结束这个系列,总觉得少了点什么? 于是我就在想,android为什么要这个设计消息机制,使用消息机制是现在操作系统基本都会有的特点. 可是andro ...

  9. Mac terminal从bash切换到zsh

    0.预备知识 echo $SHELL命令可以查看当前正在使用什么shell 默认情况下(mbp 10.10.5)使用bash作为默认shell,然而也自带zsh,which zsh命令可以查看zsh的 ...

  10. 如何解决"应用程序无法启动,因为应用程序的并行配置不正确"问题

    应用程序事件日志中: "C:\windows\system32\test.exe"的激活上下文生成失败.找不到从属程序集 Microsoft.VC80.MFC,processorA ...