Using ToolRunner to Run Hadoop Programs: How It Works
To simplify running jobs from the command line, Hadoop ships with a few helper classes. GenericOptionsParser is one of them: it interprets the common Hadoop command-line options and, where needed, sets the corresponding values on a Configuration object.
GenericOptionsParser is rarely used directly. The more convenient route is to implement the Tool interface and launch the application through ToolRunner, which invokes GenericOptionsParser internally.
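Before walking through the details, here is a minimal sketch of the pattern (the class name MyTool is illustrative, not part of Hadoop): implement Tool, extend Configured, and let ToolRunner.run parse the generic options (-D, -conf, -fs, -jt, ...) before your own run method receives the remaining arguments.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Minimal Tool skeleton: ToolRunner parses the generic options and hands
// the resulting Configuration to this object via setConf() (inherited
// from Configured) before calling run().
public class MyTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf(); // already reflects -D/-conf overrides
        // ... configure and submit a job here ...
        return 0;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyTool(), args));
    }
}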
The ToolRunner Javadoc describes the class as follows:
A utility to help run Tools.
ToolRunner can be used to run classes implementing Tool interface. It works in conjunction with GenericOptionsParser to parse the generic hadoop command line arguments and modifies the Configuration of the Tool.
The application-specific options are passed along without being modified.
run

public static int run(Configuration conf,
                      Tool tool,
                      String[] args)
               throws Exception

Runs the given Tool by Tool.run(String[]), after parsing with the given generic arguments. Uses the given Configuration, or builds one if null. Sets the Tool's configuration with the possibly modified version of the conf.

- Parameters:
  - conf - Configuration for the Tool.
  - tool - Tool to run.
  - args - command-line arguments to the tool.
- Returns:
  - exit code of the Tool.run(String[]) method.
- Throws:
  - Exception
run

public static int run(Tool tool,
                      String[] args)
               throws Exception

Runs the Tool with its Configuration. Equivalent to run(tool.getConf(), tool, args).

- Parameters:
  - tool - Tool to run.
  - args - command-line arguments to the tool.
- Returns:
  - exit code of the Tool.run(String[]) method.
- Throws:
  - Exception
Both methods are static, so they can be called through the class name. Beyond these two, there is one more method:
static void printGenericCommandUsage(PrintStream out)
Prints generic command-line arguments and usage information.
4. ToolRunner accomplishes the following two things:
(1) It creates a Configuration object for the Tool.
(2) It makes it easy for the program to read parameter settings.
The complete source of ToolRunner is as follows:
package org.apache.hadoop.util;

import java.io.PrintStream;

import org.apache.hadoop.conf.Configuration;

/**
 * A utility to help run {@link Tool}s.
 *
 * <p><code>ToolRunner</code> can be used to run classes implementing
 * <code>Tool</code> interface. It works in conjunction with
 * {@link GenericOptionsParser} to parse the
 * <a href="{@docRoot}/org/apache/hadoop/util/GenericOptionsParser.html#GenericOptions">
 * generic hadoop command line arguments</a> and modifies the
 * <code>Configuration</code> of the <code>Tool</code>. The
 * application-specific options are passed along without being modified.
 * </p>
 *
 * @see Tool
 * @see GenericOptionsParser
 */
public class ToolRunner {

  /**
   * Runs the given <code>Tool</code> by {@link Tool#run(String[])}, after
   * parsing with the given generic arguments. Uses the given
   * <code>Configuration</code>, or builds one if null.
   *
   * Sets the <code>Tool</code>'s configuration with the possibly modified
   * version of the <code>conf</code>.
   *
   * @param conf <code>Configuration</code> for the <code>Tool</code>.
   * @param tool <code>Tool</code> to run.
   * @param args command-line arguments to the tool.
   * @return exit code of the {@link Tool#run(String[])} method.
   */
  public static int run(Configuration conf, Tool tool, String[] args)
      throws Exception {
    if (conf == null) {
      conf = new Configuration();
    }
    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // set the configuration back, so that Tool can configure itself
    tool.setConf(conf);

    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
  }

  /**
   * Runs the <code>Tool</code> with its <code>Configuration</code>.
   *
   * Equivalent to <code>run(tool.getConf(), tool, args)</code>.
   *
   * @param tool <code>Tool</code> to run.
   * @param args command-line arguments to the tool.
   * @return exit code of the {@link Tool#run(String[])} method.
   */
  public static int run(Tool tool, String[] args) throws Exception {
    return run(tool.getConf(), tool, args);
  }

  /**
   * Prints generic command-line arguments and usage information.
   *
   * @param out stream to write usage information to.
   */
  public static void printGenericCommandUsage(PrintStream out) {
    GenericOptionsParser.printGenericCommandUsage(out);
  }
}
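Note the null handling in the three-argument overload: a caller can seed its own Configuration before the generic options are parsed, and command-line -D values then override the seeded defaults, since GenericOptionsParser sets them on the same object afterwards. A small sketch, reusing the hypothetical MyTool from the skeleton above (demo.color is a made-up property):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;

public class Launcher {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("demo.color", "blue"); // seed a default (hypothetical key)
        // A command line containing -D demo.color=red would overwrite the
        // seeded value, because the parser runs after this point.
        System.exit(ToolRunner.run(conf, new MyTool(), args));
    }
}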
As for where a Configuration's values come from: unless explicitly turned off, Hadoop by default specifies two resources, loaded in order from the classpath:
- core-default.xml : Read-only defaults for hadoop.
- core-site.xml: Site-specific configuration for a given hadoop installation.
static{
//print deprecation warning if hadoop-site.xml is found in classpath
ClassLoader cL = Thread.currentThread().getContextClassLoader();
if (cL == null) {
cL = Configuration.class.getClassLoader();
}
if(cL.getResource("hadoop-site.xml")!=null) {
LOG.warn("DEPRECATED: hadoop-site.xml found in the classpath. " +
"Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, "
+ "mapred-site.xml and hdfs-site.xml to override properties of " +
"core-default.xml, mapred-default.xml and hdfs-default.xml " +
"respectively");
}
addDefaultResource("core-default.xml");
addDefaultResource("core-site.xml");
}
The code above appears in the source of Configuration.java: a static initializer loads the parameters in core-default.xml and core-site.xml for the program.
At the same time, it checks whether a hadoop-site.xml is still present on the classpath and, if so, logs a warning that this configuration file is deprecated.
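To see the loading rules in isolation, here is a small sketch (extra.xml and demo.color are hypothetical; a classpath resource that does not exist is silently skipped):

import org.apache.hadoop.conf.Configuration;

public class ConfDemo {
    public static void main(String[] args) {
        // core-default.xml and core-site.xml are loaded by the static
        // block shown above as soon as the class is touched.
        Configuration conf = new Configuration();

        // Resources added later override values from earlier ones.
        conf.addResource("extra.xml");

        // Programmatic settings override all file-based values.
        conf.set("demo.color", "yello");

        System.out.println(conf.get("demo.color"));         // yello
        System.out.println(conf.get("missing.key", "n/a")); // default: n/a
    }
}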
is-external=true" title="class or interface in java.lang" style="font-family:Simsun; font-size:14px">Iterable
<Map.Entry<is-external=true" title="class or interface in java.lang" style="font-family:Simsun; font-size:14px">String
,String>>,因此能够通过下面方式对其内容进行遍历:for (Entry<String, String> entry : conf){
.....
}
(4) About Tool
1. The source of Tool is as follows:

package org.apache.hadoop.util;

import org.apache.hadoop.conf.Configurable;

public interface Tool extends Configurable {
    int run(String[] args) throws Exception;
}
As you can see, Tool itself declares only one method, run(String[]); at the same time it inherits the two methods of Configurable, whose source is:
package org.apache.hadoop.conf;

public interface Configurable {
    void setConf(Configuration conf);
    Configuration getConf();
}
2. The source of Configured is as follows:
package org.apache.hadoop.conf;

public class Configured implements Configurable {

    private Configuration conf;

    public Configured() {
        this(null);
    }

    public Configured(Configuration conf) {
        setConf(conf);
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    public Configuration getConf() {
        return conf;
    }
}
It has two constructors: one that takes a Configuration argument and one that takes none.
The following demo ties these pieces together: it extends Configured, implements Tool, and prints every property in its Configuration.

package org.jediael.hadoopdemo.toolrunnerdemo;

import java.util.Map.Entry;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ToolRunnerDemo extends Configured implements Tool {

    static {
        //Configuration.addDefaultResource("hdfs-default.xml");
        //Configuration.addDefaultResource("hdfs-site.xml");
        //Configuration.addDefaultResource("mapred-default.xml");
        //Configuration.addDefaultResource("mapred-site.xml");
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();
        for (Entry<String, String> entry : conf) {
            System.out.printf("%s=%s\n", entry.getKey(), entry.getValue());
        }
        return 0;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new ToolRunnerDemo(), args);
        System.exit(exitCode);
    }
}

Running the jar prints every property that has been loaded; an excerpt of the output:
io.seqfile.compress.blocksize=1000000
keep.failed.task.files=false
mapred.disk.healthChecker.interval=60000
dfs.df.interval=60000
dfs.datanode.failed.volumes.tolerated=0
mapreduce.reduce.input.limit=-1
mapred.task.tracker.http.address=0.0.0.0:50060
mapred.used.genericoptionsparser=true
mapred.userlog.retain.hours=24
dfs.max.objects=0
mapred.jobtracker.jobSchedulable=org.apache.hadoop.mapred.JobSchedulable
mapred.local.dir.minspacestart=0
hadoop.native.lib=true
color=yello

The color=yello entry is not one of Hadoop's built-in defaults; it reached the Configuration through the generic options (for example -D color=yello, or an extra file passed with -conf). Piping the demo's output through wc counts the loaded properties, here 68 lines, one per property:

     68      68    3028

A file passed with -conf is an ordinary Hadoop configuration XML document, beginning with:

<?xml version="1.0"?>
Finally, a complete MapReduce job written in the same style: WordCount as a Tool.

package org.jediael.hadoopdemo.toolrunnerdemo;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

    public static class WordCountMap extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        private final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer token = new StringTokenizer(line);
            while (token.hasMoreTokens()) {
                word.set(token.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class WordCountReduce extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    @Override
    public int run(String[] args) throws Exception {
        // Use the Configuration prepared by ToolRunner so that generic
        // options (-D, -conf, ...) actually reach the job; creating a
        // fresh Configuration here would discard them.
        Configuration conf = getConf();
        Job job = new Job(conf);
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(WordCountMap.class);
        job.setReducerClass(WordCountReduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        return job.waitForCompletion(true) ? 0 : -1;
    }

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new WordCount(), args);
        System.exit(exitCode);
    }
}
Run the program:
[root@jediael project]# hadoop fs -mkdir wcin2
[root@jediael project]# hadoop fs -copyFromLocal /opt/jediael/apache-nutch-2.2.1/CHANGES.txt wcin2
[root@jediael project]# hadoop jar wordcount2.jar org.jediael.hadoopdemo.toolrunnerdemo.WordCount wcin2 wcout2
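Since WordCount goes through ToolRunner, generic options can be slipped in ahead of the input and output paths; a hypothetical invocation (the property value is illustrative, not from the original run):

[root@jediael project]# hadoop jar wordcount2.jar org.jediael.hadoopdemo.toolrunnerdemo.WordCount -D mapred.reduce.tasks=2 wcin2 wcout2

GenericOptionsParser consumes the -D pair, so run(String[]) still sees only wcin2 and wcout2, while the job picks up mapred.reduce.tasks=2 from the Configuration returned by getConf().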