Hadoop with tool interface

Hadoop jobs are often executed from the command line, so each job has to
support reading, parsing, and processing command-line arguments. To save every developer
from rewriting this code, Hadoop provides the org.apache.hadoop.util.Tool interface.

Sample code:

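import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// WordCount, TokenizerMapper and IntSumReducer are assumed to come from the
// WordCount example and must be visible to this class.
public class WordcountWithTools extends Configured implements Tool {

	public int run(String[] args) throws Exception {
		// args holds only what is left after the generic options were parsed.
		if (args.length < 2) {
			System.out.println("Usage: chapter3.WordcountWithTools <inDir> <outDir>");
			ToolRunner.printGenericCommandUsage(System.out);
			System.out.println("");
			return -1;
		}
		System.out.println(Arrays.toString(args));
		// For testing only: print the "test" property, e.g. set with -Dtest=...
		System.out.println(getConf().get("test"));
		// new Job(conf, name) is deprecated in Hadoop 2 and later, where
		// Job.getInstance(getConf(), "word count") is preferred.
		Job job = new Job(getConf(), "word count");
		job.setJarByClass(WordCount.class);
		job.setMapperClass(TokenizerMapper.class);
		// Uncomment the next line to use the reducer as a combiner.
		// job.setCombinerClass(IntSumReducer.class);
		job.setReducerClass(IntSumReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);
		FileInputFormat.addInputPath(job, new Path(args[0]));
		// Delete the output directory if it already exists.
		FileSystem.get(getConf()).delete(new Path(args[1]), true);
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		// Report failure through the exit code instead of always returning 0.
		return job.waitForCompletion(true) ? 0 : 1;
	}

	public static void main(String[] args) throws Exception {
		// ToolRunner parses the generic options into the Configuration
		// and passes the remaining arguments to run().
		int res = ToolRunner.run(new Configuration(), new WordcountWithTools(), args);
		System.exit(res);
	}
}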

Generic options supported are:

-conf <configuration file>                    specify an application configuration file
-D <property=value>                           use value for given property
-fs <local|namenode:port>                     specify a namenode
-jt <local|jobtracker:port>                   specify a job tracker
-files <comma separated list of files>        specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>       specify comma separated jar files to include in the classpath
-archives <comma separated list of archives>  specify comma separated archives to be unarchived on the compute machines

The general command line syntax is

bin/hadoop command [genericOptions] [commandOptions]

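Pay close attention to the order here: generic options must come before the command options. I once got this wrong by putting -input and -output first, and the -D and -libjars options that followed them had no effect.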
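The ordering rule is easier to see with a small model of the parsing. The sketch below is plain Java with no Hadoop dependency, and it is not Hadoop's implementation: the class GenericOptionOrderSketch and its parse method are hypothetical names, and it handles only -D. It assumes, as the behavior described above suggests, that generic option parsing stops at the first argument it does not recognize and hands everything from there on to the tool unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified stand-in for generic option handling; hypothetical, not Hadoop code. */
final class GenericOptionOrderSketch {

    /** Properties picked up from leading -D options, plus the untouched remainder. */
    static final class Parsed {
        final Map<String, String> properties = new LinkedHashMap<>();
        final List<String> remainingArgs = new ArrayList<>();
    }

    /**
     * Consumes leading "-Dkey=value" and "-D key=value" options and stops at the
     * first argument that is not one, leaving everything after it untouched.
     */
    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            String pair;
            if (arg.equals("-D") && i + 1 < args.length) {
                pair = args[i + 1];
                i += 2;
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                pair = arg.substring(2);
                i += 1;
            } else {
                break; // first non-generic argument: parsing stops here
            }
            int eq = pair.indexOf('=');
            if (eq > 0) {
                parsed.properties.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        parsed.remainingArgs.addAll(Arrays.asList(args).subList(i, args.length));
        return parsed;
    }

    public static void main(String[] args) {
        // Generic option first: the property is picked up.
        Parsed good = parse(new String[] {"-Dtest=lovejava", "/data/input/", "/data/output/"});
        System.out.println(good.properties + " " + good.remainingArgs);
        // prints: {test=lovejava} [/data/input/, /data/output/]

        // Generic option last: it is never parsed and reaches the tool as a plain argument.
        Parsed bad = parse(new String[] {"/data/input/", "/data/output/", "-Dtest=lovejava"});
        System.out.println(bad.properties + " " + bad.remainingArgs);
        // prints: {} [/data/input/, /data/output/, -Dtest=lovejava]
    }
}
```

In the second call the -D option ends up in the remaining arguments, which matches the symptom above: the job still runs, but the property is never set.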

Usage examples:

JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordcountWithTools
INPUT_DIR=/data/input/
OUTPUT_DIR=/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava $INPUT_DIR $OUTPUT_DIR

This passes a test property on the command line; the code prints its value with getConf().get("test") to confirm that it arrived.

JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordcountWithTools
INPUT_DIR=/home/hadoop/data/test1.txt
OUTPUT_DIR=/home/hadoop/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava -fs=file:/// -files=/home/hadoop/data/test2.txt $INPUT_DIR $OUTPUT_DIR

This tests processing files on the local file system.

JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
MAIN_CLASS=chapter3.WordcountWithTools
INPUT_DIR=/home/hadoop/data/test1.txt
OUTPUT_DIR=/home/hadoop/data/output/
hadoop jar $JAR_NAME $MAIN_CLASS -conf=/home/hadoop/data/democonf.xml -fs=file:/// $INPUT_DIR $OUTPUT_DIR

This specifies a configuration file.
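The contents of democonf.xml are not shown in this post. A Hadoop configuration file is an XML document whose configuration root element holds property entries, each with a name and a value. The sketch below is only meant to show that shape: the sample property (test = lovejava) is hypothetical, and the class ConfFileShapeSketch reads the XML with the plain JDK parser for illustration. Hadoop loads such files itself through its Configuration class, so a real job needs none of this code.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Illustrates the layout of a Hadoop-style configuration file; hypothetical helper. */
final class ConfFileShapeSketch {

    // Hypothetical contents for a file such as democonf.xml.
    static final String SAMPLE_CONF =
        "<?xml version=\"1.0\"?>\n"
        + "<configuration>\n"
        + "  <property>\n"
        + "    <name>test</name>\n"
        + "    <value>lovejava</value>\n"
        + "  </property>\n"
        + "</configuration>\n";

    /** Collects every property's name and value, in document order. */
    static Map<String, String> readProperties(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> properties = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
            properties.put(name, value);
        }
        return properties;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readProperties(SAMPLE_CONF)); // prints: {test=lovejava}
    }
}
```

With a file like this passed through -conf, getConf().get("test") in the sample job would be expected to return the same value as the -Dtest=lovejava example.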

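-libjars uploads the third-party jars that your MapReduce code references to HDFS. Each node then copies them to a local temporary directory when it runs the job, which avoids errors caused by referenced classes not being found.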
