027_Writing MapReduce Template Classes: Mapper, Reducer, and Driver

Once the template class is written, creating a new MapReduce program only requires changing a few parameters. The code is as follows:

 package org.dragon.hadoop.mr.module;

 import java.io.IOException;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;

 /**
  *
  * ###########################################
  * ########  MapReduce template class  ########
  * ###########################################
  *
  * @author ZhuXY
  * @time 2016-3-13 10:21:06 PM
  *
  */

 public class ModuleMapReduce extends Configured implements Tool {

     /**
      * Mapper Class
      */
     public static class ModuleMapper extends
             Mapper<LongWritable, Text, LongWritable, Text> {

         @Override
         protected void setup(Context context) throws IOException,
                 InterruptedException {
             // Called once per task, before the first call to map()
             super.setup(context);
         }

         @Override
         protected void map(LongWritable key, Text value, Context context)
                 throws IOException, InterruptedException {
             // The inherited map() writes the input pair through unchanged
             super.map(key, value, context);
         }

         @Override
         protected void cleanup(Context context) throws IOException,
                 InterruptedException {
             // Called once per task, after the last call to map()
             super.cleanup(context);
         }
     }

     /**
      * Reducer Class
      */
     public static class ModuleReducer extends
             Reducer<LongWritable, Text, LongWritable, Text> {

         @Override
         protected void setup(Context context) throws IOException,
                 InterruptedException {
             // Called once per task, before the first call to reduce()
             super.setup(context);
         }

         @Override
         protected void reduce(LongWritable key, Iterable<Text> values,
                 Context context) throws IOException, InterruptedException {
             // The inherited reduce() writes every value through with its key
             super.reduce(key, values, context);
         }

         @Override
         protected void cleanup(Context context) throws IOException,
                 InterruptedException {
             // Called once per task, after the last call to reduce()
             super.cleanup(context);
         }
     }

     /**
      * Driver Class
      */
     // Helper extracted to validate the arguments and create the Job
     public Job parseInputAndOutput(Tool tool, Configuration conf, String[] args)
             throws IOException {
         if (args.length != 2) {
             System.err.printf("Usage: %s [generic options] <input> <output>\n",
                     tool.getClass().getSimpleName());
             ToolRunner.printGenericCommandUsage(System.err);
             return null; // the caller must check for null
         }

         // Create the job with the configuration and a job name.
         // Note: new Job(conf, name) is deprecated in Hadoop 2.x in favor of
         // Job.getInstance(conf, name).
         Job job = new Job(conf, ModuleMapReduce.class.getSimpleName());

         // step 3: set job
         // 1) set run jar class
         job.setJarByClass(tool.getClass());

         // 14) job output path
         FileOutputFormat.setOutputPath(job, new Path(args[1]));

         return job;
     }

     @Override
     public int run(String[] args) throws Exception {
         // step 1: get conf (the one ToolRunner filled in from the generic options)
         Configuration conf = getConf();

         // step 2: create job
         Job job = parseInputAndOutput(this, conf, args);
         if (job == null) {
             return 2; // wrong arguments; the usage message was already printed
         }

         // 2) set input format
         job.setInputFormatClass(TextInputFormat.class); // optional (default)

         // 3) set input path
         FileInputFormat.addInputPath(job, new Path(args[0]));

         // 4) set mapper class
         job.setMapperClass(ModuleMapper.class); // optional here (identity mapper)

         // 5) set map output key/value class
         job.setMapOutputKeyClass(LongWritable.class); // optional (default)
         job.setMapOutputValueClass(Text.class); // optional (default)

         // 6) set partitioner class
         job.setPartitionerClass(HashPartitioner.class); // optional (default)

         // 7) set reducer number
         job.setNumReduceTasks(1); // optional (default is 1)

         // 8) set sort comparator class
         // job.setSortComparatorClass(LongWritable.Comparator.class); // optional

         // 9) set group comparator class
         // job.setGroupingComparatorClass(LongWritable.Comparator.class); // optional

         // 10) set combiner class
         // optional; there is no combiner by default, and null must not be passed here
         // job.setCombinerClass(null);

         // 11) set reducer class
         job.setReducerClass(ModuleReducer.class); // optional here (identity reducer)

         // 12) set output format
         job.setOutputFormatClass(TextOutputFormat.class); // optional (default)

         // 13) job output key/value class
         job.setOutputKeyClass(LongWritable.class); // optional (default)
         job.setOutputValueClass(Text.class); // optional (default)

         // step 4: submit job
         boolean isSuccess = job.waitForCompletion(true);

         // step 5: return status
         return isSuccess ? 0 : 1;
     }

     public static void main(String[] args) throws Exception {
         // Hard-coded paths for testing; remove this assignment to use
         // the command-line arguments instead
         args = new String[] {
                 "hdfs://hadoop-master.dragon.org:9000/wc/mininput/",
                 "hdfs://hadoop-master.dragon.org:9000/wc/minoutput"
                 };

         // run mapreduce
         int status = ToolRunner.run(new ModuleMapReduce(), args);

         // exit
         System.exit(status);
     }

 }
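
The driver above sets `HashPartitioner` and a reducer count. As a rough illustration of how a key ends up at a particular reducer, here is a plain-Java sketch of the rule `HashPartitioner` applies (clear the sign bit of `hashCode()`, then take the remainder by the number of reduce tasks). It uses no Hadoop classes; `PartitionSketch` and `partitionFor` are hypothetical names made up for this example.

```java
// Plain-Java sketch of hash partitioning; not the Hadoop class itself.
// PartitionSketch and partitionFor are hypothetical names for illustration.
final class PartitionSketch {

    // Clear the sign bit so the result is never negative,
    // then map the hash onto the range [0, numReduceTasks).
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"hadoop", "map", "reduce"};
        for (String key : keys) {
            // With 3 reducers every key lands on reducer 0, 1 or 2
            System.out.println(key + " -> reducer " + partitionFor(key, 3));
        }
    }
}
```

With `setNumReduceTasks(1)`, as in the template, every key maps to partition 0, so the partitioner only starts to matter once the reducer count is raised.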


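Steps for using the template:

1) Rename the classes (the MapReduce class, the Mapper class, and the Reducer class).

2) Change the input and output key/value type parameters of the Mapper and Reducer classes to match the actual business logic.

3) Update the Job settings in the Driver section (the output types of the Mapper and Reducer classes).

4) Write the actual business logic in the Mapper class (setup(), map(), cleanup()).

5) Write the actual business logic in the Reducer class (setup(), reduce(), cleanup()).

6) Review and adjust the Driver code (the run() method of the template class).

7) Set the input and output paths and test the MapReduce job.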
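
Steps 4 and 5 rely on the order in which the framework calls the three hooks: `setup()` once per task, `map()` (or `reduce()`) once per record (or per key), and `cleanup()` once at the end. The sketch below only imitates that call order in plain Java so it can be run without a cluster; `LifecycleSketch` and `runTask` are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Imitates the call order a Mapper task goes through; no Hadoop classes involved.
// LifecycleSketch and runTask are hypothetical names for illustration.
final class LifecycleSketch {

    // Returns the sequence of hook calls made for the given input records.
    static List<String> runTask(List<String> records) {
        List<String> calls = new ArrayList<>();
        calls.add("setup");                   // once, before any record
        for (String record : records) {
            calls.add("map(" + record + ")"); // once per input record
        }
        calls.add("cleanup");                 // once, after the last record
        return calls;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        records.add("line1");
        records.add("line2");
        // prints [setup, map(line1), map(line2), cleanup]
        System.out.println(runTask(records));
    }
}
```

This is why per-task resources are usually opened in `setup()` and released in `cleanup()`, while `map()` holds only per-record logic.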

Writing a wordcount program with ModuleMapReduce

 package org.dragon.hadoop.mr.module;

 import java.io.IOException;
 import java.util.StringTokenizer;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.conf.Configured;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.LongWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
 import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;
 import org.apache.hadoop.util.Tool;
 import org.apache.hadoop.util.ToolRunner;

 /**
  *
  * ###########################################
  * ####  WordCount built from the MapReduce template class  ####
  * ###########################################
  *
  * @author ZhuXY
  * @time 2016-3-13 10:21:06 PM
  *
  */

 public class WordcountByModuleMapReduce extends Configured implements Tool {

     /**
      * Mapper Class
      */
     public static class WordcountMapper extends
             Mapper<LongWritable, Text, Text, LongWritable> {

         // Reused across map() calls to avoid allocating an object per word
         private Text word = new Text();
         private final static LongWritable one = new LongWritable(1);

         @Override
         protected void setup(Context context) throws IOException,
                 InterruptedException {
             super.setup(context);
         }

         @Override
         protected void map(LongWritable key, Text value, Context context)
                 throws IOException, InterruptedException {
             // Get the content of the current line
             String lineValue = value.toString();

             // Split the line on whitespace
             StringTokenizer stringTokenizer = new StringTokenizer(lineValue);

             // Iterate over the tokens
             while (stringTokenizer.hasMoreTokens()) {
                 // Get the next word
                 String wordValue = stringTokenizer.nextToken();

                 // Set the map output key
                 word.set(wordValue);

                 // Emit (word, 1) for every occurrence; a word that appears
                 // several times in a line yields several pairs with the same key
                 context.write(word, one);
             }
         }

         @Override
         protected void cleanup(Context context) throws IOException,
                 InterruptedException {
             super.cleanup(context);
         }
     }

     /**
      * Reducer Class
      */
     public static class WordcountReducer extends
             Reducer<Text, LongWritable, Text, LongWritable> {

         // Reused across reduce() calls to hold the total for the current word
         private LongWritable resultLongWritable = new LongWritable();

         @Override
         protected void setup(Context context) throws IOException,
                 InterruptedException {
             super.setup(context);
         }

         @Override
         protected void reduce(Text key, Iterable<LongWritable> values,
                 Context context) throws IOException, InterruptedException {
             // long, not int: LongWritable.get() returns a long
             long sum = 0;

             // Iterate over the Iterable of counts for this word
             for (LongWritable value : values) {
                 // Accumulate
                 sum += value.get();
             }

             // Set the total count and emit (word, total)
             resultLongWritable.set(sum);
             context.write(key, resultLongWritable);
         }

         @Override
         protected void cleanup(Context context) throws IOException,
                 InterruptedException {
             super.cleanup(context);
         }
     }

     /**
      * Driver Class
      */
     // Helper extracted to validate the arguments and create the Job
     public Job parseInputAndOutput(Tool tool, Configuration conf, String[] args)
             throws IOException {
         if (args.length != 2) {
             System.err.printf("Usage: %s [generic options] <input> <output>\n",
                     tool.getClass().getSimpleName());
             ToolRunner.printGenericCommandUsage(System.err);
             return null; // the caller must check for null
         }

         // Create the job with the configuration and a job name.
         // Note: new Job(conf, name) is deprecated in Hadoop 2.x in favor of
         // Job.getInstance(conf, name).
         Job job = new Job(conf,
                 WordcountByModuleMapReduce.class.getSimpleName());

         // step 3: set job
         // 1) set run jar class
         job.setJarByClass(tool.getClass());

         // 14) job output path
         FileOutputFormat.setOutputPath(job, new Path(args[1]));

         return job;
     }

     @Override
     public int run(String[] args) throws Exception {
         // step 1: get conf (the one ToolRunner filled in from the generic options)
         Configuration conf = getConf();

         // step 2: create job
         Job job = parseInputAndOutput(this, conf, args);
         if (job == null) {
             return 2; // wrong arguments; the usage message was already printed
         }

         // 2) set input format
         job.setInputFormatClass(TextInputFormat.class); // optional (default)

         // 3) set input path
         FileInputFormat.addInputPath(job, new Path(args[0]));

         // 4) set mapper class
         job.setMapperClass(WordcountMapper.class); // required: the default is the identity Mapper

         // 5) set map output key/value class
         job.setMapOutputKeyClass(Text.class); // optional: falls back to the job output key class
         job.setMapOutputValueClass(LongWritable.class); // optional: falls back to the job output value class

         // 6) set partitioner class
         job.setPartitionerClass(HashPartitioner.class); // optional (default)

         // 7) set reducer number
         job.setNumReduceTasks(1); // optional (default is 1)

         // 8) set sort comparator class
         // job.setSortComparatorClass(LongWritable.Comparator.class); // optional

         // 9) set group comparator class
         // job.setGroupingComparatorClass(LongWritable.Comparator.class); // optional

         // 10) set combiner class
         // optional; there is no combiner by default, and null must not be passed here
         // job.setCombinerClass(null);

         // 11) set reducer class
         job.setReducerClass(WordcountReducer.class); // required: the default is the identity Reducer

         // 12) set output format
         job.setOutputFormatClass(TextOutputFormat.class); // optional (default)

         // 13) job output key/value class
         job.setOutputKeyClass(Text.class); // required: the default is LongWritable
         job.setOutputValueClass(LongWritable.class); // required: the default is Text

         // step 4: submit job
         boolean isSuccess = job.waitForCompletion(true);

         // step 5: return status
         return isSuccess ? 0 : 1;
     }

     public static void main(String[] args) throws Exception {
         // Hard-coded paths for testing; remove this assignment to use
         // the command-line arguments instead
         args = new String[] {
                 "hdfs://hadoop-master.dragon.org:9000/wc/mininput/",
                 "hdfs://hadoop-master.dragon.org:9000/wc/minoutput" };

         // run mapreduce
         int status = ToolRunner.run(new WordcountByModuleMapReduce(), args);

         // exit
         System.exit(status);
     }

 }
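
To sanity-check the tokenizing and summing logic before submitting the job to a cluster, the same idea can be tried in plain Java. The sketch below is only an approximation under stated assumptions: it uses no Hadoop classes, folds the map and reduce steps into one loop, and uses a `TreeMap` to mimic sorted keys. `WordCountLogicSketch` and `count` are hypothetical names, and the sample lines are made up.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local imitation of the wordcount logic; no Hadoop classes involved.
// WordCountLogicSketch and count are hypothetical names for illustration.
final class WordCountLogicSketch {

    // Counts whitespace-separated words across all given lines.
    static Map<String, Long> count(String... lines) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            // Same splitting rule as the Mapper: default StringTokenizer delimiters
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                // "map" emits (word, 1); "reduce" sums the ones per word
                totals.merge(tokens.nextToken(), 1L, Long::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, mapreduce=1}
        System.out.println(count("hello hadoop", "hello mapreduce"));
    }
}
```

The real job writes one tab-separated `word`/`count` pair per line through `TextOutputFormat`; the map printed here is only a quick way to compare expected totals.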
