2.27 How to configure the MapReduce shuffle process in a Job

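I. The shuffle process

In summary, the shuffle consists of the following stages:

* Partition (partitioner)

* Sort (sort)

* Copy (users cannot intervene in this stage)

* Grouping (group), configurable

* Compression (compress)

* Combiner, a reduce that runs on the map task side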
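To make the stages above concrete, here is a minimal plain-Java simulation of partitioning, sorting, grouping and combining. It is only a sketch and does not use the Hadoop API: the class `ShuffleSketch` and its methods are hypothetical names introduced for illustration, and the real framework interleaves these stages differently (the combiner runs on the map side before the copy, grouping happens on the reduce side). The partition formula mirrors the one used by Hadoop's default `HashPartitioner`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of the shuffle stages; illustrative only, not Hadoop code. */
class ShuffleSketch {

    /** Partition: same formula as Hadoop's default HashPartitioner. */
    static int partitionFor(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    /**
     * Sort + group: each partition is a TreeMap, which keeps keys in sorted
     * order and collects all values sharing a key (what one reduce() call sees).
     */
    static List<TreeMap<String, List<Integer>>> shuffle(
            List<Map.Entry<String, Integer>> mapOutput, int numReduceTasks) {
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new TreeMap<>());
        }
        for (Map.Entry<String, Integer> kv : mapOutput) {
            partitions.get(partitionFor(kv.getKey(), numReduceTasks))
                    .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                    .add(kv.getValue());
        }
        return partitions;
    }

    /** Combiner: pre-aggregate the values of one key (here: a sum, as in word count). */
    static int combine(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapOutput = List.of(
                Map.entry("hadoop", 1), Map.entry("spark", 1),
                Map.entry("hadoop", 1), Map.entry("hive", 1));
        List<TreeMap<String, List<Integer>>> partitions = shuffle(mapOutput, 2);
        for (int p = 0; p < partitions.size(); p++) {
            for (Map.Entry<String, List<Integer>> group : partitions.get(p).entrySet()) {
                System.out.println("partition " + p + ": "
                        + group.getKey() + " -> " + combine(group.getValue()));
            }
        }
    }
}
```

The copy and compression stages are left out on purpose: they only affect how the data travels between tasks, not which records end up in which reduce call.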

II. Example

package com.ibeifeng.hadoop.senior.mapreduce;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

/**
 * MapReduce job template showing where the shuffle settings go.
 *
 * @author root
 */
public class ModuleMapReduce extends Configured implements Tool {

    // step 1: map class
    /**
     * public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
     */
    //TODO
    public static class ModuleMapper extends
            Mapper<LongWritable, Text, Text, IntWritable> {

        @Override
        public void setup(Context context) throws IOException,
                InterruptedException {
            // Nothing
        }

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            //TODO
        }

        @Override
        public void cleanup(Context context) throws IOException,
                InterruptedException {
            // Nothing
        }
    }

    // step 2: reduce class
    /**
     * public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
     */
    public static class ModuleReducer extends
            Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void setup(Context context)
                throws IOException, InterruptedException {
            // Nothing
        }

        @Override
        public void reduce(Text key, Iterable<IntWritable> values,
                Context context) throws IOException, InterruptedException {
            //TODO
        }

        @Override
        public void cleanup(Context context)
                throws IOException, InterruptedException {
            // Nothing
        }
    }

    // step 3: driver, assembles the job
    @Override
    public int run(String[] args) throws Exception {
        // 1: get configuration
        Configuration configuration = getConf();

        // 2: create job
        Job job = Job.getInstance(configuration, this.getClass()
                .getSimpleName());
        // jar that contains this class
        job.setJarByClass(this.getClass());

        // 3: set job
        // input -> map -> reduce -> output
        // 3.1: input
        Path inPath = new Path(args[0]);
        FileInputFormat.addInputPath(job, inPath);

        // 3.2: map
        job.setMapperClass(ModuleMapper.class);
        //TODO
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        //*****************shuffle********************
        // 1) partitioner
        //job.setPartitionerClass(cls);
        // 2) sort
        //job.setSortComparatorClass(cls);
        // 3) optional, combiner
        //job.setCombinerClass(cls);
        // 4) group
        //job.setGroupingComparatorClass(cls);
        //*****************shuffle********************

        // 3.3: reduce
        job.setReducerClass(ModuleReducer.class);
        //TODO
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 3.4: output
        Path outPath = new Path(args[1]);
        FileOutputFormat.setOutputPath(job, outPath);

        // 4: submit the job and wait for it to finish
        boolean isSuccess = job.waitForCompletion(true);
        return isSuccess ? 0 : 1;
    }

    // step 4: run program
    public static void main(String[] args) throws Exception {
        // 1: get configuration
        Configuration configuration = new Configuration();

        // enable compression of the map output
        configuration.set("mapreduce.map.output.compress", "true");
        // compression codec for the map output
        configuration.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.SnappyCodec");

        //int status = new WordCountMapReduce().run(args);
        int status = ToolRunner.run(configuration, new ModuleMapReduce(), args);
        System.exit(status);
    }
}
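The listing leaves the sort and grouping hooks commented out, so the difference between them is easy to miss: the sort comparator decides the order in which keys arrive at a reducer, while the grouping comparator decides which consecutive keys are handed to the same reduce() call. The following plain-Java sketch illustrates that distinction without the Hadoop API. The class `GroupingSketch` and the composite key format `name#number` are assumptions made up for this example, not part of the original job.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Plain-Java illustration of sort vs. grouping comparators; not Hadoop code. */
class GroupingSketch {

    /** Sort comparator: order by name, then by the numeric part of "name#number". */
    static final Comparator<String> SORT = Comparator
            .comparing((String k) -> k.split("#")[0])
            .thenComparingInt(k -> Integer.parseInt(k.split("#")[1]));

    /** Grouping comparator: two keys belong together when their name parts match. */
    static final Comparator<String> GROUP =
            Comparator.comparing((String k) -> k.split("#")[0]);

    /** Sorts the keys, then opens a new group whenever GROUP reports a change. */
    static List<List<String>> sortAndGroup(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<String>> groups = new ArrayList<>();
        String previous = null;
        for (String key : sorted) {
            if (previous == null || GROUP.compare(previous, key) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(key);
            previous = key;
        }
        return groups;
    }

    public static void main(String[] args) {
        // prints [[a#3, a#10], [b#1, b#2]]
        System.out.println(sortAndGroup(List.of("b#2", "a#10", "a#3", "b#1")));
    }
}
```

In a real job the same idea would be expressed with comparator classes passed to `job.setSortComparatorClass` and `job.setGroupingComparatorClass`; each inner list above corresponds to the keys seen by one reduce() call.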
