MapReduce Programming Summary

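(1) Getting key-value pairs to the map side is fairly straightforward. Each input split is handed to one MapTask, and the splits are produced by the InputFormat (usually a FileInputFormat subclass). With default settings a split is the size of one HDFS block: 64 MB in Hadoop 1.x, 128 MB in Hadoop 2.x and later.

Each MapTask calls the map function N times. How large is N?

That is decided by the class passed to job.setInputFormatClass(...). The default is TextInputFormat.class, which parses its input line by line: each line becomes one key-value pair (key = byte offset of the line, value = the line's text) and triggers one call to map.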
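To make point (1) concrete, here is a plain-Java sketch with no Hadoop dependency. It mimics two things: how FileInputFormat sizes splits, using the formula max(minSize, min(maxSize, blockSize)) from its computeSplitSize method plus the 10% slop it allows for the last split, and how a line-oriented reader turns text into (byte offset, line) pairs, one pair per map call. The class and method names are hypothetical, and the sketch ignores details such as lines that cross split boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SplitAndLineSketch {

    // Same formula as FileInputFormat.computeSplitSize: max(minSize, min(maxSize, blockSize)).
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Number of splits (= MapTasks) for one file; the last split may be up to 10% larger.
    static int countSplits(long fileLength, long splitSize) {
        final double splitSlop = 1.1;
        long remaining = fileLength;
        int splits = 0;
        while ((double) remaining / splitSize > splitSlop) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++;
        }
        return splits;
    }

    // One entry per line: (byte offset of the line start, line text).
    // Under TextInputFormat each entry corresponds to one map(key, value) call.
    static List<Map.Entry<Long, String>> lineRecords(String text) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        long offset = 0;
        int start = 0;
        while (start < text.length()) {
            int end = text.indexOf('\n', start);
            if (end < 0) {
                end = text.length();
            }
            String line = text.substring(start, end);
            records.add(new AbstractMap.SimpleEntry<>(offset, line));
            // Offsets count bytes, not chars; +1 skips the '\n'.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
            start = end + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(64 * mb, 1, Long.MAX_VALUE);
        System.out.println("200 MB file -> " + countSplits(200 * mb, splitSize) + " splits");
        System.out.println("70 MB file -> " + countSplits(70 * mb, splitSize) + " split(s)");
        for (Map.Entry<Long, String> r : lineRecords("1\t1\n2\t2\n3\t3\n")) {
            System.out.println("map(" + r.getKey() + ", \"" + r.getValue().replace("\t", "\\t") + "\")");
        }
    }
}
```

Running main reports 4 splits for the 200 MB file, a single split for the 70 MB file (it fits within the 10% slop), and three map calls at byte offsets 0, 4 and 8.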

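(2) On the reduce side, key-value pairs are more involved. The second parameter of reduce is an Iterable<?>, so the input takes the form <key, list{value1, value2, ...}>, and each such group corresponds to one call of the reduce function.

In other words, one reduce call processes one key together with all of its values.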

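(3) Where does this <key, list{value1, value2, ...}> input come from?

During the shuffle, the MapReduce framework sorts the map output by key, using the ordering that the predefined key types (Text, LongWritable, etc.) already provide.

It then merges the pairs with the same key coming from different MapTasks into <key, list{value1, value2, ...}>, which becomes the input of one reduce call.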
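The plain-Java sketch below simulates points (2) and (3): the outputs of two hypothetical MapTasks are merged, sorted by key, and each key with its collected values triggers exactly one reduce call. A TreeMap stands in for the framework's sort and merge, which in real Hadoop works on sorted spill files rather than an in-memory map; the word-count style data and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleSketch {

    // Merges the outputs of several MapTasks, sorts by key, and builds <key, list{values}>.
    static TreeMap<String, List<Long>> shuffle(List<List<Map.Entry<String, Long>>> mapOutputs) {
        TreeMap<String, List<Long>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Long>> oneMapTask : mapOutputs) {
            for (Map.Entry<String, Long> pair : oneMapTask) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return grouped;
    }

    // Called once per key, like Reducer.reduce(key, Iterable<value>, context); here it sums the values.
    static long reduce(String key, Iterable<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> task1 = List.of(Map.entry("hello", 1L), Map.entry("you", 1L));
        List<Map.Entry<String, Long>> task2 = List.of(Map.entry("hello", 1L), Map.entry("me", 1L));
        int reduceCalls = 0;
        for (Map.Entry<String, List<Long>> group : shuffle(List.of(task1, task2)).entrySet()) {
            reduceCalls++;
            System.out.println(group.getKey() + " " + group.getValue() + " -> "
                    + reduce(group.getKey(), group.getValue()));
        }
        System.out.println("reduce calls: " + reduceCalls);
    }
}
```

The two "hello" pairs from different MapTasks end up in one group, so main prints `hello [1, 1] -> 2` followed by the groups for "me" and "you", and reports 3 reduce calls.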

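(4) We have seen that the number of MapTasks is determined by the splits. What determines the number of ReduceTasks?

For every key-value pair that map emits (each ctx.write call), the framework calls getPartition (by default the one in HashPartitioner) to obtain a partition number. The pair is written to the map's local output tagged with this partition number, so that each ReduceTask can fetch exactly its own partition.

The most important parameter of getPartition, numReduceTasks, comes from job.setNumReduceTasks, whose default is 1, and the job runs that many ReduceTasks.

Hence, if you do not set it, every pair ends up in partition 0 and a single ReduceTask processes all of the data.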
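HashPartitioner's getPartition computes (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The sketch below reproduces that formula in plain Java, using String.hashCode as a stand-in for the key's hashCode (Hadoop's Text computes its hash differently, so real partition numbers for Text keys will differ). The class name and sample keys are hypothetical.

```java
import java.util.List;
import java.util.TreeSet;

public class HashPartitionSketch {

    // Same formula as HashPartitioner: clear the sign bit, then take the remainder.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("hello", "you", "me", "hadoop", "mapreduce");
        for (int numReduceTasks : new int[] {1, 3}) {
            TreeSet<Integer> used = new TreeSet<>();
            for (String key : keys) {
                used.add(getPartition(key, numReduceTasks));
            }
            System.out.println(numReduceTasks + " reduce task(s) -> partitions used: " + used);
        }
    }
}
```

With one reduce task the only partition used is 0; with three, every result falls in the range 0 to 2, and the masking keeps keys with negative hash codes from producing a negative partition number.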

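(5) Having covered partitioning, let us turn to grouping. Partitioning is decided on the map side, record by record as map emits output, whereas grouping happens on the reduce side, over data gathered from many MapTasks. Groups live inside partitions: a partition may contain many groups, but a reduce-side group never spans two partitions.

What does grouping affect?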

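(6) When the map output key is a custom type NewK2 that defines its own compareTo, and a grouping comparator is set, the framework still sorts the keys by NewK2's ordering (compareTo),

but it uses the grouping class MyGroupingComparator's compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) to decide which consecutive keys belong together.

Consecutive keys that the grouping comparator reports as equal are merged into one <key, list{value1, value2, ...}> and passed to a single reduce call.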
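Point (6) can be simulated without Hadoop. The sketch sorts (first, second) keys with the same ordering as NewK2.compareTo, then walks the sorted list and starts a new group whenever a comparator that looks only at first reports a difference, which is what a grouping comparator does. For simplicity everything sits in one partition; the class, the Key type and the sample pairs are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class GroupingSketch {

    // Stand-in for NewK2: a (first, second) composite key.
    static final class Key {
        final long first;
        final long second;

        Key(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return "<" + first + "," + second + ">";
        }
    }

    // Sort order: first, then second (like NewK2.compareTo).
    static final Comparator<Key> SORT =
            Comparator.<Key>comparingLong(k -> k.first).thenComparingLong(k -> k.second);

    // Grouping order: first only (like MyGroupingComparator).
    static final Comparator<Key> GROUP = Comparator.comparingLong(k -> k.first);

    // Returns one list of keys per reduce() call.
    static List<List<Key>> sortAndGroup(List<Key> keys) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(SORT);
        List<List<Key>> groups = new ArrayList<>();
        for (Key k : sorted) {
            // Only adjacent keys are compared, so grouping relies on the sort having run first.
            if (groups.isEmpty() || GROUP.compare(lastKey(groups), k) != 0) {
                groups.add(new ArrayList<>());
            }
            groups.get(groups.size() - 1).add(k);
        }
        return groups;
    }

    private static Key lastKey(List<List<Key>> groups) {
        List<Key> g = groups.get(groups.size() - 1);
        return g.get(g.size() - 1);
    }

    public static void main(String[] args) {
        List<Key> mapOutput = List.of(new Key(3, 3), new Key(3, 2), new Key(3, 1),
                new Key(2, 2), new Key(2, 1), new Key(1, 1));
        for (List<Key> group : sortAndGroup(mapOutput)) {
            // The last key in each group carries the largest second.
            Key last = group.get(group.size() - 1);
            System.out.println(group + " -> <" + last.first + "," + last.second + ">");
        }
    }
}
```

Six map outputs collapse into three reduce calls, one per distinct first; within each group the values arrive in ascending order of second.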

Here is an example:

```java
package examples;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.RawComparator;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * Input: one record per line, "first<TAB>second", both integers.
 * Output: for each distinct first, the largest second seen for it.
 */
public class GroupApp {

    static final String INPUT_PATH = "hdfs://192.168.2.100:9000/hello";
    static final String OUTPUT_PATH = "hdfs://192.168.2.100:9000/out";

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Remove an existing output directory; otherwise job submission fails.
        final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
        final Path outPath = new Path(OUTPUT_PATH);
        if (fileSystem.exists(outPath)) {
            fileSystem.delete(outPath, true);
        }

        // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
        final Job job = Job.getInstance(conf, GroupApp.class.getSimpleName());
        job.setJarByClass(GroupApp.class);

        // Input: one split per MapTask, one map() call per line.
        FileInputFormat.setInputPaths(job, INPUT_PATH);
        job.setInputFormatClass(TextInputFormat.class);

        job.setMapperClass(MyMapper.class);
        job.setMapOutputKeyClass(NewK2.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Partitioning (map side): three partitions, hence three ReduceTasks.
        job.setPartitionerClass(MyPartitioner.class);
        job.setNumReduceTasks(3);

        // Grouping (reduce side): keys with the same "first" go to one reduce() call.
        job.setGroupingComparatorClass(MyGroupingComparator.class);

        job.setReducerClass(MyReducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, outPath);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    /** first == 1 goes to partition 0, first == 2 to partition 1, everything else to partition 2. */
    static class MyPartitioner extends Partitioner<NewK2, LongWritable> {
        @Override
        public int getPartition(NewK2 key, LongWritable value, int numReduceTasks) {
            System.out.println("the getPartition() is called...");
            if (key.first == 1) {
                return 0 % numReduceTasks;
            } else if (key.first == 2) {
                return 1 % numReduceTasks;
            } else {
                return 2 % numReduceTasks;
            }
        }
    }

    /** Composite key, sorted by first and then by second (both ascending). */
    static class NewK2 implements WritableComparable<NewK2> {
        long first = 0L;
        long second = 0L;

        public NewK2() {} // required: Hadoop creates keys by reflection

        public NewK2(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(first);  // bytes 0-7 of the serialized key
            out.writeLong(second); // bytes 8-15
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            first = in.readLong();
            second = in.readLong();
        }

        @Override
        public int compareTo(NewK2 o) {
            System.out.println("the compareTo() is called...");
            // Long.compare avoids the overflow of (int) (this.first - o.first).
            int cmp = Long.compare(this.first, o.first);
            return cmp != 0 ? cmp : Long.compare(this.second, o.second);
        }

        @Override
        public int hashCode() {
            return Long.hashCode(first) * 31 + Long.hashCode(second);
        }

        @Override
        public boolean equals(Object obj) {
            if (!(obj instanceof NewK2)) {
                return false;
            }
            NewK2 other = (NewK2) obj;
            return first == other.first && second == other.second;
        }
    }

    /** Groups keys by "first" only, ignoring "second". */
    static class MyGroupingComparator implements RawComparator<NewK2> {
        @Override
        public int compare(NewK2 o1, NewK2 o2) {
            return Long.compare(o1.first, o2.first);
        }

        // Called by the framework on serialized keys; "first" is the long in the first 8 bytes.
        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            System.out.println("the compare() is called...");
            // Decode the longs rather than comparing raw bytes, so negative values also order correctly.
            return Long.compare(WritableComparator.readLong(b1, s1), WritableComparator.readLong(b2, s2));
        }
    }

    static class MyMapper extends Mapper<LongWritable, Text, NewK2, LongWritable> {
        @Override
        protected void map(LongWritable k1, Text v1, Context ctx) throws IOException, InterruptedException {
            System.out.println("the map() is called...");
            final String[] splited = v1.toString().split("\t");
            final long first = Long.parseLong(splited[0]);
            final long second = Long.parseLong(splited[1]);
            ctx.write(new NewK2(first, second), new LongWritable(second));
        }
    }

    static class MyReducer extends Reducer<NewK2, LongWritable, LongWritable, LongWritable> {
        @Override
        protected void reduce(NewK2 k2, Iterable<LongWritable> v2s, Context ctx) throws IOException, InterruptedException {
            System.out.println("the reduce() is called...");
            long v3 = 0;
            // Hadoop updates k2 in place while iterating, so k2.second changes with each value.
            for (LongWritable value : v2s) {
                v3 = value.get();
                System.out.println("<" + k2.first + "," + k2.second + ">, " + v3);
            }
            // Values arrive sorted by second ascending, so v3 now holds the group's largest second.
            System.out.println("--------------------------------------------");
            System.out.println("the real reduce output...");
            System.out.println("<" + k2.first + "," + v3 + ">");
            ctx.write(new LongWritable(k2.first), new LongWritable(v3));
            System.out.println("--------------------------------------------");
        }
    }
}
```
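Two details in hand-written comparators are easy to get wrong: casting a long difference to int, as in (int) (this.first - o.first), and comparing the serialized bytes of a long as unsigned bytes, which is what WritableComparator.compareBytes does. DataOutput.writeLong writes a long as 8 big-endian two's-complement bytes, so unsigned byte order matches numeric order only when both values have the same sign. The plain-Java sketch below demonstrates both pitfalls next to the Long.compare alternative; the class and method names are hypothetical, and ByteBuffer is used because its default big-endian putLong produces the same bytes as writeLong.

```java
import java.nio.ByteBuffer;

public class ComparatorPitfalls {

    // Buggy: the long difference can overflow when cast to int.
    static int subtractCompare(long a, long b) {
        return (int) (a - b);
    }

    // The same 8 big-endian bytes that DataOutput.writeLong produces.
    static byte[] serialize(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Lexicographic comparison with bytes treated as unsigned, like WritableComparator.compareBytes.
    static int unsignedBytesCompare(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        long big = 1L << 32;
        System.out.println(subtractCompare(big, 0L)); // 0: 2^32 and 0 look equal after the cast
        System.out.println(Long.compare(big, 0L));    // 1
        // 255: -1 (0xFF...) sorts after 1 (0x00...) when bytes are compared as unsigned.
        System.out.println(unsignedBytesCompare(serialize(-1L), serialize(1L)));
        System.out.println(Long.compare(-1L, 1L));    // -1
    }
}
```

Decoding the long first (for example with WritableComparator.readLong) and then calling Long.compare avoids both problems.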
