MapReduce custom sorting (map-side step 1.4)

3　　3

3　　2

3　　1

2　　2

2　　1

1　　1

----------------- Expected output

1　　1

2　　1

2　　2

3　　1

3　　2

3　　3

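Sort the data above by the following rule: ascending by the first column, and when the first-column values are equal, ascending by the second column.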

By default, however, the result is:

1　　1

2　　2

2　　1

3　　3

3　　2

3　　1

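In other words, by default only the first column takes part in the sort and the second column does not, because only k2 is sorted and the original v2 is never compared. To reach the goal you have to define a custom class that wraps the original k2 and v2 together and serves as the new k2. The class must implement the WritableComparable interface and provide its methods, and it is declared alongside the mapper and reducer classes. The custom class NewK2 is as follows: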

static class NewK2 implements WritableComparable<NewK2>{
    Long first;
    Long second;

    // Hadoop creates key instances reflectively, so a no-argument constructor is required.
    public NewK2(){}

    public NewK2(long first, long second){
        this.first = first;
        this.second = second;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.first = in.readLong();
        this.second = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(first);
        out.writeLong(second);
    }

    /**
     * Called when k2 is sorted.
     * Ascending by the first column; when the first column is equal, ascending by the second column.
     */
    @Override
    public int compareTo(NewK2 o) {
        // Compare the values directly: casting a long difference to int can overflow and give the wrong sign.
        final int byFirst = this.first.compareTo(o.first);
        if(byFirst != 0){
            return byFirst;
        }
        return this.second.compareTo(o.second);
    }

    @Override
    public int hashCode() {
        return this.first.hashCode()+this.second.hashCode();
    }

    @Override
    public boolean equals(Object obj){
        if(!(obj instanceof NewK2))
            return false;
        NewK2 other=(NewK2)obj;
        // Long is an object type, so compare with equals() rather than ==.
        return this.first.equals(other.first)&&this.second.equals(other.second);
    }
}
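The ordering rule in compareTo can be tried out without a cluster. The following is a minimal sketch in plain Java, assuming a hypothetical Pair class that stands in for NewK2 without the Hadoop interfaces (PairOrderSketch, Pair, and compareBySubtraction are names made up for this illustration). It sorts the sample data from the top of the post and also shows why subtracting two long values and casting the difference to int is fragile.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PairOrderSketch {
    /** Hypothetical stand-in for NewK2, without the Hadoop interfaces. */
    static final class Pair implements Comparable<Pair> {
        final long first;
        final long second;

        Pair(long first, long second) {
            this.first = first;
            this.second = second;
        }

        @Override
        public int compareTo(Pair o) {
            // First column ascending; ties are broken by the second column ascending.
            int byFirst = Long.compare(first, o.first);
            return byFirst != 0 ? byFirst : Long.compare(second, o.second);
        }

        @Override
        public String toString() {
            return first + "\t" + second;
        }
    }

    /** The subtract-and-cast shortcut, kept only to show where it breaks. */
    static int compareBySubtraction(long a, long b) {
        return (int) (a - b);
    }

    /** Sorts the six sample rows from the post with Pair.compareTo. */
    static List<Pair> sortedSample() {
        List<Pair> pairs = new ArrayList<Pair>(Arrays.asList(
            new Pair(3, 3), new Pair(3, 2), new Pair(3, 1),
            new Pair(2, 2), new Pair(2, 1), new Pair(1, 1)));
        Collections.sort(pairs);
        return pairs;
    }

    public static void main(String[] args) {
        for (Pair p : sortedSample()) {
            System.out.println(p);
        }
        // 4294967296 is 2^32: the difference is nonzero, but its low 32 bits are all zero.
        System.out.println(compareBySubtraction(4294967296L, 0L)); // prints 0
        System.out.println(Long.compare(4294967296L, 0L));          // prints 1
    }
}
```

For the small values in the sample data both approaches agree. The difference only shows up once the two columns can differ by more than an int can hold.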

----------------

static class MyMapper extends Mapper<LongWritable, Text, NewK2, LongWritable>{
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line holds two tab-separated numbers.
        final String[] splited = value.toString().split("\t");
        final NewK2 k2 = new NewK2(Long.parseLong(splited[0]), Long.parseLong(splited[1]));
        final LongWritable v2 = new LongWritable(Long.parseLong(splited[1]));
        context.write(k2, v2);
    }
}

static class MyReducer extends Reducer<NewK2, LongWritable, LongWritable, LongWritable>{
    @Override
    protected void reduce(NewK2 k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException {
        // Keys arrive already ordered by NewK2.compareTo; write out both columns of the key.
        context.write(new LongWritable(k2.first), new LongWritable(k2.second));
    }
}
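Between the mapper and the reducer, every NewK2 key is turned into bytes by write and rebuilt by readFields, so the two methods must handle the fields in the same order. The sketch below reproduces that round trip with plain java.io streams. RoundTripSketch and its two helper methods are hypothetical names for this illustration, and the code mirrors what NewK2 does rather than calling into Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripSketch {
    /** Serializes two longs in the same order as NewK2.write. */
    static byte[] write(long first, long second) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeLong(first);
        out.writeLong(second);
        out.flush();
        return bytes.toByteArray();
    }

    /** Reads the two longs back in the same order as NewK2.readFields. */
    static long[] read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        long first = in.readLong();
        long second = in.readLong();
        return new long[] {first, second};
    }

    public static void main(String[] args) throws IOException {
        byte[] data = write(3L, 2L);
        long[] back = read(data);
        System.out.println(data.length);               // prints 16 (two 8-byte longs)
        System.out.println(back[0] + "\t" + back[1]);  // prints 3, a tab, then 2
    }
}
```

If readFields read the fields in the opposite order from write, the two columns would come back swapped without any error being raised.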

---------------------------

package sort;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class SortApp {
	static final String INPUT_PATH = "hdfs://mlj:9000/sort";
	static final String OUT_PATH = "hdfs://mlj:9000/sort_out";

	public static void main(String[] args) throws Exception{
		final Configuration configuration = new Configuration();

		// Delete the output directory if it already exists, otherwise the job refuses to start.
		final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), configuration);
		if(fileSystem.exists(new Path(OUT_PATH))){
			fileSystem.delete(new Path(OUT_PATH), true);
		}

		final Job job = new Job(configuration, SortApp.class.getSimpleName());

		//1.1 Specify the input path
		FileInputFormat.setInputPaths(job, INPUT_PATH);
		//Specify the class used to parse the input file
		job.setInputFormatClass(TextInputFormat.class);

		//1.2 Specify the custom Mapper class
		job.setMapperClass(MyMapper.class);
		//Specify the types of the <k2,v2> output
		job.setMapOutputKeyClass(NewK2.class);
		job.setMapOutputValueClass(LongWritable.class);

		//1.3 Specify the partitioner class
		job.setPartitionerClass(HashPartitioner.class);
		job.setNumReduceTasks(1);

		//1.4 TODO sorting, grouping
		//1.5 TODO (optional) combining

		//2.2 Specify the custom Reducer class
		job.setReducerClass(MyReducer.class);
		//Specify the types of the <k3,v3> output
		job.setOutputKeyClass(LongWritable.class);
		job.setOutputValueClass(LongWritable.class);

		//2.3 Specify where the output goes
		FileOutputFormat.setOutputPath(job, new Path(OUT_PATH));
		//Specify the class used to format the output file
		job.setOutputFormatClass(TextOutputFormat.class);

		//Submit the job to the JobTracker and wait for it to finish
		job.waitForCompletion(true);
	}

	static class MyMapper extends Mapper<LongWritable, Text, NewK2, LongWritable>{
		@Override
		protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
			// Each input line holds two tab-separated numbers.
			final String[] splited = value.toString().split("\t");
			final NewK2 k2 = new NewK2(Long.parseLong(splited[0]), Long.parseLong(splited[1]));
			final LongWritable v2 = new LongWritable(Long.parseLong(splited[1]));
			context.write(k2, v2);
		}
	}

	static class MyReducer extends Reducer<NewK2, LongWritable, LongWritable, LongWritable>{
		@Override
		protected void reduce(NewK2 k2, Iterable<LongWritable> v2s, Context context) throws IOException, InterruptedException {
			// Keys arrive already ordered by NewK2.compareTo; write out both columns of the key.
			context.write(new LongWritable(k2.first), new LongWritable(k2.second));
		}
	}

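	/**
	 * Q: Why implement this class?
	 * A: Because the original v2 cannot take part in sorting, the original k2 and v2
	 *    are wrapped in one class that serves as the new k2.
	 */
	static class NewK2 implements WritableComparable<NewK2>{
		Long first;
		Long second;

		// Hadoop creates key instances reflectively, so a no-argument constructor is required.
		public NewK2(){}

		public NewK2(long first, long second){
			this.first = first;
			this.second = second;
		}

		@Override
		public void readFields(DataInput in) throws IOException {
			this.first = in.readLong();
			this.second = in.readLong();
		}

		@Override
		public void write(DataOutput out) throws IOException {
			out.writeLong(first);
			out.writeLong(second);
		}

		/**
		 * Called when k2 is sorted.
		 * Ascending by the first column; when the first column is equal, ascending by the second column.
		 */
		@Override
		public int compareTo(NewK2 o) {
			// Compare the values directly: casting a long difference to int can overflow and give the wrong sign.
			final int byFirst = this.first.compareTo(o.first);
			if(byFirst != 0){
				return byFirst;
			}
			return this.second.compareTo(o.second);
		}

		@Override
		public int hashCode() {
			return this.first.hashCode()+this.second.hashCode();
		}

		@Override
		public boolean equals(Object obj){
			if(!(obj instanceof NewK2))
				return false;
			final NewK2 other = (NewK2)obj;
			// Long is an object type, so compare with equals() rather than ==.
			return this.first.equals(other.first) && this.second.equals(other.second);
		}
	}
}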
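The job above sets HashPartitioner together with setNumReduceTasks(1), so every key lands in the same partition and the single output file is sorted as a whole. The sketch below reproduces the partition arithmetic HashPartitioner uses, (hashCode & Integer.MAX_VALUE) % numReduceTasks, on the sample keys. PartitionSketch, partitionFor, and keyHash are hypothetical names for this illustration; keyHash copies the NewK2.hashCode formula.

```java
public class PartitionSketch {
    /** Same arithmetic as Hadoop's HashPartitioner: clear the sign bit, then take the remainder. */
    static int partitionFor(int keyHash, int numReduceTasks) {
        return (keyHash & Integer.MAX_VALUE) % numReduceTasks;
    }

    /** Mirrors NewK2.hashCode: the sum of the two Long hash codes. */
    static int keyHash(long first, long second) {
        return Long.valueOf(first).hashCode() + Long.valueOf(second).hashCode();
    }

    public static void main(String[] args) {
        long[][] sample = {{3, 3}, {3, 2}, {3, 1}, {2, 2}, {2, 1}, {1, 1}};
        for (long[] row : sample) {
            int hash = keyHash(row[0], row[1]);
            System.out.println(row[0] + "\t" + row[1]
                + " -> 1 reducer: partition " + partitionFor(hash, 1)
                + ", 2 reducers: partition " + partitionFor(hash, 2));
        }
    }
}
```

With one reducer every key maps to partition 0. With two reducers the keys are split between partitions, so each output file would be sorted on its own but the files together would no longer form one globally sorted sequence.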
