MapReduce programming: sorting numbers

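Problem description

Sort a set of unordered integers in ascending order. The input file (number.txt in HDFS) contains integers separated by whitespace or line breaks.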

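Approach

Rely on MapReduce's built-in sorting. During the shuffle, map output is sorted by key before it reaches the reducer, so the mapper emits each number as a key and the reducer simply writes the keys out in the order it receives them, numbering them as it goes. This produces a single globally sorted list only if all keys go to one reducer.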
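To make the data flow concrete, here is a local plain-Java sketch that does not use Hadoop: a `TreeMap` stands in for the shuffle's sort-and-group step, and the reduce step numbers the keys the way `SortReducer` in the listing below does. The class name `ShuffleSortSketch` and the sample input lines are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Local, single-JVM sketch of the job's data flow; no Hadoop involved.
public class ShuffleSortSketch {

    // "Map" + "shuffle": every token becomes a (number, 1) pair; the TreeMap
    // sorts the keys and groups equal keys, like the shuffle does.
    public static TreeMap<Integer, List<Integer>> mapAndShuffle(List<String> lines) {
        TreeMap<Integer, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                int number = Integer.parseInt(itr.nextToken());
                grouped.computeIfAbsent(number, k -> new ArrayList<>()).add(1);
            }
        }
        return grouped;
    }

    // "Reduce": keys arrive in ascending order; write one "rank<TAB>number"
    // line per occurrence, and equal numbers share a rank.
    public static List<String> reduce(TreeMap<Integer, List<Integer>> grouped) {
        List<String> out = new ArrayList<>();
        int rank = 1;
        for (Map.Entry<Integer, List<Integer>> e : grouped.entrySet()) {
            for (int i = 0; i < e.getValue().size(); i++) {
                out.add(rank + "\t" + e.getKey());
            }
            rank++;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("32 654", "32", "15 756 65223");
        reduce(mapAndShuffle(input)).forEach(System.out::println);
    }
}
```

For the sample input this prints the tab-separated lines `1 15`, `2 32`, `2 32`, `3 654`, `4 756`, `5 65223`: the sorting itself comes entirely from the key ordering.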

Code

 package org.apache.hadoop.examples;

 import java.io.IOException;
 import java.util.StringTokenizer;

 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.IntWritable;
 import org.apache.hadoop.io.Text;
 import org.apache.hadoop.mapreduce.Job;
 import org.apache.hadoop.mapreduce.Mapper;
 import org.apache.hadoop.mapreduce.Reducer;
 import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
 import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 import org.apache.hadoop.util.GenericOptionsParser;

 public class sort {

     public static void main(String[] args) throws Exception {
         Configuration conf = new Configuration();
         String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
         if (otherArgs.length == 0) {
             // No paths given on the command line: fall back to the HDFS paths used in this post.
             String fileAddress = "hdfs://localhost:9000/user/hadoop/";
             otherArgs = new String[]{fileAddress + "number.txt", fileAddress + "output"};
         }
         if (otherArgs.length < 2) {
             System.err.println("Usage: sort <in> [<in>...] <out>");
             System.exit(2);
         }
         Job job = Job.getInstance(conf, "sort");
         job.setJarByClass(sort.class);
         job.setMapperClass(TokenizerMapper.class);
         // No combiner: SortReducer turns keys into ranks, which would corrupt the reducer's input.
         job.setReducerClass(SortReducer.class);
         // One reducer sees every key, so its output is globally sorted
         // and its rank counter covers the whole data set.
         job.setNumReduceTasks(1);
         job.setOutputKeyClass(IntWritable.class);
         job.setOutputValueClass(IntWritable.class);
         for (int i = 0; i < otherArgs.length - 1; ++i) {
             FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
         }
         FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
         System.exit(job.waitForCompletion(true) ? 0 : 1);
     }

     // Emits every whitespace-separated number as a key; the shuffle then sorts the keys.
     public static class TokenizerMapper extends Mapper<Object, Text, IntWritable, IntWritable> {
         private static final IntWritable ONE = new IntWritable(1);
         private final IntWritable number = new IntWritable();

         @Override
         public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
             StringTokenizer itr = new StringTokenizer(value.toString());
             while (itr.hasMoreTokens()) {
                 number.set(Integer.parseInt(itr.nextToken()));
                 context.write(number, ONE);
             }
         }
     }

     // Receives the numbers in ascending order and writes "rank<TAB>number".
     // A repeated number is written once per occurrence, all with the same rank.
     public static class SortReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {
         private final IntWritable rank = new IntWritable(1);

         @Override
         public void reduce(IntWritable key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
             for (IntWritable ignored : values) {
                 context.write(rank, key);
             }
             rank.set(rank.get() + 1);
         }
     }
 }
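The rank counter in `SortReducer` only yields one global ranking when every key reaches the same reducer. With several reducers, Hadoop's default `HashPartitioner` sends a key to reducer `(hashCode & Integer.MAX_VALUE) % numReduceTasks` (for `IntWritable` the hash code is the int value), so each output file is sorted but their concatenation is not. The plain-Java sketch below (no Hadoop; `PartitionSketch` and the split point 100 are hypothetical) contrasts that with a range split, the idea behind Hadoop's `TotalOrderPartitioner`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntUnaryOperator;

// Local sketch (no Hadoop): how the partitioning choice affects global order.
public class PartitionSketch {

    // Same formula as Hadoop's default HashPartitioner for an int key.
    public static int hashPartition(int key, int numReducers) {
        return (key & Integer.MAX_VALUE) % numReducers;
    }

    // Hypothetical range partitioner: reducer i gets keys below splits[i],
    // the last reducer gets everything else.
    public static int rangePartition(int key, int[] splits) {
        int p = 0;
        while (p < splits.length && key >= splits[p]) {
            p++;
        }
        return p;
    }

    // Route keys to reducers, sort each reducer's input (as the shuffle does),
    // then concatenate the outputs in order part-r-00000, part-r-00001, ...
    public static List<Integer> runAndConcat(int[] keys, int numReducers, IntUnaryOperator partitioner) {
        List<List<Integer>> reducers = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            reducers.add(new ArrayList<>());
        }
        for (int k : keys) {
            reducers.get(partitioner.applyAsInt(k)).add(k);
        }
        List<Integer> concatenated = new ArrayList<>();
        for (List<Integer> r : reducers) {
            r.sort(null); // natural (ascending) order
            concatenated.addAll(r);
        }
        return concatenated;
    }

    public static void main(String[] args) {
        int[] keys = {32, 654, 15, 756, 65223, 7};
        System.out.println(runAndConcat(keys, 2, k -> hashPartition(k, 2)));
        System.out.println(runAndConcat(keys, 2, k -> rangePartition(k, new int[] {100})));
    }
}
```

With two reducers the hash split gives `[32, 654, 756, 7, 15, 65223]` (even keys in one file, odd keys in the other), while the range split gives `[7, 15, 32, 654, 756, 65223]`. A range split needs well-chosen split points; with a single reducer no partitioning question arises at all.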

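Note: the job must not set a combiner. If SortReducer were also registered as the combiner, it would run on each map task's output before the shuffle and replace every number key with a rank. The reducer would then receive and emit those ranks instead of the original numbers, so the output would no longer be the sorted input.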
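The following plain-Java simulation (no Hadoop) reproduces that failure on two tiny hypothetical map tasks. It assumes the combiner runs exactly once per map task and that the rank counter starts at 1 in each run; in real Hadoop a combiner may run zero or more times, but whenever it runs the keys are replaced by ranks. The class `CombinerSketch` and its methods are illustrative names only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Local sketch (no Hadoop): effect of running SortReducer as a combiner.
public class CombinerSketch {

    // TokenizerMapper's logic: each number becomes the pair (number, 1).
    public static List<int[]> map(int... numbers) {
        List<int[]> out = new ArrayList<>();
        for (int n : numbers) {
            out.add(new int[] {n, 1});
        }
        return out;
    }

    // Shuffle: merge (key, value) pairs, sorted and grouped by key.
    static TreeMap<Integer, List<Integer>> shuffle(List<List<int[]>> taskOutputs) {
        TreeMap<Integer, List<Integer>> grouped = new TreeMap<>();
        for (List<int[]> task : taskOutputs) {
            for (int[] kv : task) {
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
            }
        }
        return grouped;
    }

    // SortReducer's logic: one (rank, key) pair per value; rank grows once per key.
    static List<int[]> sortReduce(TreeMap<Integer, List<Integer>> grouped) {
        List<int[]> out = new ArrayList<>();
        int rank = 1;
        for (Map.Entry<Integer, List<Integer>> e : grouped.entrySet()) {
            for (int i = 0; i < e.getValue().size(); i++) {
                out.add(new int[] {rank, e.getKey()});
            }
            rank++;
        }
        return out;
    }

    static List<String> toLines(List<int[]> pairs) {
        List<String> lines = new ArrayList<>();
        for (int[] kv : pairs) {
            lines.add(kv[0] + "\t" + kv[1]);
        }
        return lines;
    }

    public static List<String> withoutCombiner(List<List<int[]>> mapOutputs) {
        return toLines(sortReduce(shuffle(mapOutputs)));
    }

    public static List<String> withCombiner(List<List<int[]>> mapOutputs) {
        List<List<int[]>> combined = new ArrayList<>();
        for (List<int[]> task : mapOutputs) {
            // The combiner only sees its own map task's (sorted) output.
            combined.add(sortReduce(shuffle(List.of(task))));
        }
        return toLines(sortReduce(shuffle(combined)));
    }

    public static void main(String[] args) {
        List<List<int[]>> mapOutputs = List.of(map(5, 1), map(3, 2));
        System.out.println(withoutCombiner(mapOutputs));
        System.out.println(withCombiner(mapOutputs));
    }
}
```

`withoutCombiner` yields the tab-separated lines `1 1`, `2 2`, `3 3`, `4 5`, while `withCombiner` yields `1 1`, `1 1`, `2 2`, `2 2`: the numbers 3 and 5 have vanished because the reducer only ever sees ranks as keys.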
