MapReduce in Practice (II): Sorting by a Custom Type
Requirement:
Building on the previous exercise, we want the results sorted by total traffic in descending order.
Approach:
By default, MapReduce sorts its keys as strings, in lexicographic order. To sort by an arbitrary criterion, we make the key a class of our own and give that class a compareTo method (return 1 when this object is greater than the one being compared, 0 when equal, -1 when less — or invert the signs, as we do below, to get descending order).
Note: implementing java.lang.Comparable here ultimately fails at runtime; implement WritableComparable directly instead.
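For reference, WritableComparable simply combines the two contracts a MapReduce key type needs: Hadoop serialization (Writable) plus Java ordering (Comparable). Its declaration in org.apache.hadoop.io is essentially:

public interface WritableComparable<T> extends Writable, Comparable<T> {
}

A class that is only Comparable satisfies the ordering half but cannot be serialized by Hadoop's Writable machinery, which is why the job errors out.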
FlowBean.java is updated as follows:
package cn.darrenchan.hadoop.mr.flow;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class FlowBean implements WritableComparable<FlowBean> {
    private String phoneNum; // phone number
    private long upFlow;     // upstream traffic
    private long downFlow;   // downstream traffic
    private long sumFlow;    // total traffic

    // A no-arg constructor is required so Hadoop can instantiate
    // the bean reflectively during deserialization.
    public FlowBean() {
        super();
    }

    public FlowBean(String phoneNum, long upFlow, long downFlow) {
        super();
        this.phoneNum = phoneNum;
        this.upFlow = upFlow;
        this.downFlow = downFlow;
        this.sumFlow = upFlow + downFlow;
    }

    public String getPhoneNum() {
        return phoneNum;
    }

    public void setPhoneNum(String phoneNum) {
        this.phoneNum = phoneNum;
    }

    public long getUpFlow() {
        return upFlow;
    }

    public void setUpFlow(long upFlow) {
        this.upFlow = upFlow;
    }

    public long getDownFlow() {
        return downFlow;
    }

    public void setDownFlow(long downFlow) {
        this.downFlow = downFlow;
    }

    public long getSumFlow() {
        return sumFlow;
    }

    public void setSumFlow(long sumFlow) {
        this.sumFlow = sumFlow;
    }

    @Override
    public String toString() {
        return upFlow + "\t" + downFlow + "\t" + sumFlow;
    }

    // Deserialize the object's fields from the data stream.
    // Fields must be read back in exactly the order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        phoneNum = in.readUTF();
        upFlow = in.readLong();
        downFlow = in.readLong();
        sumFlow = in.readLong();
    }

    // Serialize the object's fields to the data stream.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(phoneNum);
        out.writeLong(upFlow);
        out.writeLong(downFlow);
        out.writeLong(sumFlow);
    }

    // Sort descending by total traffic: a larger sumFlow compares as
    // "smaller" so it comes first in the key order.
    @Override
    public int compareTo(FlowBean flowBean) {
        return sumFlow > flowBean.getSumFlow() ? -1 : 1;
    }
}
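One detail worth flagging: this compareTo never returns 0, so even beans with equal sumFlow are treated as distinct keys. Here that is actually what we want — if equal totals compared as 0, the shuffle would merge those records into a single reduce group and all but one phone number would be dropped. A sketch of an alternative that honors the compareTo contract while still keeping different phones apart (an assumption on our part, not code from the original post):

@Override
public int compareTo(FlowBean other) {
    // Reversed comparison: larger totals sort first (descending).
    int cmp = Long.compare(other.getSumFlow(), this.getSumFlow());
    // Break ties by phone number, so a bean compares as 0 only
    // against a truly equivalent record.
    return cmp != 0 ? cmp : this.getPhoneNum().compareTo(other.getPhoneNum());
}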
Create SortMR.java:
package cn.darrenchan.hadoop.mr.flowsort;

import java.io.IOException;

import org.apache.commons.lang.StringUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import cn.darrenchan.hadoop.mr.flow.FlowBean;

// Run with:
// hadoop jar flowsort.jar cn.darrenchan.hadoop.mr.flowsort.SortMR /flow/output /flow/outputsort
public class SortMR {

    public static class SortMapper extends
            Mapper<LongWritable, Text, FlowBean, NullWritable> {
        // Take one line, split out the fields, wrap them in a FlowBean,
        // and emit the bean as the key; the framework sorts on that key.
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = StringUtils.split(line, "\t");

            String phoneNum = words[0];
            long upFlow = Long.parseLong(words[1]);
            long downFlow = Long.parseLong(words[2]);

            context.write(new FlowBean(phoneNum, upFlow, downFlow),
                    NullWritable.get());
        }
    }

    public static class SortReducer extends
            Reducer<FlowBean, NullWritable, Text, FlowBean> {
        // Keys arrive already sorted; unpack the phone number back out
        // so the output layout matches the previous job's.
        @Override
        protected void reduce(FlowBean key, Iterable<NullWritable> values,
                Context context) throws IOException, InterruptedException {
            String phoneNum = key.getPhoneNum();
            context.write(new Text(phoneNum), key);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(SortMR.class);

        job.setMapperClass(SortMapper.class);
        job.setReducerClass(SortReducer.class);

        job.setMapOutputKeyClass(FlowBean.class);
        job.setMapOutputValueClass(NullWritable.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
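Note that the globally sorted output relies on there being a single reduce task (the default, confirmed by "Launched reduce tasks=1" in the log below). With multiple reducers, each part file would be sorted only within itself, and a total order would require something like TotalOrderPartitioner. To make the assumption explicit, the driver could pin the reducer count — a hypothetical addition, not in the original job:

// Hypothetical: state the single-reducer assumption explicitly
// rather than relying on the default.
job.setNumReduceTasks(1);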
The input this time is the output of the previous experiment. Package the job as flowsort.jar and run:
hadoop jar flowsort.jar cn.darrenchan.hadoop.mr.flowsort.SortMR /flow/output /flow/outputsort
The job log is as follows:
17/02/26 05:22:36 INFO client.RMProxy: Connecting to ResourceManager at weekend110/192.168.230.134:8032
17/02/26 05:22:36 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/02/26 05:22:36 INFO input.FileInputFormat: Total input paths to process : 1
17/02/26 05:22:36 INFO mapreduce.JobSubmitter: number of splits:1
17/02/26 05:22:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1488112052214_0003
17/02/26 05:22:37 INFO impl.YarnClientImpl: Submitted application application_1488112052214_0003
17/02/26 05:22:37 INFO mapreduce.Job: The url to track the job: http://weekend110:8088/proxy/application_1488112052214_0003/
17/02/26 05:22:37 INFO mapreduce.Job: Running job: job_1488112052214_0003
17/02/26 05:24:16 INFO mapreduce.Job: Job job_1488112052214_0003 running in uber mode : false
17/02/26 05:24:16 INFO mapreduce.Job: map 0% reduce 0%
17/02/26 05:24:22 INFO mapreduce.Job: map 100% reduce 0%
17/02/26 05:24:28 INFO mapreduce.Job: map 100% reduce 100%
17/02/26 05:24:28 INFO mapreduce.Job: Job job_1488112052214_0003 completed successfully
17/02/26 05:24:28 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=933
FILE: Number of bytes written=187799
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=735
HDFS: Number of bytes written=623
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters 
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=3077
Total time spent by all reduces in occupied slots (ms)=2350
Total time spent by all map tasks (ms)=3077
Total time spent by all reduce tasks (ms)=2350
Total vcore-seconds taken by all map tasks=3077
Total vcore-seconds taken by all reduce tasks=2350
Total megabyte-seconds taken by all map tasks=3150848
Total megabyte-seconds taken by all reduce tasks=2406400
Map-Reduce Framework
Map input records=22
Map output records=22
Map output bytes=883
Map output materialized bytes=933
Input split bytes=112
Combine input records=0
Combine output records=0
Reduce input groups=22
Reduce shuffle bytes=933
Reduce input records=22
Reduce output records=22
Spilled Records=44
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=142
CPU time spent (ms)=1280
Physical memory (bytes) snapshot=218406912
Virtual memory (bytes) snapshot=726446080
Total committed heap usage (bytes)=137433088
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters 
Bytes Read=623
File Output Format Counters 
Bytes Written=623
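To inspect the result, read the part file back from HDFS (assuming FileOutputFormat's default part-r-00000 naming, which the original post does not show):
hadoop fs -cat /flow/outputsort/part-r-00000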
The final output is shown below; as expected, it is sorted by total traffic in descending order.
1363157985069 186852 200 187052
1363157985066 2481 24681 27162
1363157990043 63 11058 11121
1363157986072 18 9531 9549
1363157982040 102 7335 7437
1363157984041 9 6960 6969
1363157995093 3008 3720 6728
1363157995074 4116 1432 5548
1363157992093 4938 200 5138
1363157973098 27 3659 3686
1363157995033 20 3156 3176
1363157984040 12 1938 1950
1363157986029 3 1938 1941
1363157991076 1512 200 1712
1363157993044 12 1527 1539
1363157993055 954 200 1154
1363157985079 180 200 380
1363157986041 180 200 380
1363157988072 120 200 320
1363154400022 0 200 200
1363157983019 0 200 200
1363157995052 0 200 200