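In MapReduce, the sort after Map cannot sort Chinese keys

Today I wrote a MapReduce program that computes average scores. It produces results, but the output is not sorted by student name. With English names, the output does come out sorted.

The code is as follows: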

package com.pro.bq;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class AverageScore {

    public static class MapAvg extends Mapper<Object, Text, Text, IntWritable> {

        // Reused for every record instead of allocating new objects each time.
        private final Text name = new Text();
        private final IntWritable score = new IntWritable();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Earlier attempt, kept for reference: split(" ") yields extra empty
            // elements when fields are separated by several spaces, so it is fragile.
            // String[] lineData = value.toString().split(" ");
            // if (lineData.length == 2) {
            //     name.set(lineData[0]);
            //     score.set(Integer.parseInt(lineData[1]));
            //     context.write(name, score);
            // }

            // StringTokenizer splits on any run of whitespace.
            StringTokenizer lines = new StringTokenizer(value.toString(), "\n");
            while (lines.hasMoreTokens()) {
                StringTokenizer fields = new StringTokenizer(lines.nextToken());
                if (fields.countTokens() < 2) {
                    continue; // skip blank or malformed lines
                }
                name.set(fields.nextToken());
                score.set(Integer.parseInt(fields.nextToken()));
                context.write(name, score);
            }
        }
    }

    public static class ReduceAvg extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            int cnt = 0;
            for (IntWritable val : values) {
                sum += val.get();
                cnt++;
            }
            // Integer division: the average is truncated toward zero.
            // reduce() is only called for keys with at least one value, so cnt > 0.
            context.write(key, new IntWritable(sum / cnt));
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        // Input and output paths are hardcoded here instead of being read from args.
        String[] hdfsPath = new String[] {
            "hdfs://localhost:9000/user/haduser/input/averageTest/",
            "hdfs://localhost:9000/user/haduser/output/outAvgScore/"
        };
        String[] otherArgs = new GenericOptionsParser(conf, hdfsPath).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: AverageScore <in> <out>");
            System.exit(2);
        }
        // Pass conf to the job; a bare new Job() would ignore the parsed configuration.
        // On Hadoop 1.x, use new Job(conf, "average score") instead.
        Job job = Job.getInstance(conf, "average score");
        job.setJarByClass(AverageScore.class);
        job.setMapperClass(MapAvg.class);
        job.setReducerClass(ReduceAvg.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

}
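
The commented-out split(" ") attempt in the mapper can be illustrated in isolation. The sketch below is plain Java with no Hadoop dependency. The class LineParseDemo, its helper names, and the sample line with a score of 85 are made up for illustration and are not part of the original program.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

public class LineParseDemo {

    // Splits on exactly one space: consecutive spaces produce empty strings.
    static String[] splitOnSingleSpace(String line) {
        return line.split(" ");
    }

    // Trims the ends, then splits on any run of whitespace.
    static String[] splitOnWhitespaceRuns(String line) {
        return line.trim().split("\\s+");
    }

    // Tokenizes on whitespace runs, the same way the mapper does.
    static String[] tokenize(String line) {
        StringTokenizer t = new StringTokenizer(line);
        String[] out = new String[t.countTokens()];
        for (int i = 0; i < out.length; i++) {
            out[i] = t.nextToken();
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "zhangsan   85"; // hypothetical record with three spaces
        // Four elements: "zhangsan", "", "", "85"
        System.out.println(Arrays.toString(splitOnSingleSpace(line)));
        // Two elements: "zhangsan", "85"
        System.out.println(Arrays.toString(splitOnWhitespaceRuns(line)));
        System.out.println(Arrays.toString(tokenize(line)));
    }
}
```

With split(" "), a length check of 2 would silently drop this record, which is why the tokenizer version is the more forgiving choice.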

file1:
zhangsan
lisi
wangwu
zhaoliu

file2:
张三
李四
王五
赵六

file3:
zhangsan
lisi
wangwu
zhaoliu

file4:
李四
张三
王五
赵六

The result is as follows:

lisi    38
wangwu    49
zhangsan    27
zhaoliu    60
张三    2
李四    1
王五    2
赵六    3

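Is sorting of Chinese keys simply not supported? Once I learn to write my own Partitioner, will I be able to write my own sorting logic? I'll come back to this later...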
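
One possible explanation, offered as a sketch rather than a confirmed diagnosis of this job: Hadoop's Text keys are compared by their UTF-8 bytes, so Chinese names come out in code-point order rather than pinyin order. Read that way, the result above is sorted after all (张 U+5F20 < 李 U+674E < 王 U+738B < 赵 U+8D75), just not in the order a reader expects. The plain-Java sketch below imitates that byte comparison and contrasts it with a locale-aware java.text.Collator. The class KeyOrderDemo and its helpers are hypothetical names, and the Collator ordering depends on the locale data shipped with the JDK, so no output is claimed for it.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class KeyOrderDemo {

    // Compares two strings by their UTF-8 bytes, treating each byte as unsigned.
    // Assumption: this mirrors the default ordering of Text keys.
    static int compareUtf8Bytes(String a, String b) {
        byte[] x = a.getBytes(StandardCharsets.UTF_8);
        byte[] y = b.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return x.length - y.length;
    }

    // Returns a copy of names sorted by UTF-8 byte order.
    static List<String> sortByUtf8Bytes(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(KeyOrderDemo::compareUtf8Bytes);
        return copy;
    }

    // Returns a copy of names sorted by the collation rules of the given locale.
    static List<String> sortByCollator(List<String> names, Locale locale) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> names =
                Arrays.asList("赵六", "王五", "李四", "张三", "zhangsan", "lisi");
        // Byte order: ASCII names first, then Chinese names by code point.
        System.out.println(sortByUtf8Bytes(names));
        // Locale-aware order; the exact result depends on the JDK's collation data.
        System.out.println(sortByCollator(names, Locale.CHINA));
    }
}
```

In a real job, a different key order would be plugged in through a custom comparator registered with Job.setSortComparatorClass, not through a Partitioner, which only decides which reducer receives each key.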
