Custom Grouping (Group) in Hadoop

Input data (two tab-separated columns):

hadoop  a

spark   a

hive    a

hbase   a

tachyon a

storm   a

redis   a

Custom grouping

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class MyGroup {

	public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
		Configuration conf = new Configuration();
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length != 2) {
			System.err.println("Usage: MyGroup <inputpath> <outputpath>");
			// Stop here; otherwise otherArgs[0] below would throw.
			System.exit(2);
		}

		Job job = Job.getInstance(conf, MyGroup.class.getSimpleName() + "1");
		job.setJarByClass(MyGroup.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(Text.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		job.setMapperClass(MyMapper1.class);
		// Register the custom grouping comparator used on the reduce side.
		job.setGroupingComparatorClass(MyGroupComparator.class);
		job.setReducerClass(MyReducer1.class);
		job.setInputFormatClass(TextInputFormat.class);
		job.setOutputFormatClass(TextOutputFormat.class);
		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
		// Report success or failure through the process exit code.
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}

	public static class MyMapper1 extends Mapper<LongWritable, Text, Text, Text> {
		@Override
		protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, Text>.Context context)
				throws IOException, InterruptedException {
			// Each input line is "<key>\t<value>".
			String[] spl = value.toString().split("\t");
			if (spl.length < 2) {
				return; // skip malformed lines instead of failing the task
			}
			context.write(new Text(spl[0].trim()), new Text(spl[1].trim()));
		}
	}

	public static class MyReducer1 extends Reducer<Text, Text, Text, Text> {
		@Override
		protected void reduce(Text k2, Iterable<Text> v2s, Reducer<Text, Text, Text, Text>.Context context)
				throws IOException, InterruptedException {
			long count = 0L;
			// Emit one "in--" line per value so the group boundaries are visible.
			for (@SuppressWarnings("unused") Text v2 : v2s) {
				count++;
				context.write(new Text("in--" + k2), new Text(String.valueOf(count)));
			}
			// Emit one "out--" line per reduce() call, that is, one per group.
			context.write(new Text("out--" + k2), new Text(String.valueOf(count)));
		}
	}

	public static class MyGroupComparator extends WritableComparator {
		public MyGroupComparator() {
			super(Text.class, true);
		}

		@Override
		@SuppressWarnings("rawtypes")
		public int compare(WritableComparable a, WritableComparable b) {
			// Always return 0: every key is treated as equal, so all records
			// end up in a single group.
			return 0;
		}
	}

}

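Result. Because compare() always returns 0, all seven keys form a single group and reduce() is called once. Hadoop updates the key object while the values are iterated, which is why the final line shows the last key, tachyon: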

in--hadoop      1

in--hbase       2

in--hive        3

in--redis       4

in--spark       5

in--storm       6

in--tachyon     7

out--tachyon    7
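
The output above can be reproduced without a cluster. The following is a minimal sketch in plain Java, not Hadoop code: the class name GroupingSketch and the method simulate are made up for illustration, and the loop only approximates how the reduce side compares each key with the next one to decide whether a group continues.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java simulation of reduce-side grouping; no Hadoop classes are used.
// GroupingSketch and simulate are hypothetical names chosen for this example.
class GroupingSketch {

    // Walks the sorted keys and keeps extending the current group while the
    // grouping comparator reports the current key and the next key as equal.
    static List<String> simulate(List<String> sortedKeys, Comparator<String> grouping) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < sortedKeys.size()) {
            String key = sortedKeys.get(i); // first key of the group
            long count = 0;
            do {
                key = sortedKeys.get(i);    // the visible key changes with each value
                count++;
                out.add("in--" + key + "\t" + count);
                i++;
            } while (i < sortedKeys.size()
                    && grouping.compare(key, sortedKeys.get(i)) == 0);
            // One "out--" line per simulated reduce() call.
            out.add("out--" + key + "\t" + count);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "hadoop", "hbase", "hive", "redis", "spark", "storm", "tachyon");
        // Always-zero comparator: all keys collapse into one group.
        simulate(keys, (a, b) -> 0).forEach(System.out::println);
    }
}
```

Passing String::compareTo instead of the always-zero lambda yields one in/out pair per key, which corresponds to the default grouping.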

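Now look at the default grouping. The comparator below compares the keys themselves, which matches the default behavior: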

	public static class MyGroupComparator extends WritableComparator {
		public MyGroupComparator() {
			super(Text.class, true);
		}

		@Override
		@SuppressWarnings("rawtypes")
		public int compare(WritableComparable a, WritableComparable b) {
			Text p1 = (Text) a;
			Text p2 = (Text) b;
			// Keys are grouped together only when they are actually equal.
			return p1.compareTo(p2);
		}
	}

Result

in--hadoop      1

out--hadoop     1

in--hbase       1

out--hbase      1

in--hive        1

out--hive       1

in--redis       1

out--redis      1

in--spark       1

out--spark      1

in--storm       1

out--storm      1

in--tachyon     1

out--tachyon    1

Comparing the two results makes it easy to see what a custom grouping comparator does.
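
Between "everything in one group" and "one group per key" there are intermediate rules. As a hedged illustration, the plain-Java sketch below groups the same sorted keys by their first letter. FirstLetterGrouping, BY_FIRST_LETTER, and groupAdjacent are hypothetical names, not part of Hadoop, and the rule itself is an assumption chosen only to show that grouping merges adjacent keys in sort order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java illustration; all names here are hypothetical.
class FirstLetterGrouping {

    // Assumed grouping rule: two keys are "equal" when their first characters match.
    static final Comparator<String> BY_FIRST_LETTER =
            (a, b) -> Character.compare(a.charAt(0), b.charAt(0));

    // Splits already-sorted keys into runs of adjacent keys that compare equal.
    static List<List<String>> groupAdjacent(List<String> sortedKeys, Comparator<String> grouping) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String key : sortedKeys) {
            // Start a new group when the previous key and this key differ.
            if (!current.isEmpty()
                    && grouping.compare(current.get(current.size() - 1), key) != 0) {
                groups.add(current);
                current = new ArrayList<>();
            }
            current.add(key);
        }
        if (!current.isEmpty()) {
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "hadoop", "hbase", "hive", "redis", "spark", "storm", "tachyon");
        // prints [[hadoop, hbase, hive], [redis], [spark, storm], [tachyon]]
        System.out.println(groupAdjacent(keys, BY_FIRST_LETTER));
    }
}
```

Because only neighbouring keys are compared, a grouping rule like this is useful only when it is consistent with the sort order, so that keys meant to share a group are already next to each other.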
