Map Reduce Application(Partitioninig/Binning)
Map Reduce Application(Partitioninig/Group data by a defined key)
Assuming we want to group data by the year(2008 to 2016) of their [last access date time]. For each year, we use a reducer to collect them and output the data in this group/partition(year of the last access datetime). So, we want the MR to partition our key by year. We will lean what's the default partitioner and see how to set custom partitioner.
The default partitioner:
public int getPartition(K key, V value,
int numReduceTasks) {
return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
}
Custom Partitioner:
job.setPartitionerClass(CustomPartitioner.class)
With blew partitioner, the data of different year of [last access date time] will be assigned to different / unique partition. The num of reduce tasks is 9.
public static class CustomPartitioner extends Partitioner<Text, Text>{
@Override
public int getPartition(Text key, Text value, int numReduceTasks){
if(numReduceTasks == 0){
return 0;
}
return key-2008
}
Binning pattern
The text/comments/answer/question....contains the specific words will be written into the corresponding files from mapper.
See below picture to understand the binning pattern. It is easier than partitioning as it does not have partition/sorting/shuffling and reducer(job.setNumReduceTasks(0)). The outputs from mappers compose the final outputs.
MultipleOutputs.addNamedOutput(job,"namedoutput",TextOutputFormat.class, NullWritable.class, Text.class)

In the mapper setup function, create the MultipleOutputs intance by calling its constructor
MultipleOutputs(TaskInputOutputContext<?,?,KEYOUT,VALUEOUT> context)
Creates and initializes multiple outputs support, it should be instantiated in the Mapper/Reducer setup method.
@Override
protected void setup(Context context){
maltipleOutputs = new MultipleOurputs(context);
}
Write your logic in the mapper function and output the result. "$tag/$tag-tag" means folder $pag will be created and $tag-tag is the prefix of the files(to distinguish the different mappers with suffix).
See doc for MultipleOutputs:https://hadoop.apache.org/docs/r3.0.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
if(tag.equalsIgnoreCase("pig"){
multipleOutputs.write("namedoutput",key,value,"pig/pig-tag");
}
if(tag.equalsIgnoreCase("hive"){
multipleOutputs.write("namedoutput",key,value,"hive/hive-tag");
}
.....
Map Reduce Application(Partitioninig/Binning)的更多相关文章
- Map Reduce Application(Join)
We are going to explain how join works in MR , we will focus on reduce side join and map side join. ...
- Map Reduce Application(Top 10 IDs base on their value)
Top 10 IDs base on their value First , we need to set the reduce to 1. For each map task, it is not ...
- mapreduce: 揭秘InputFormat--掌控Map Reduce任务执行的利器
随着越来越多的公司采用Hadoop,它所处理的问题类型也变得愈发多元化.随着Hadoop适用场景数量的不断膨胀,控制好怎样执行以及何处执行map任务显得至关重要.实现这种控制的方法之一就是自定义Inp ...
- MapReduce剖析笔记之三:Job的Map/Reduce Task初始化
上一节分析了Job由JobClient提交到JobTracker的流程,利用RPC机制,JobTracker接收到Job ID和Job所在HDFS的目录,够早了JobInProgress对象,丢入队列 ...
- python--函数式编程 (高阶函数(map , reduce ,filter,sorted),匿名函数(lambda))
1.1函数式编程 面向过程编程:我们通过把大段代码拆成函数,通过一层一层的函数,可以把复杂的任务分解成简单的任务,这种一步一步的分解可以称之为面向过程的程序设计.函数就是面向过程的程序设计的基本单元. ...
- 记一次MongoDB Map&Reduce入门操作
需求说明 用Map&Reduce计算几个班级中,每个班级10岁和20岁之间学生的数量: 需求分析 学生表的字段: db.students.insert({classid:1, age:14, ...
- filter,map,reduce,lambda(python3)
1.filter filter(function,sequence) 对sequence中的item依次执行function(item),将执行的结果为True(符合函数判断)的item组成一个lis ...
- map reduce
作者:Coldwings链接:https://www.zhihu.com/question/29936822/answer/48586327来源:知乎著作权归作者所有,转载请联系作者获得授权. 简单的 ...
- python基础——map/reduce
python基础——map/reduce Python内建了map()和reduce()函数. 如果你读过Google的那篇大名鼎鼎的论文“MapReduce: Simplified Data Pro ...
随机推荐
- Swift_方法
Swift_方法 点击查看源码 ///方法 class Methods: NSObject { func test() { // self.testInstanceMethods() //实例方法 s ...
- 初窥UIKit Dynamics
原文来自这里. iOS7中可以方便的给物体添加动态物理特性,主要使用到UIDynamicAnimator,UIDynamicBehavior以及实现了UIDynamicItem协议的对象.在iOS7中 ...
- js 变速动画函数
//获取任意一个元素的任意一个属性的当前的值---当前属性的位置值 function getStyle(element, attr) { return window.getComputedStyle ...
- #leetcode刷题之路7- 整数反转
给出一个 32 位的有符号整数,你需要将这个整数中每位上的数字进行反转. 示例 1:输入: 123输出: 321 示例 2:输入: -123输出: -321 示例 3:输入: 120输出: 21 #i ...
- 【2013 ICPC亚洲区域赛成都站 F】Fibonacci Tree(最小生成树+思维)
Problem Description Coach Pang is interested in Fibonacci numbers while Uncle Yang wants him to do s ...
- HDU Ellipse(simpson积分)
Time Limit: 1000/1000 MS (Java/Others) Memory Limit: 32768/32768 K (Java/Others)Total Submission( ...
- 跨浏览器实现placeholder效果的jQuery插件
曾经遇到这样一个问题,处理IE8密码框placeholder属性兼容性.几经周折,这个方案是可以解决问题的. 1.jsp页面引入js插件 <script type="text/java ...
- HTML5文本
1.重要文本.斜体文本 粗体:<strong></strong> 粗体:<b></b> 斜体:<em></em> 斜体:< ...
- 第2天 Java基础语法
第2天 Java基础语法 今日内容介绍 变量 运算符 变量 变量概述 前面我们已经学习了常量,接下来我们要学习变量.在Java中变量的应用比常量的应用要多很多.所以变量也是尤为重要的知识点! 什么是变 ...
- go学习笔记-并发
并发 goroutine goroutine是Go并行设计的核心.goroutine说到底其实就是协程,但是它比线程更小,十几个goroutine可能体现在底层就是五六个线程,Go语言内部帮你实现了这 ...