MapReduce(2): How does Mapper work

In the previous post, we've illustrated how Hadoop MapReduce prepares input for Mappers. Long story short, InputSplit convert physical storaged data into many logical unit, and each one will be processed by a RecordReader, who will generate input (K,V) pairs for Mapper. I used to be confused about how (K,V) pairs are generated, but actually it just breaks a 128M file into single lines (just an example), and each line is a (K,V) pair. A mapper process these pairs one by one untill the end of the file.

A user-defined mapper, takes input (K,V) pairs from RecordReader, generate new key/value pair set at the output side.Usually we call the new (K,V) pairs as 'immediate (K,V) pairs'. For example: in the post (Using MapReduce on Azure), we define a Mapper as following:

#!/usr/bin/env python

"""mapper.py"""

import sys

# input comes from STDIN (standard input)

for line in sys.stdin:

    # remove leading and trailing whitespace

    line = line.strip()

    # split the line into words

    words = line.split()

    # increase counters

    for word in words:

        # write the results to STDOUT (standard output);

        # what we output here will be the input for the

        # Reduce step, i.e. the input for reducer.py

        #

        # tab-delimited; the trivial word count is 1

        print '%s\t%s' % (word, 1)

We can see, this mapper just breaks a line into words set, and ouput immediate (K,V) pairs, in which key is the word and value is 1.

A funny but intuitive illustration for this process is cutting a car into pieces:

MapReduce(2): How does Mapper work的更多相关文章

Hadoop（十七）之MapReduce作业配置与Mapper和Reducer类
前言前面一篇博文写的是Combiner优化MapReduce执行,也就是使用Combiner在map端执行减少reduce端的计算量. 一.作业的默认配置 MapReduce程序的默认配置 1)概述 ...
MapReduce从输入文件到Mapper处理之间的过程
1.MapReduce代码入口 FileInputFormat.setInputPaths(job, new Path(input)); //设置MapReduce输入格式 job.waitForCo ...
027_编写MapReduce的模板类Mapper、Reducer和Driver
模板类编写好后写MapReduce程序,的模板类编写好以后只需要改参数就行了,代码如下: package org.dragon.hadoop.mr.module; import java.io.IOE ...
MapReduce ：基于 FileInputFormat 的 mapper 数量控制
本篇分两部分,第一部分分析使用 java 提交 mapreduce 任务时对 mapper 数量的控制,第二部分分析使用 streaming 形式提交 mapreduce 任务时对 mapper 数量 ...
Mapreduce报错：java.lang.ClassNotFoundException: Class Mapper not found
mapreduce找不到mapper类解决方法: 开始自己用的是mapreduce自己打包的一种方法: job.setJarByClass(StandardJob.class); 但这样一直在报错: ...
MapReduce在Shuffle阶段按Mapper输出的Value进行排序
ZKe ----------------- 在MapReduce框架中,Mapper的输出在Shuffle阶段,根据Key值分组之后,还将会根据Key值进行排序,因此Reducer的输出我们看到的结果 ...
[Hadoop in Action] 第4章编写MapReduce基础程序
基于hadoop的专利数据处理示例 MapReduce程序框架用于计数统计的MapReduce基础程序支持用脚本语言编写MapReduce程序的hadoop流式API 用于提升性能的Combine ...
MapReduce实例浅析
在文章<MapReduce原理与设计思想>中,详细剖析了MapReduce的原理,这篇文章则通过实例重点剖析MapReduce 本文地址:http://www.cnblogs.com/ar ...
MapReduce的模式、算法和用例
英文原文:<MapReduce Patterns, Algorithms, and Use Cases> https://highlyscalable.wordpress.com/2012 ...

随机推荐

IETester——用来测试IE5.5~IE11兼容性的工具
IETester是一款ie浏览器多版本测试工具,能很方便在ie5.5,ie6,ie7,ie8,ie9,ie10,ie11切换,只需安装一个软件,就可以解决N多ie浏览器的问题,满足大部分IE浏览器兼容 ...
VS2019配置MKL教程(Windows)
下载链接:https://software.intel.com/en-us/mkl 1.文件下载官网注册后,选择MKL下载下来,安装到指定目录就行,不在多说. 2.配置文件首先创建一个Window ...
sqli(7)
前言第7关导出文件GET字符型注入步骤OK,但是就是不能写入文件,不知是文件夹的问题还是自己操作的问题.但是确实,没有导入成功. 1. 查看闭合,看源码,发现闭合是((‘ ’)): 2.查看所在 ...
阿里P7前端需要哪些技能
原谅我copy过来的,但是这个条理很清楚很有借鉴意义前言以下是从公众号的文章中获取到的一位阿里的前端架构师整理的前端架构p7的技能图谱,当然不是最完整.最系统的,所以之后我会一直维护更新这里的内容 ...
tomcat启动报错：Error configuring application listener of class org.springframework.web.context.ContextLoaderListener
1.错误信息: 严重: Error configuring application listener of class org.springframework.web.context.ContextL ...
Test 6.24 T2 集合
问题描述有一个可重集合,一开始只有一个元素 0. 你可以进行若干轮操作,每轮你需要对集合中每个元素 x 执行以下三种操作之一: 将 x 变为 x+1; 选择两个非负整数 y,z 满足 y+z=x , ...
Test 6.23 T1 扫雷
题目背景题目描述输入格式输出格式样例输入输出数据范围解析我们设两个作弊器的参数分别为\((a_1,b_1)\)和\((a_2,b_2)\),那么设 \[ S1=\frac{a_1}{b_ ...
web高拍仪图片上传
公司引进高拍仪,想拍完照片点上传按钮直接上传图片.高拍仪接口能提供照片的本地路径,现在的问题是不用file控件选择,只有路径,不知道如何上传到服务器,求解决方案. 方法: 使用泽优Web图片上传控件( ...
Elven Postman
Elven Postman Time Limit: 1500/1000 MS (Java/Others) Memory Limit: 131072/131072 K (Java/Others)T ...
学习日记9、easyui控件两次请求服务器端问题
<select id="BankCard" class="easyui-combobox" style="width: 600px;" ...

MapReduce(2): How does Mapper work

MapReduce(2): How does Mapper work的更多相关文章

随机推荐

热门专题