Mapper maps input key/value pairs into intermediate key/value pairs.

E.g.

Input: (docID, doc)

Output: (term, 1)

Mapper Class Prototype:

Mapper<Object, Text, Text, IntWritable>
// Object:: INPUT_KEY
// Text:: INPUT_VALUE
// Text:: OUTPUT_KEY
// IntWritable:: OUTPUT_VALUE

Special Data Type for Mapper

IntWritable

A serializable and comparable object for integer.

Example:

private final static IntWritable one = new IntWritable(1);

Text

A serializable, deserializable and comparable object for string at byte level. It stores text in UTF-8 encoding.

Example:

private Text word = new Text();

Hadoop defines its own classes for general data types.

-- All "values" must have Writable interface;

-- All "keys" must have WritableComparable interface;

Map Method for Mapper

Method header

public void map(Object key, Text value, Context context
) throws IOException, InterruptedException
// Object key:: Declare data type of input key;
// Text value:: Declare data type of input value;
// Context context:: Declare data type of output. Context is often used for output data collection.

Tokenization

// Use Java built-in StringTokenizer to split input value (document) into words:
StringTokenizer itr = new StringTokenizer(value.toString());

Building (key, value) pairs

// Loop over all words:
while (itr.hasMoreTokens()) {
// convert built-in String back to Text:
word.set(itr.nextToken());
// build (key, value) pairs into Context and emit:
context.write(word, one);
}

Map Method Summary

Mapper class produces Mapper.Context object, which comprise a series of (key, value) pairs

  public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}

Overview of Mapper Class

public static class TokenizerMapper
extends Mapper<Object, Text, Text, IntWritable>{ private final static IntWritable one = new IntWritable(1);
private Text word = new Text(); public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
context.write(word, one);
}
}
}

Wordcount -- MapReduce example -- Mapper的更多相关文章

  1. MapReduce之Mapper类,Reducer类中的函数(转载)

    Mapper类4个函数的解析 Mapper有setup(),map(),cleanup()和run()四个方法.其中setup()一般是用来进行一些map()前的准备工作,map()则一般承担主要的处 ...

  2. hadoop中mapreduce的mapper抽象类和reduce抽象类

    mapreduce过程key 和value分别存什么值 https://blog.csdn.net/csdnliuxin123524/article/details/80191199 Mapper抽象 ...

  3. Wordcount -- MapReduce example -- Reducer

    Reducer receives (key, values) pairs and aggregate values to a desired format, then write produced ( ...

  4. MapReduce数据流-Mapper

  5. mapreduce程序编写(WordCount)

    折腾了半天.终于编写成功了第一个自己的mapreduce程序,并通过打jar包的方式运行起来了. 运行环境: windows 64bit eclipse 64bit jdk6.0 64bit 一.工程 ...

  6. Java编程MapReduce实现WordCount

    Java编程MapReduce实现WordCount 1.编写Mapper package net.toocruel.yarn.mapreduce.wordcount; import org.apac ...

  7. Kettle实现MapReduce之WordCount

    作者:Syn良子 出处:http://www.cnblogs.com/cssdongl 欢迎转载 抽空用kettle配置了一个Mapreduce的Word count,发现还是很方便快捷的,废话不多说 ...

  8. Hadoop(十七)之MapReduce作业配置与Mapper和Reducer类

    前言 前面一篇博文写的是Combiner优化MapReduce执行,也就是使用Combiner在map端执行减少reduce端的计算量. 一.作业的默认配置 MapReduce程序的默认配置 1)概述 ...

  9. hadoop2.7之Mapper/reducer源码分析

    一切从示例程序开始: 示例程序 Hadoop2.7 提供的示例程序WordCount.java package org.apache.hadoop.examples; import java.io.I ...

随机推荐

  1. jsp页面运行的步骤以及原理

    1.jsp页面在服务器端的执行步骤: 1)将jsp页面翻译成java文件 2)编译  java-class 3)执行返回结果(html页面)给客户端. 2.jsp页面运行的原理: jsp在服务器端运行 ...

  2. 如何不使用 submit 按钮来提交表单?

      如果我们不想用 submit 按钮来提交表单,我们也可以用超链接来提交,我们可以这样写代码: <a href=”javascript: document.myform.submit ();” ...

  3. lable随堂笔记

    lable标签与属性 lable标签:for属性,让标签与指定的input元素建立标签:将input元素包含在lable标签中. <table border="2" alig ...

  4. ASP.Net 中的三种控件

    ---恢复内容开始--- 第一种:HTML控件 ASP.Net把HTML控件当成普通字符串渲染到浏览器端,不去检查正确性.无法在服务器端进行处理. 比如: <input111 type=&quo ...

  5. WPF几个样式

    其实也是大家学的最多的,网上的. 1.老版360 2.360悬浮窗 不好意思,没有找到悬浮球的图片,随便一个代替了 3.老版迅雷 4.新版360 遗憾的是这个样式没有完整的源代码.只是一个演示和图片代 ...

  6. win10 pro 永久激活

    win10 专业版永久激活 转自雨林木风 查看激活状态 ·"Windows+R"打开"运行"窗口,输入"slmgr.vbs -xpr"并点击 ...

  7. 断言assert()与调试帮助

    列表内容assert()是一种预处理宏(preprocessor marco),使用一个表达式来作为条件,只在DEBUG模式下才有用. assert(expr); 对expr求值,如果expr为假,则 ...

  8. 本地打jar包到本地的Maven出库

    1.命令行输入 mvn install:install-file -DgroupId=jar包的groupId -DartifactId=jar包的artifactId -Dversion=jar包的 ...

  9. js中回调函数写法

    第一种方式 function studyEnglish(who){ document.write(who+"学习英语</br>"); } function study( ...

  10. Hive(9)-自定义函数

    一. 自定义函数分类 当Hive提供的内置函数无法满足你的业务处理需要时,此时就可以考虑使用用户自定义函数. 根据用户自定义函数类别分为以下三种: 1. UDF(User-Defined-Functi ...