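Preface

This chapter shows how to add custom counters to a MapReduce job, how the framework aggregates them across all tasks, and how to read the resulting statistics in the MapReduce web UI. Along the way the mapper also updates its task status and writes to the task log.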
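"Aggregated across all tasks" means that each map or reduce task keeps its own counter values and the framework sums them into job-level totals. The sketch below is a simplified in-memory model of that idea, not Hadoop's actual implementation. The class name `CounterAggregation` is hypothetical, and an `EnumMap` stands in for the per-task counters.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Simplified model (not Hadoop's real code): every task has its own counter
// values, and the job total is the per-counter sum over all tasks.
public class CounterAggregation {
    enum LineCounters { BAD_LINES, TOO_MANY_TABS, TOO_FEW_TABS }

    // Sums the per-task counter maps into one job-level map.
    static Map<LineCounters, Long> aggregate(List<Map<LineCounters, Long>> perTask) {
        Map<LineCounters, Long> total = new EnumMap<>(LineCounters.class);
        for (Map<LineCounters, Long> task : perTask) {
            for (Map.Entry<LineCounters, Long> e : task.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<LineCounters, Long> task1 = new EnumMap<>(LineCounters.class);
        task1.put(LineCounters.BAD_LINES, 3L);
        task1.put(LineCounters.TOO_FEW_TABS, 3L);
        Map<LineCounters, Long> task2 = new EnumMap<>(LineCounters.class);
        task2.put(LineCounters.BAD_LINES, 2L);
        task2.put(LineCounters.TOO_MANY_TABS, 2L);
        // EnumMap prints in enum declaration order
        System.out.println(aggregate(Arrays.asList(task1, task2)));
        // prints {BAD_LINES=5, TOO_MANY_TABS=2, TOO_FEW_TABS=3}
    }
}
```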

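Preparation

Dataset: ufo, 60,000 records. The dataset consists of UFO sighting records, each containing the fields listed below. The fields in each record are separated by tabs. See http://www.cnblogs.com/cafebabe-yun/p/8679994.html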

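Sighting date: when the UFO sighting took place
Recorded date: when the sighting was reported
Location: where the sighting took place
Shape: shape of the UFO
Duration: how long the sighting lasted
Description: a rough description of the sighting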
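As a rough illustration of the record layout, here is a minimal sketch that splits one tab-separated line into the six fields above. `UfoRecord` is a hypothetical helper that is not used by the job later in this chapter, and the sample line in `main` is made up.

```java
// Hypothetical helper (not part of the article's job): splits one
// tab-separated UFO record into the six named fields listed above.
public class UfoRecord {
    final String sightingDate;
    final String recordedDate;
    final String location;
    final String shape;
    final String duration;
    final String description;

    private UfoRecord(String[] f) {
        sightingDate = f[0];
        recordedDate = f[1];
        location = f[2];
        shape = f[3];
        duration = f[4];
        description = f[5];
    }

    // Returns null unless the line has exactly six tab-separated fields.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static UfoRecord parse(String line) {
        String[] f = line.split("\t", -1);
        return f.length == 6 ? new UfoRecord(f) : null;
    }

    public static void main(String[] args) {
        // Illustrative line with an empty Shape field (made up, not from the dataset)
        UfoRecord r = parse("19950915\t19950915\tRedmond, WA\t\t6 min.\tTiny white round disc.");
        System.out.println(r.location + " / " + r.duration); // prints Redmond, WA / 6 min.
    }
}
```

The mapper later in this chapter calls `line.split("\t")` without a limit, which drops trailing empty fields. The `-1` limit used here is a design choice of this sketch: a record with an empty Description still counts as six fields.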

Example:

19950915 19950915 Redmond, WA 6 min. Young man w/ 2 co-workers witness tiny, distinctly white round disc drifting slowly toward NE. Flew in dir. 90 deg. to winds.

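Shared data: the mapping from US state abbreviations to full state names

Data (one state per line, abbreviation and full name separated by a tab):

AL      Alabama
AK      Alaska
AZ      Arizona
AR      Arkansas
CA      California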
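The mapper in UFOLocation3 loads this file into a `HashMap` and falls back to the abbreviation when a state is missing. The following is a plain-Java sketch of that load-and-lookup step. It reads from any `Reader`, so it can be tried without Hadoop. The class name `StateTable` is hypothetical, and the sketch assumes the separator is a tab, as the job's `split("\t")` expects.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper mirroring setupStateMap/lookupState from UFOLocation3.
public class StateTable {
    // Reads "ABBR<TAB>Full Name" lines into a map.
    static Map<String, String> load(Reader source) throws IOException {
        Map<String, String> states = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t");
                if (parts.length >= 2) { // skip blank or malformed lines
                    states.put(parts[0].trim(), parts[1].trim());
                }
            }
        }
        return states;
    }

    // Falls back to the abbreviation itself when no full name is known.
    static String lookup(Map<String, String> states, String abbr) {
        String full = states.get(abbr);
        return (full == null || full.isEmpty()) ? abbr : full;
    }

    public static void main(String[] args) throws IOException {
        String data = "AL\tAlabama\nAK\tAlaska\nAZ\tArizona\nAR\tArkansas\nCA\tCalifornia\n";
        Map<String, String> states = load(new StringReader(data));
        System.out.println(lookup(states, "CA")); // prints California
        System.out.println(lookup(states, "ZZ")); // prints ZZ
    }
}
```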

Using custom counters

Upload the dataset ufo.tsv to HDFS:

hadoop dfs -put ufo.tsv ufo.tsv

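Upload the shared data file states.txt to HDFS with the same command. The driver in UFOLocation3 reads it from /user/root/states.txt, which is the HDFS home directory when the job runs as user root:

hadoop dfs -put states.txt states.txt

Create the file UFOCountingRecordValidationMapper.java with the following code: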

import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

// First mapper in the chain: passes through well-formed records (exactly six
// tab-separated fields) and counts malformed ones with custom counters.
public class UFOCountingRecordValidationMapper extends MapReduceBase implements Mapper<LongWritable, Text, LongWritable, Text> {

    // Custom counters; the framework aggregates them across all tasks
    public enum LineCounters {
        BAD_LINES,
        TOO_MANY_TABS,
        TOO_FEW_TABS
    }

    @Override
    public void map(LongWritable key, Text value, OutputCollector<LongWritable, Text> output, Reporter reporter) throws IOException {
        String line = value.toString();
        // Only valid records are passed on to the next mapper in the chain
        if (validate(line, reporter)) {
            output.collect(key, value);
        }
    }

    private boolean validate(String line, Reporter reporter) {
        String[] words = line.split("\t");
        if (words.length != 6) {
            if (words.length < 6) {
                // Fewer than six fields means the line has too few tabs
                reporter.incrCounter(LineCounters.TOO_FEW_TABS, 1);
            } else {
                reporter.incrCounter(LineCounters.TOO_MANY_TABS, 1);
            }
            reporter.incrCounter(LineCounters.BAD_LINES, 1);
            // Every 10 bad lines (per task), update the task status and write to the task's stderr log
            if ((reporter.getCounter(LineCounters.BAD_LINES).getCounter() % 10) == 0) {
                reporter.setStatus("Got 10 bad lines.");
                System.err.println("Read another 10 bad lines.");
            }
            return false;
        }
        return true;
    }
}
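The validation rule can be exercised without a cluster. The following plain-Java model mirrors `validate()` above. An `EnumMap` stands in for the `Reporter` counters, which is an assumption made for illustration. The class name `LineValidator` is hypothetical. It also shows the tab classification: fewer than six fields counts as too few tabs, and more than six counts as too many.

```java
import java.util.EnumMap;
import java.util.Map;

// Plain-Java model of UFOCountingRecordValidationMapper.validate();
// an EnumMap stands in for Hadoop's Reporter counters (illustration only).
public class LineValidator {
    enum LineCounters { BAD_LINES, TOO_MANY_TABS, TOO_FEW_TABS }

    final Map<LineCounters, Long> counters = new EnumMap<>(LineCounters.class);

    private void incr(LineCounters c) {
        counters.merge(c, 1L, Long::sum);
    }

    long get(LineCounters c) {
        return counters.getOrDefault(c, 0L);
    }

    // Same split as the mapper (trailing empty fields are dropped).
    boolean validate(String line) {
        int fields = line.split("\t").length;
        if (fields == 6) {
            return true;
        }
        incr(fields < 6 ? LineCounters.TOO_FEW_TABS : LineCounters.TOO_MANY_TABS);
        incr(LineCounters.BAD_LINES);
        if (get(LineCounters.BAD_LINES) % 10 == 0) {
            System.err.println("Read another 10 bad lines.");
        }
        return false;
    }

    public static void main(String[] args) {
        LineValidator v = new LineValidator();
        v.validate("a\tb\tc\td\te\tf");      // valid
        v.validate("a\tb\tc");              // too few tabs
        v.validate("a\tb\tc\td\te\tf\tg"); // too many tabs
        System.out.println(v.counters);    // prints {BAD_LINES=2, TOO_MANY_TABS=1, TOO_FEW_TABS=1}
    }
}
```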

Create the file UFOLocation3.java with the following code:

import java.io.*;
import java.util.*;
import java.net.*;
import java.util.regex.*;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.*;

public class UFOLocation3 {

    // Second mapper in the chain: extracts the state from the Location field
    // and emits (full state name, 1)
    public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> {
        private final static LongWritable one = new LongWritable(1);
        // Two letters followed only by non-letters up to the end, e.g. "WA" in "Redmond, WA"
        private static Pattern locationPattern = Pattern.compile("[a-zA-Z]{2}[^a-zA-Z]*$");
        private Map<String, String> stateNames;

        @Override
        public void configure(JobConf job) {
            try {
                // Local copy of states.txt, shipped to every node by the DistributedCache
                Path[] cacheFiles = DistributedCache.getLocalCacheFiles(job);
                setupStateMap(cacheFiles[0].toString());
            } catch (IOException e) {
                System.err.println("Error reading state file.");
                System.exit(1);
            }
        }

        private void setupStateMap(String fileName) throws IOException {
            Map<String, String> stateCache = new HashMap<String, String>();
            BufferedReader reader = new BufferedReader(new FileReader(fileName));
            try {
                String line = null;
                while ((line = reader.readLine()) != null) {
                    String[] splits = line.split("\t");
                    stateCache.put(splits[0], splits[1]);
                }
            } finally {
                reader.close(); // close the file even if reading fails
            }
            stateNames = stateCache;
        }

        @Override
        public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            String[] fields = line.split("\t");
            String location = fields[2].trim();
            if (location.length() >= 2) {
                Matcher matcher = locationPattern.matcher(location);
                if (matcher.find()) {
                    int start = matcher.start();
                    String state = location.substring(start, start + 2);
                    output.collect(new Text(lookupState(state.toUpperCase())), one);
                }
            }
        }

        // Returns the full state name, or the abbreviation if it is unknown
        private String lookupState(String state) {
            String fullName = stateNames.get(state);
            if (fullName == null || "".equals(fullName)) {
                fullName = state;
            }
            return fullName;
        }
    }

    public static void main(String... args) throws Exception {
        Configuration config = new Configuration();
        JobConf conf = new JobConf(config, UFOLocation3.class);
        conf.setJobName("UFOLocation3");
        // Ship the state-name mapping to every mapper via the DistributedCache
        DistributedCache.addCacheFile(new URI("/user/root/states.txt"), conf);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        // Chain: validation mapper -> state-extraction mapper
        JobConf mapconf1 = new JobConf(false);
        ChainMapper.addMapper(conf, UFOCountingRecordValidationMapper.class, LongWritable.class, Text.class, LongWritable.class, Text.class, true, mapconf1);
        JobConf mapconf2 = new JobConf(false);
        ChainMapper.addMapper(conf, MapClass.class, LongWritable.class, Text.class, Text.class, LongWritable.class, true, mapconf2);
        conf.setMapperClass(ChainMapper.class);
        // Sum the 1s per state, map-side (combiner) and in the reducer
        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);
        FileInputFormat.setInputPaths(conf, args[0]);
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
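To see what the chained job computes without running Hadoop, here is an in-memory sketch. It applies the same location regex as `MapClass`, looks up the full name with the same fallback, and sums the 1s per state the way `LongSumReducer` does. The class name `StateCountSketch` is hypothetical, and the Washington entry in `main` is illustrative data that does not come from the article.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// In-memory sketch of the map step plus a LongSumReducer-style sum.
public class StateCountSketch {
    // Same pattern as UFOLocation3.MapClass
    private static final Pattern LOCATION = Pattern.compile("[a-zA-Z]{2}[^a-zA-Z]*$");

    // Returns the upper-cased two-letter code, or null if none is found.
    static String extractState(String location) {
        String loc = location.trim();
        if (loc.length() < 2) {
            return null;
        }
        Matcher m = LOCATION.matcher(loc);
        if (!m.find()) {
            return null;
        }
        int start = m.start();
        return loc.substring(start, start + 2).toUpperCase();
    }

    // Emits (full name, 1) per location and sums per key; TreeMap sorts keys.
    static Map<String, Long> countByState(Iterable<String> locations, Map<String, String> names) {
        Map<String, Long> counts = new TreeMap<>();
        for (String location : locations) {
            String code = extractState(location);
            if (code == null) {
                continue;
            }
            String name = names.get(code);
            if (name == null || name.isEmpty()) {
                name = code; // same fallback as lookupState
            }
            counts.merge(name, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, String> names = new HashMap<>();
        names.put("WA", "Washington"); // illustrative entry
        names.put("CA", "California");
        System.out.println(countByState(
                Arrays.asList("Redmond, WA", "Seattle, WA", "Fresno, CA", "?"), names));
        // prints {California=1, Washington=2}
    }
}
```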

Compile the two files above (the Hadoop core jar must be on the classpath):

javac UFOCountingRecordValidationMapper.java UFOLocation3.java

Package the compiled classes into a jar:

jar cvf ufo3.jar UFO*class

Run the jar on Hadoop:

hadoop jar ufo3.jar UFOLocation3 ufo.tsv output

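View the output:

hadoop dfs -cat output/part-00000

View the statistics in the MapReduce web UI:
- Click the corresponding job to open its statistics page.

- Check the counters. The custom BAD_LINES, TOO_MANY_TABS and TOO_FEW_TABS counters are listed alongside the built-in job counters.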
