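Preface

This chapter shows how to add custom counters to a MapReduce job, aggregate information from all of its tasks, and view the resulting statistics in the MapReduce web UI.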

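Preparation

Dataset: ufo (60,000 records). The dataset consists of UFO sighting records with the fields listed below; the fields of each record are separated by tabs. See http://www.cnblogs.com/cafebabe-yun/p/8679994.html

Sighting date: when the UFO sighting occurred
Recorded date: when the sighting was reported
Location: where the sighting occurred
Shape: the shape of the UFO
Duration: how long the sighting lasted
Description: a brief description of the sighting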

Example:

19950915 19950915 Redmond, WA 6 min. Young man w/ 2 co-workers witness tiny, distinctly white round disc drifting slowly toward NE. Flew in dir. 90 deg. to winds.
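To make the record layout concrete, here is a minimal sketch of splitting one record into its six fields. The class name UfoRecordSketch and the exact tab positions in SAMPLE are assumptions for illustration (the example above does not show where the tabs fall, and the Shape field is assumed to be empty here).

```java
class UfoRecordSketch {
    // Hypothetical sample record; tab positions are assumed, Shape is left empty.
    static final String SAMPLE =
        "19950915\t19950915\tRedmond, WA\t\t6 min.\t"
        + "Young man w/ 2 co-workers witness tiny, distinctly white round disc drifting slowly toward NE.";

    static final String[] FIELD_NAMES = {
        "Sighting date", "Recorded date", "Location", "Shape", "Duration", "Description"
    };

    // Split a record on tabs. Empty fields in the middle are kept;
    // String.split drops only trailing empty fields.
    static String[] fields(String line) {
        return line.split("\t");
    }

    public static void main(String[] args) {
        String[] values = fields(SAMPLE);
        for (int i = 0; i < values.length && i < FIELD_NAMES.length; i++) {
            System.out.println(FIELD_NAMES[i] + ": " + values[i]);
        }
    }
}
```

Because String.split discards trailing empty strings, a record whose last fields are empty yields fewer than six elements, which is one way a line can end up being counted as malformed.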

Data to be shared: the mapping from state abbreviations to full state names

Data:

AL      Alabama

AK      Alaska

AZ      Arizona

AR      Arkansas

CA      California
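As a rough illustration of how such a lookup table can be loaded, the sketch below parses tab-separated lines into a map. The class name StateTableSketch is hypothetical, and skipping lines with fewer than two fields is an assumption made here for robustness rather than something the original code does.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StateTableSketch {
    // Build an abbreviation -> full name map from tab-separated lines.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> stateNames = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            // Assumption: silently skip lines that do not have both columns.
            if (parts.length >= 2) {
                stateNames.put(parts[0].trim(), parts[1].trim());
            }
        }
        return stateNames;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("AL\tAlabama", "AK\tAlaska", "AZ\tArizona");
        Map<String, String> stateNames = parse(lines);
        System.out.println("AK maps to " + stateNames.get("AK"));
    }
}
```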

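Using custom counters

Upload the dataset ufo.tsv to HDFS:

hadoop dfs -put ufo.tsv ufo.tsv

Upload the shared data file to HDFS with the same command; the driver below expects it at /user/root/states.txt.

Create the file UFOCountingRecordValidationMapper.java and enter the following code: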

import java.io.IOException;

import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.*;

public class UFOCountingRecordValidationMapper extends MapReduceBase implements Mapper<LongWritable, Text, LongWritable, Text> {

    // Custom counters; Hadoop aggregates each one across all tasks of the job.
    public enum LineCounters {
        BAD_LINES,
        TOO_MANY_TABS,
        TOO_FEW_TABS
    }

    @Override
    public void map(LongWritable key, Text value, OutputCollector<LongWritable, Text> output, Reporter reporter) throws IOException {
        String line = value.toString();
        // Pass only well-formed records on to the next mapper in the chain.
        if (validate(line, reporter)) {
            output.collect(key, value);
        }
    }

    private boolean validate(String line, Reporter reporter) {
        String[] words = line.split("\t");
        if (words.length != 6) {
            // Fewer than six fields means too few tabs; more means too many.
            if (words.length < 6) {
                reporter.incrCounter(LineCounters.TOO_FEW_TABS, 1);
            } else {
                reporter.incrCounter(LineCounters.TOO_MANY_TABS, 1);
            }
            reporter.incrCounter(LineCounters.BAD_LINES, 1);
            // On every tenth bad line, update the task status and write to the task's stderr log.
            if ((reporter.getCounter(LineCounters.BAD_LINES).getCounter() % 10) == 0) {
                reporter.setStatus("Got 10 bad lines.");
                System.err.println("Read another 10 bad lines.");
            }
            return false;
        }
        return true;
    }
}
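The counting logic can be tried without a cluster. The following is a minimal sketch that stands in for Hadoop's Reporter with a plain EnumMap; the class LineCounterSketch and its methods are hypothetical, and unlike real Hadoop counters nothing here is aggregated across tasks. A line with fewer than six fields is counted as having too few tabs, one with more than six as having too many.

```java
import java.util.EnumMap;
import java.util.Map;

class LineCounterSketch {
    enum LineCounters { BAD_LINES, TOO_MANY_TABS, TOO_FEW_TABS }

    // Local stand-in for the counters that Hadoop's Reporter would maintain.
    private final Map<LineCounters, Long> counters = new EnumMap<>(LineCounters.class);
    // Local stand-in for the task status string.
    private String status = "";

    private void increment(LineCounters counter) {
        counters.merge(counter, 1L, Long::sum);
    }

    long get(LineCounters counter) {
        return counters.getOrDefault(counter, 0L);
    }

    String status() {
        return status;
    }

    // Returns true for a six-field record; otherwise updates the counters.
    boolean validate(String line) {
        int fieldCount = line.split("\t").length;
        if (fieldCount == 6) {
            return true;
        }
        increment(fieldCount < 6 ? LineCounters.TOO_FEW_TABS : LineCounters.TOO_MANY_TABS);
        increment(LineCounters.BAD_LINES);
        // Mirror the mapper: refresh the status on every tenth bad line.
        if (get(LineCounters.BAD_LINES) % 10 == 0) {
            status = "Got 10 bad lines.";
        }
        return false;
    }

    public static void main(String[] args) {
        LineCounterSketch sketch = new LineCounterSketch();
        sketch.validate("a\tb\tc\td\te\tf");     // six fields: accepted
        sketch.validate("a\tb");                  // two fields: too few tabs
        sketch.validate("a\tb\tc\td\te\tf\tg");   // seven fields: too many tabs
        for (LineCounters counter : LineCounters.values()) {
            System.out.println(counter + " = " + sketch.get(counter));
        }
    }
}
```

In a real job each task keeps its own counts and the framework sums them, so the totals shown in the web UI cover the whole input rather than a single task.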

Create the file UFOLocation3.java and enter the following code:

import java.io.*;
import java.net.*;
import java.util.*;
import java.util.regex.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapred.lib.*;

public class UFOLocation3 {

    public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable> {

        private final static LongWritable one = new LongWritable(1);
        // Matches the final two letters of the location, allowing trailing non-letter characters.
        private static Pattern locationPattern = Pattern.compile("[a-zA-Z]{2}[^a-zA-Z]*$");
        private Map<String, String> stateNames;

        @Override
        public void configure(JobConf job) {
            try {
                // The first cached file is the state abbreviation table.
                Path[] cacheFiles = DistributedCache.getLocalCacheFiles(job);
                setupStateMap(cacheFiles[0].toString());
            } catch (IOException e) {
                System.err.println("Error reading state file.");
                System.exit(1);
            }
        }

        private void setupStateMap(String fileName) throws IOException {
            Map<String, String> stateCache = new HashMap<String, String>();
            // try-with-resources closes the reader even if reading fails.
            try (BufferedReader reader = new BufferedReader(new FileReader(fileName))) {
                String line = null;
                while ((line = reader.readLine()) != null) {
                    String[] splits = line.split("\t");
                    stateCache.put(splits[0], splits[1]);
                }
            }
            stateNames = stateCache;
        }

        @Override
        public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            String[] fields = line.split("\t");
            String location = fields[2].trim();
            if (location.length() >= 2) {
                Matcher matcher = locationPattern.matcher(location);
                if (matcher.find()) {
                    int start = matcher.start();
                    String state = location.substring(start, start + 2);
                    // Emit (full state name, 1); the reducer sums the counts per state.
                    output.collect(new Text(lookupState(state.toUpperCase())), one);
                }
            }
        }

        // Fall back to the abbreviation itself when it is not in the table.
        private String lookupState(String state) {
            String fullName = stateNames.get(state);
            if (fullName == null || "".equals(fullName)) {
                fullName = state;
            }
            return fullName;
        }
    }

    public static void main(String... args) throws Exception {
        Configuration config = new Configuration();
        JobConf conf = new JobConf(config, UFOLocation3.class);
        conf.setJobName("UFOLocation3");
        // Ship the state table to every task through the distributed cache.
        DistributedCache.addCacheFile(new URI("/user/root/states.txt"), conf);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);

        // Chain the validation mapper in front of the location mapper.
        JobConf mapconf1 = new JobConf(false);
        ChainMapper.addMapper(conf, UFOCountingRecordValidationMapper.class, LongWritable.class, Text.class, LongWritable.class, Text.class, true, mapconf1);
        JobConf mapconf2 = new JobConf(false);
        ChainMapper.addMapper(conf, MapClass.class, LongWritable.class, Text.class, Text.class, LongWritable.class, true, mapconf2);
        conf.setMapperClass(ChainMapper.class);

        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);
        FileInputFormat.setInputPaths(conf, args[0]);
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
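The location handling in MapClass can be exercised on its own. The sketch below applies the same regular expression and the same fallback rule outside Hadoop; StateExtractorSketch, stateFor, and the sample locations are hypothetical names and values chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StateExtractorSketch {
    // Same pattern as the mapper: two letters followed only by non-letters up to the end.
    private static final Pattern LOCATION_PATTERN =
        Pattern.compile("[a-zA-Z]{2}[^a-zA-Z]*$");

    // Returns the full state name, the bare abbreviation if it is unknown,
    // or an empty Optional when the location does not end in two letters.
    static Optional<String> stateFor(String location, Map<String, String> stateNames) {
        Matcher matcher = LOCATION_PATTERN.matcher(location.trim());
        if (!matcher.find()) {
            return Optional.empty();
        }
        String abbreviation = matcher.group().substring(0, 2).toUpperCase();
        String fullName = stateNames.get(abbreviation);
        if (fullName == null || fullName.isEmpty()) {
            fullName = abbreviation;
        }
        return Optional.of(fullName);
    }

    public static void main(String[] args) {
        Map<String, String> stateNames = new HashMap<>();
        stateNames.put("WA", "Washington");
        stateNames.put("AZ", "Arizona");
        for (String location : new String[] {"Redmond, WA", "Phoenix, az.", "London, UK", "?"}) {
            System.out.println(location + " -> " + stateFor(location, stateNames).orElse("(no match)"));
        }
    }
}
```

Note that the pattern simply takes the last two consecutive letters, so a location that ends in ordinary words rather than a state code still produces a two-letter key; such keys fall through the lookup unchanged unless they happen to match a real abbreviation.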

Compile the two files (the Hadoop libraries must be on the classpath):

javac UFOCountingRecordValidationMapper.java UFOLocation3.java

Package the compiled classes into a jar file:

jar cvf ufo3.jar UFO*class

Run the jar on Hadoop:

hadoop jar ufo3.jar UFOLocation3 ufo.tsv output

View the output:

hadoop dfs -cat output/part-00000

View the statistics in the MapReduce web UI:
- Click the corresponding job to open its statistics page
- View the counter statistics
