Big Data Notes (12): Unit Testing with MRUnit

package demo.wc;

import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.apache.hadoop.mrunit.mapreduce.MapReduceDriver;
import org.apache.hadoop.mrunit.mapreduce.ReduceDriver;
import org.junit.Test;

public class MRUnitWordCount {

    @Test
    public void testMapper() throws Exception {
        // Set the hadoop.home.dir system property (the test may fail without it)
        System.setProperty("hadoop.home.dir", "D:\\temp\\hadoop-2.4.1\\hadoop-2.4.1");

        // Create the object under test
        WordCountMapper mapper = new WordCountMapper();

        // Create a MapDriver to run the unit test
        MapDriver<LongWritable, Text, Text, IntWritable> driver = new MapDriver<>(mapper);

        // Specify the Map input: k1, v1
        driver.withInput(new LongWritable(1), new Text("I love Beijing"));

        // Specify the expected Map output: k2, v2
        driver.withOutput(new Text("I"), new IntWritable(1))
              .withOutput(new Text("love"), new IntWritable(1))
              .withOutput(new Text("Beijing"), new IntWritable(1));

        // Run the test, comparing the expected output with the actual output
        driver.runTest();
    }

    @Test
    public void testReducer() throws Exception {
        // Set the hadoop.home.dir system property
        System.setProperty("hadoop.home.dir", "D:\\temp\\hadoop-2.4.1\\hadoop-2.4.1");

        // Create the object under test
        WordCountReducer reducer = new WordCountReducer();

        // Create a ReduceDriver to run the unit test
        ReduceDriver<Text, IntWritable, Text, IntWritable> driver = new ReduceDriver<>(reducer);

        // Build v3: the list of values grouped under one key
        List<IntWritable> value3 = new ArrayList<>();
        value3.add(new IntWritable(1));
        value3.add(new IntWritable(1));
        value3.add(new IntWritable(1));

        // Specify the Reducer input: k3, v3
        driver.withInput(new Text("Beijing"), value3);

        // Specify the expected Reducer output: k4, v4
        driver.withOutput(new Text("Beijing"), new IntWritable(3));

        // Run the test
        driver.runTest();
    }

    @Test
    public void testJob() throws Exception {
        // Set the hadoop.home.dir system property
        System.setProperty("hadoop.home.dir", "D:\\temp\\hadoop-2.4.1\\hadoop-2.4.1");

        // Create the objects under test
        WordCountMapper mapper = new WordCountMapper();
        WordCountReducer reducer = new WordCountReducer();

        // Create the driver
        // MapReduceDriver<K1, V1, K2, V2, K4, V4>
        MapReduceDriver<LongWritable, Text, Text, IntWritable, Text, IntWritable>
                driver = new MapReduceDriver<>(mapper, reducer);

        // Specify the Map input
        driver.withInput(new LongWritable(1), new Text("I love Beijing"))
              .withInput(new LongWritable(4), new Text("I love China"))
              .withInput(new LongWritable(7), new Text("Beijing is the capital of China"));

        // Expected Reducer output in order of first appearance. Left commented out:
        // this order does not match the sorted order in which the keys reach the Reducer.
//        driver.withOutput(new Text("I"), new IntWritable(2))
//              .withOutput(new Text("love"), new IntWritable(2))
//              .withOutput(new Text("Beijing"), new IntWritable(2))
//              .withOutput(new Text("China"), new IntWritable(2))
//              .withOutput(new Text("is"), new IntWritable(1))
//              .withOutput(new Text("the"), new IntWritable(1))
//              .withOutput(new Text("capital"), new IntWritable(1))
//              .withOutput(new Text("of"), new IntWritable(1));

        // Expected Reducer output in the default sort order of the keys
        // (uppercase letters sort before lowercase ones)
        driver.withOutput(new Text("Beijing"), new IntWritable(2))
              .withOutput(new Text("China"), new IntWritable(2))
              .withOutput(new Text("I"), new IntWritable(2))
              .withOutput(new Text("capital"), new IntWritable(1))
              .withOutput(new Text("is"), new IntWritable(1))
              .withOutput(new Text("love"), new IntWritable(2))
              .withOutput(new Text("of"), new IntWritable(1))
              .withOutput(new Text("the"), new IntWritable(1));

        // Run the test
        driver.runTest();
    }

}
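
The expected-output order in testJob can be puzzling at first, so here is a minimal plain-Java sketch of the same map, sort, and reduce flow. It does not use Hadoop or MRUnit. The names WordCountSketch, map, and wordCount are hypothetical, and since the source of WordCountMapper is not shown here, splitting each line on single spaces is an assumption. A TreeMap stands in for the shuffle/sort step; for ASCII keys like these, String ordering gives the same result as the byte-wise ordering of Text keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the map -> shuffle/sort -> reduce flow; not Hadoop code.
class WordCountSketch {

    // Map phase (assumed behaviour of WordCountMapper): one word per space-separated token.
    static List<String> map(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.split(" ")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    // Shuffle/sort + reduce: group by word in sorted key order and sum a 1 per occurrence.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "I love Beijing",
                "I love China",
                "Beijing is the capital of China");
        // Prints the words in sorted key order, one "word<TAB>count" pair per line.
        wordCount(lines).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

Running main lists the capitalized words (Beijing, China, I) before the lowercase ones, which is why the second withOutput chain in testJob is written in that order.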
