Implementing WordCount with MapReduce in Java
1. Write the Mapper
package net.toocruel.yarn.mapreduce.wordcount;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import java.util.StringTokenizer;
/**
* @author : 宋同煜
* @version : 1.0
* @createTime : 2017/4/12 14:15
* @description :
*/
public class WordCountMapper extends Mapper<Object,Text,Text,IntWritable>{
    //Every word is emitted with a count of 1, because words are taken out one at a time
    private final static IntWritable one = new IntWritable(1);
    //Holds the current word (reused across calls instead of allocating a new Text each time)
    private Text word = new Text();
    //key: byte offset of the line in the input file (unused); value: the text of one line
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        //Split the line into words with StringTokenizer (splits on whitespace)
        StringTokenizer itr = new StringTokenizer(value.toString());
        while(itr.hasMoreTokens())
        {
            word.set(itr.nextToken());
            context.write(word, one);
        }
    }
}
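The tokenizing step can be tried locally. Below is a minimal plain-Java sketch with no Hadoop dependency; the class MapperSketch and its emit method are hypothetical names, and the returned list only stands in for the pairs WordCountMapper passes to context.write. It also shows that StringTokenizer with no delimiter argument splits only on whitespace (space, tab, newline, carriage return, form feed), so punctuation stays attached to a word.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class MapperSketch {
    // Hypothetical helper: returns the (word, 1) pairs that map() would emit for one input line.
    static List<Map.Entry<String, Integer>> emit(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        // Default delimiters are " \t\n\r\f", the same tokenizer setup as in the mapper.
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Punctuation is not a delimiter, so "world!" stays a single token.
        System.out.println(emit("hello world\thello world!"));
    }
}
```

Because "world" and "world!" become different keys, they are counted separately; the original mapper does no further normalization.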
2. Write the Reducer
package net.toocruel.yarn.mapreduce.wordcount;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
/**
* @author : 宋同煜
* @version : 1.0
* @createTime : 2017/4/12 14:16
* @description :
*/
public class WordCountReducer extends Reducer<Text,IntWritable,Text,IntWritable>{
    //Holds the total count for the current word
    private IntWritable result = new IntWritable();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        //Sum the counts emitted for this word
        int sum = 0;
        for(IntWritable value:values){
            sum+=value.get();
        }
        result.set(sum);
        //Write (word, total count) to the output
        context.write(key, result);
    }
}
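Between map and reduce, the framework sorts the pairs and groups all values that share a key, then calls reduce once per distinct key. The plain-Java sketch below imitates that locally under simplifying assumptions: a single reducer, everything in memory, and a TreeMap standing in for the sort-and-group phase. ReduceSketch and wordCount are hypothetical names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class ReduceSketch {
    // Hypothetical helper: tokenizes lines like the mapper, groups by word like the shuffle,
    // then sums each group like WordCountReducer.
    static Map<String, Integer> wordCount(List<String> lines) {
        // "Shuffle": collect every emitted 1 under its word; TreeMap keeps keys sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                grouped.computeIfAbsent(itr.nextToken(), k -> new ArrayList<>()).add(1);
            }
        }
        // "Reduce": one pass per distinct key, summing its values.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(Arrays.asList("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

In terms of the counters in the run output, the number of tokens corresponds to Map output records (18 there) and the number of keys in the result to Reduce output records (7). With several reducers, each output file is sorted on its own, so there is no single global ordering.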
3. Write the job driver
package net.toocruel.yarn.mapreduce.wordcount;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
/**
* WordCount job driver. Package it and run it on any machine in the Hadoop cluster: hadoop jar XXX.jar net.toocruel.yarn.mapreduce.wordcount.WordCount
* @author : 宋同煜
* @version : 1.0
* @createTime : 2017/4/12 14:15
* @description :
*/
public class WordCount {
    public static void main(String[] args)throws Exception {
        //Initialize the configuration
        Configuration conf = new Configuration();
        //Access HDFS as the hdfs user (set before any HDFS call so the login picks it up)
        System.setProperty("HADOOP_USER_NAME","hdfs");
        //Create the job object
        Job job = Job.getInstance(conf);
        job.setJobName("WordCount");
        job.setJarByClass(WordCount.class);
        //Set the map and reduce classes
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        //Set the output key/value types of the job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        //Set the input and output paths
        Path inputPath = new Path("hdfs://cloudera:8020/sty/wordcount/input");
        Path outputPath = new Path("hdfs://cloudera:8020/sty/wordcount/output");
        //Clear the output directory first (recursively); the job fails if it already exists
        FileSystem fs = outputPath.getFileSystem(conf);
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }
        FileInputFormat.addInputPath(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);
        //Submit the job and wait, printing progress
        boolean res = job.waitForCompletion(true);
        System.out.println("任务名称: "+job.getJobName()); //"Job name: ..."
        System.out.println("任务成功: "+(res?"Yes":"No")); //"Job succeeded: Yes/No"
        System.exit(res?0:1);
    }
}
4. Package
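I packaged it with Maven, which produced the jar below; exporting the jar directly from Eclipse or using Build Artifacts in IntelliJ IDEA works as well.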
hadoopSimple-1.0.jar
5. Run
Run it on a node hosting the YARN ResourceManager or a NodeManager:
hadoop jar hadoopSimple-1.0.jar net.toocruel.yarn.mapreduce.wordcount.WordCount
6. Output
[root@cloudera ~]# hadoop jar hadoopSimple-1.0.jar net.toocruel.yarn.mapreduce.wordcount.WordCount
17/04/13 12:57:13 INFO client.RMProxy: Connecting to ResourceManager at cloudera/192.168.254.203:8032
17/04/13 12:57:14 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/04/13 12:57:18 INFO input.FileInputFormat: Total input paths to process : 1
17/04/13 12:57:18 INFO mapreduce.JobSubmitter: number of splits:1
17/04/13 12:57:18 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491999347093_0012
17/04/13 12:57:19 INFO impl.YarnClientImpl: Submitted application application_1491999347093_0012
17/04/13 12:57:19 INFO mapreduce.Job: The url to track the job: http://cloudera:8088/proxy/application_1491999347093_0012/
17/04/13 12:57:19 INFO mapreduce.Job: Running job: job_1491999347093_0012
17/04/13 12:57:32 INFO mapreduce.Job: Job job_1491999347093_0012 running in uber mode : false
17/04/13 12:57:32 INFO mapreduce.Job: map 0% reduce 0%
17/04/13 12:57:39 INFO mapreduce.Job: map 100% reduce 0%
17/04/13 12:57:47 INFO mapreduce.Job: map 100% reduce 33%
17/04/13 12:57:49 INFO mapreduce.Job: map 100% reduce 67%
17/04/13 12:57:53 INFO mapreduce.Job: map 100% reduce 100%
17/04/13 12:57:54 INFO mapreduce.Job: Job job_1491999347093_0012 completed successfully
17/04/13 12:57:54 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=162
        FILE: Number of bytes written=497579
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=233
        HDFS: Number of bytes written=62
        HDFS: Number of read operations=12
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=6
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=5167
        Total time spent by all reduces in occupied slots (ms)=18520
        Total time spent by all map tasks (ms)=5167
        Total time spent by all reduce tasks (ms)=18520
        Total vcore-seconds taken by all map tasks=5167
        Total vcore-seconds taken by all reduce tasks=18520
        Total megabyte-seconds taken by all map tasks=5291008
        Total megabyte-seconds taken by all reduce tasks=18964480
    Map-Reduce Framework
        Map input records=19
        Map output records=18
        Map output bytes=193
        Map output materialized bytes=150
        Input split bytes=111
        Combine input records=0
        Combine output records=0
        Reduce input groups=7
        Reduce shuffle bytes=150
        Reduce input records=18
        Reduce output records=7
        Spilled Records=36
        Shuffled Maps =3
        Failed Shuffles=0
        Merged Map outputs=3
        GC time elapsed (ms)=320
        CPU time spent (ms)=4280
        Physical memory (bytes) snapshot=805298176
        Virtual memory (bytes) snapshot=11053834240
        Total committed heap usage (bytes)=529731584
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=122
    File Output Format Counters
        Bytes Written=62
任务名称: WordCount
任务成功: Yes
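The counters report Launched reduce tasks=3. The driver never calls setNumReduceTasks, so that value presumably comes from the cluster's default configuration; the output directory then holds three part files (part-r-00000 to part-r-00002), and each word lands in exactly one of them. The sketch below shows how a word could be assigned to a reducer. Assumptions: textHash is meant to reproduce Text.hashCode(), which to our understanding hashes the UTF-8 bytes with hash = 31 * hash + byte starting from 1, and partition uses the formula of Hadoop's default HashPartitioner. PartitionSketch and both method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class PartitionSketch {
    // Assumption: mirrors Text.hashCode() by hashing the word's UTF-8 bytes.
    static int textHash(String word) {
        byte[] bytes = word.getBytes(StandardCharsets.UTF_8);
        int hash = 1;
        for (byte b : bytes) {
            hash = 31 * hash + b; // byte is sign-extended to int, as in (int) bytes[i]
        }
        return hash;
    }

    // Default HashPartitioner formula: clear the sign bit, then take the remainder.
    static int partition(String word, int numReduceTasks) {
        return (textHash(word) & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"hello", "world", "hadoop"}) {
            // Partition i is written by reducer i to part-r-0000i (for fewer than 10 reducers).
            System.out.println(w + " -> part-r-0000" + partition(w, 3));
        }
    }
}
```

Since the assignment depends only on the key, every count for a given word reaches the same reducer, which is what lets WordCountReducer produce a complete total per word.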