在eclipse使用map reduce编写word count程序生成jar包并在虚拟机运行的步骤

---恢复内容开始---

1.首先准备一个需要统计的单词文件 word.txt，我们的单词是以空格分开的，统计时按照空格分隔即可

hello hadoop
hello yarn
hello zookeeper
hdfs hadoop
select from hadoop
select from yarn
mapReduce
MapReduce

2.上传word.txt到hdfs根目录

$ bin/hdfs dfs -put test/word.txt /

3.准备工作完成后在eclipse编写代码，分别编写Map、Reduce、Driver等Java文件

WordCountMap.java

map执行我们的word.txt 文件是按行执行，每一行执行一个map

WordCountMap.java

map执行我们的word.txt 文件是按行执行，每一行执行一个map

package com.ijeffrey.mapreduce.wordcount.client;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
/**
* map 输出的键值对必须和reducer输入的键值对类型一致
* @author PXY
*
*/
public class WordCountMap extends Mapper<LongWritable, Text, Text, IntWritable> {

private Text keyout = new Text();
private IntWritable valueout = new IntWritable(1);

@Override
protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
throws IOException, InterruptedException {

String line = value.toString();
// 我的文件记录的单词是以空格记录单词，所以这里用空格来截取
String[] words = line.split(" ");

// 遍历数组，并以k v 对的形式输出
for (String word : words) {
keyout.set(word);
context.write(keyout, valueout);
}
}

}

WordCountReducer.java

package com.ijeffrey.mapreduce.wordcount.client;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

/**
* reducer 输入的键值对必须和map输出的键值对类型一致
* map <hello,1> <world,1> <hello,1> <apple,1> ....
* reduce 接收 <apple,[1]> <hello,[1,1]> <world,[1]>
* @author PXY
*
*/
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
private IntWritable valueout = new IntWritable();

@Override
protected void reduce(Text key, Iterable<IntWritable> values,
Reducer<Text, IntWritable, Text, IntWritable>.Context context) throws IOException, InterruptedException {
int count = 0; // 统计总数

// 遍历数组，累加求和
for(IntWritable value : values){

// IntWritable类型不能和int类型相加，所以需要先使用get方法转换成int类型
count += value.get();
}

// 将统计的结果转成IntWritable
valueout.set(count);

// 最后reduce要输出最终的 k v 对
context.write(key, valueout);

}
}

WordCountDriver.java

package com.ijeffrey.mapreduce.wordcount.client;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
* 运行主函数
* @author PXY
*
*/
public class WordCountDriver {
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();

// 获得一个job对象，用来完成一个mapreduce作业
Job job = Job.getInstance(conf);

// 让程序找到主入口
job.setJarByClass(WordCountDriver.class);

// 指定输入数据的目录，指定数据计算完成后输出的目录
// sbin/yarn jar share/hadoop/xxxxxxx.jar wordcount /wordcount/input/ /wordcount/output/
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));

// 告诉我调用那个map方法和reduce方法
job.setMapperClass(WordCountMap.class);
job.setReducerClass(WordCountReducer.class);

// 指定map输出键值对的类型
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(IntWritable.class);

// 指定reduce输出键值对的类型
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);

// 提交job任务
boolean result = job.waitForCompletion(true);
System.exit(result ? 0 : 1);

}
}

}

4.将编写完成的代码打成jar包,并在集群上运行

将jar上传到到服务器，启动服务后运行我们自己编写的MapReduce，统计根目录下的word.txt并将运行结果写入output

$ bin/yarn jar test/wordCount.jar com.ijeffrey.mapreduce.wordcount.client.WordCountDriver /word.txt /output

注意：运行jar的时候要添加Driver的完全路径

运行完成后查看output结果:

$ bin/hdfs dfs -text /output12/part-r-00000

在eclipse使用map reduce编写word count程序生成jar包并在虚拟机运行的步骤的更多相关文章

Eclipse笔记-sun.misc.BASE64Encoder找不到jar包的解决方法
从SVN检出新项目,在Eclipse中报错如下: 转: Eclipse笔记-sun.misc.BASE64Encoder找不到jar包的解决方法 2018-01-04 00:36:20 雨临Lewis ...
spark编写word count
创建SparkContext对象的时候需要传递SparkConf对象,SparkConf至少需要包含spark.master和spark.app.name这两个参数,不然的话程序不能正常运行 obje ...
在Eclipse上打包并使用Proguard工具混淆jar包
近期由于工作须要,学习到了Android jar包的打包与混淆. 之前觉得还是非常easy的,可是自己深入研究下,发现还是有一些东西须要注意的,并且自己也踩了一些坑,在这里写下供同僚们借鉴借鉴. 转载 ...
Eclipse中如何添加相对路径的外部jar包
在eclipse中进行java编程的时候,常常需要引用外部jar包.而采用相对路径引用jar包可以大大方便java工程的拷贝,这样使得java工程从一个路径转移到另一个路径时不用大费周章的修改外包ja ...
java 项目打jar包，用cmd运行，并且编写运行脚本
项目是ideal编辑器的springboot项目的demo.打包就是在侧边栏,点击packge ,就会在target下生成jar包. 生成之后把 jar包放在一个文件夹中.新建一个txt文件,在txt ...
eclipse将javaSE项目导出成可执行jar包
将第三方包和项目打包到一块 step1:选中要导出的项目,右键选择Export step2:选择java/Runable JAR file step3:选择main主程序,选择第三方包打包的形式,推荐 ...
Hive中自定义Map/Reduce示例 In Python
Hive支持自定义map与reduce script.接下来我用一个简单的wordcount例子加以说明.使用Python开发(如果使用Java开发,请看这里). 开发环境: python:2.7.5 ...
Hadoop学习笔记2 - 第一和第二个Map Reduce程序
转载请标注原链接http://www.cnblogs.com/xczyd/p/8608906.html 在Hdfs学习笔记1 - 使用Java API访问远程hdfs集群中,我们已经可以完成了访问hd ...
Hadoop Map/Reduce教程
原文地址:http://hadoop.apache.org/docs/r1.0.4/cn/mapred_tutorial.html 目的先决条件概述输入与输出例子:WordCount v1.0 ...

随机推荐

04 Django模板
基本概念作为Web框架,Django提供了模板,用于编写html代码,还可以嵌入模板代码更快更方便的完成页面开发,再通过在视图中渲染模板,将生成最终的html字符串返回给客户端浏览器模版致力于表达 ...
python寻找模块的路径顺序
>>> import sys >>> sys.path ['', '/Library/Frameworks/Python.framework/Versions/3. ...
eclipse使用技巧的网站收集——转载（二）
写代码离不开文本编辑器,看代码也离不开,iar和keil编辑和阅读简直一般般了,因此使用eclipse可以看看代码,提高效率.网上有几个博客的文章,这里收集一下,以备忘. 以下文章转载自:http:/ ...
poj3617 best cow line（贪心题）
Best Cow Line Time Limit: 1000MS Memory Limit: 65536K Total Submissions: 32687 Accepted: 8660 De ...
Leetcode 81. 搜索旋转排序数组 II
题目链接 https://leetcode-cn.com/problems/search-in-rotated-sorted-array-ii/description/ 题目描述假设按照升序排序的数 ...
NOIP 2017 小凯的疑惑
# NOIP 2017 小凯的疑惑思路 a,b 互质求最大不能表示出来的数k 则k与 a,b 互质这里有一个结论:(网上有证明)不过我是打表找的规律若 x,y(设x<y) 互质则 : ...
MySQL 的MAC终端一些指令总结
开启MySQL服务 sudo /usr/local/mysql/support-files/mysql.server start 关闭MySQL服务 udo /usr/local/mysql/supp ...
百度地图的API接口----多地址查询和经纬度
最近看了百度地图的API的接口,正想自己做点小东西,主要是多地址查询和经纬度坐标跟踪, 下面的代码直接另存为html就可以了,目前测试Chrome和360浏览器可以正常使用. <!DOCTYPE ...
主席树 - Luogu 1001 A+B problem
看着大佬们的解法我瑟瑟发抖我用主席树写一写吧 #include<iostream> #include<iomanip> #include<cmath> #incl ...
栈的push、pop序列【微软面试100题第二十九题】
题目要求: 输入两个整数序列,第一个序列表示栈的压入顺序,请判断第二个序列是否为该栈的弹出顺序.假设压入栈的所有数字均不相等.例如序列1.2.3.4.5是某栈的压栈序列,序列4.5.3.2.1是该压栈 ...

在eclipse使用map reduce编写word count程序生成jar包并在虚拟机运行的步骤

在eclipse使用map reduce编写word count程序生成jar包并在虚拟机运行的步骤的更多相关文章

随机推荐

热门专题