Writing a MapReduce Program (WordCount)

After a lot of trial and error, I finally got my first MapReduce program written and running, packaged as a jar.

Environment:

Windows, 64-bit

Eclipse, 64-bit

JDK 6, 64-bit

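I. Project Setup

1. Create a new Java project

2. Import the jars

Create a new User Library and add hadoop-core plus every jar under the lib folder of the Hadoop directory to it, so that no dependency is missing.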

II. Coding

1. A small word-counting program, used for testing

package com.hirra;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Nested Mapper class
    // Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
    public static class WordCountMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the input line on whitespace and emit (word, 1) for every token
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one); // output goes through the Context
            }
        }
    }

    // Nested Reducer class
    // Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
    // The Reducer's VALUEIN type must match the Mapper's VALUEOUT type: the values
    // the Reducer receives are the Mapper's output values after the shuffle.
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum all the counts received for this word
            int sum = 0;
            for (IntWritable i : values) {
                sum += i.get();
            }
            result.set(sum);
            context.write(key, result); // output goes through the Context
        }
    }

    public static void main(String[] args) throws Exception {
        // Load the configuration (core-default.xml, core-site.xml)
        Configuration conf = new Configuration();
        // Important: tell the client where the JobTracker is
        conf.set("mapred.job.tracker", "hadoopmaster:9001");
        // Strip Hadoop's generic options; what remains is the input and output path, e.g.
        // [hdfs://localhost:9000/user/dat/input, hdfs://localhost:9000/user/dat/output]
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        // Exit with an error unless exactly two arguments remain
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Configure the job
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class); // merge counts locally on the map side
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0])); // input path
        // Output path: it must not exist yet, otherwise the job fails with
        // org.apache.hadoop.mapred.FileAlreadyExistsException
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Exit code 0 if the job succeeded, 1 otherwise
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
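To see what the job computes without a cluster, the same map, shuffle and reduce flow can be imitated in plain Java. This is only a rough sketch: `LocalWordCount` and its methods are hypothetical names, not Hadoop APIs, and everything runs in memory in a single process.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper, not part of Hadoop: imitates the WordCount data flow in memory.
final class LocalWordCount {

    // "Map" step: split one line on whitespace, the same way WordCountMapper does.
    static List<String> map(String line) {
        List<String> words = new ArrayList<String>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            words.add(itr.nextToken());
        }
        return words;
    }

    // "Shuffle" and "reduce" steps: group by word (sorted by key) and add 1 per occurrence.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            for (String word : map(line)) {
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<String>();
        lines.add("Dear River Dear");
        lines.add("River Bear");
        System.out.println(count(lines)); // prints {Bear=1, Dear=2, River=2}
    }
}
```

The real job does the same thing, except that the lines are spread over many map tasks and the grouping by key is done by the framework between the map and reduce phases.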

2. Caveats

If any required jar is missing from the build path, the code will not compile or will fail at run time.
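One way to catch a missing jar early is to probe the classpath for the classes the program imports. The following is a minimal sketch under that assumption; `ClasspathCheck` is a hypothetical helper, and the class names in `main` are simply taken from the imports of the WordCount program.

```java
// Hypothetical helper (not part of Hadoop): reports whether classes can be loaded.
final class ClasspathCheck {

    // Returns true if the named class is visible to this class's class loader.
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found, but something it depends on is missing or incompatible
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job",
            "org.apache.hadoop.util.GenericOptionsParser"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isAvailable(name) ? "found" : "MISSING"));
        }
    }
}
```

Run it with the same build path as the project; any line marked MISSING points to a jar that still has to be added to the User Library.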

III. Building the jar

1. Use Eclipse's built-in Export function to create the jar.

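IV. Running

1. Copy the jar to the Linux machine with an SSH client tool.

2. Run it with Hadoop: change to Hadoop's bin directory and execute the following command: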

./hadoop jar test.jar /README.txt /usr/dat/output
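The command above names no main class, so it relies on the jar's manifest carrying a Main-Class entry (Eclipse's export wizard can set one). If the manifest has none, the class name has to follow the jar name on the command line, here com.hirra.WordCount. A small sketch for checking what a jar declares, where `JarMainClass` is a hypothetical helper name:

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper: prints the Main-Class entry of a jar's manifest, if any.
final class JarMainClass {

    // Returns the Main-Class value, or null if the manifest is absent or has no such entry.
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null;
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the jar, e.g. test.jar
        JarFile jar = new JarFile(args[0]);
        try {
            String mainClass = mainClassOf(jar.getManifest());
            System.out.println(mainClass == null ? "no Main-Class entry" : mainClass);
        } finally {
            jar.close();
        }
    }
}
```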

3. Caveats

The output directory must be a path that does not exist yet.
