A simple Java Hadoop MapReduce program (computing average scores): from packaging to submission and execution

Source code

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Score {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        // map: emit (student name, score) for every input line
        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Convert the plain-text input record to a String
            String line = value.toString();
            // Split on newlines first; with TextInputFormat each record is already
            // a single line, so this mainly skips empty records
            StringTokenizer tokenizerArticle = new StringTokenizer(line, "\n");
            // Process each line
            while (tokenizerArticle.hasMoreElements()) {
                // Split the line on whitespace
                StringTokenizer tokenizerLine = new StringTokenizer(tokenizerArticle.nextToken());
                String strName = tokenizerLine.nextToken();  // student name
                String strScore = tokenizerLine.nextToken(); // score
                Text name = new Text(strName);
                int scoreInt = Integer.parseInt(strScore);
                // Emit the name and the score
                context.write(name, new IntWritable(scoreInt));
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        // reduce: average all scores received for one student
        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            int count = 0;
            Iterator<IntWritable> iterator = values.iterator();
            while (iterator.hasNext()) {
                sum += iterator.next().get(); // total score
                count++;                      // number of subjects
            }
            int average = sum / count; // integer division, the fraction is truncated
            context.write(key, new IntWritable(average));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Adjust "localhost:9000" to your own setup. Hadoop 2.x reports this key as
        // deprecated in favor of mapreduce.jobtracker.address (see the run log).
        conf.set("mapred.job.tracker", "localhost:9000");
        // The HDFS input and output directories come from the command line,
        // e.g. input/score and output
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: Score Average <in> <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "Score Average");
        job.setJarByClass(Score.class);
        // Set the Map, Combine and Reduce classes.
        // Caution: reusing Reduce as the combiner is only correct when each map task
        // emits at most one score per student (true for the sample input, one file
        // per subject). In general, an average of averages is not the overall average.
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);
        // Set the output key and value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // TextInputFormat divides the input into splits and provides a RecordReader
        // that yields one line per record
        job.setInputFormatClass(TextInputFormat.class);
        // TextOutputFormat provides a RecordWriter that writes the output records
        job.setOutputFormatClass(TextOutputFormat.class);
        // Set the input and output directories
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
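The job above registers Reduce as its combiner. That works for the sample data, but averaging partial averages is not safe in general. A common alternative is to carry (sum, count) pairs through the combine step and divide only at the very end. The following is a minimal plain-Java sketch of that idea, without any Hadoop types; the class and method names (CombinerSafeAverage, SumCount, combine) and the way the scores are split across two tasks are assumptions made up for illustration, not part of the original program.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java sketch (no Hadoop types); all names here are hypothetical.
final class CombinerSafeAverage {

    // Partial aggregate that a combiner could emit instead of an average.
    static final class SumCount {
        final long sum;
        final long count;

        SumCount(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }

        // Merging partial aggregates is associative and commutative,
        // so it can be applied any number of times before the final step.
        SumCount merge(SumCount other) {
            return new SumCount(sum + other.sum, count + other.count);
        }

        // Only the final step divides (truncating, like the job above).
        int average() {
            return (int) (sum / count);
        }
    }

    // What a combiner would do with the scores seen by one map task.
    static SumCount combine(List<Integer> scores) {
        SumCount acc = new SumCount(0, 0);
        for (int s : scores) {
            acc = acc.merge(new SumCount(s, 1));
        }
        return acc;
    }

    public static void main(String[] args) {
        // Hypothetical split: one map task sees two scores, another sees one.
        List<Integer> taskA = Arrays.asList(98, 93);
        List<Integer> taskB = Arrays.asList(38);

        // Averaging the per-task averages: (95 + 38) / 2 = 66, which is wrong.
        int averageOfAverages =
                (combine(taskA).average() + combine(taskB).average()) / 2;

        // Merging (sum, count) pairs: 229 / 3 = 76, the true truncated average.
        int merged = combine(taskA).merge(combine(taskB)).average();

        System.out.println("average of averages = " + averageOfAverages);
        System.out.println("merged sum/count    = " + merged);
    }
}
```

In a real job the pair would have to be carried in a Writable value type (or encoded in a Text value), and the combiner and reducer would need separate classes, since only the reducer should perform the division.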

Compiling

Command

javac Score.java

Dependency errors

If errors like the following appear:

mint@lenovo ~/Desktop/hadoop $ javac Score.java

Score.java:4: error: package org.apache.hadoop.conf does not exist

import org.apache.hadoop.conf.Configuration;

                             ^

Score.java:5: error: package org.apache.hadoop.fs does not exist

import org.apache.hadoop.fs.Path;

                           ^

Score.java:6: error: package org.apache.hadoop.io does not exist

import org.apache.hadoop.io.IntWritable;

                           ^

Score.java:7: error: package org.apache.hadoop.io does not exist

import org.apache.hadoop.io.LongWritable;

                           ^

Score.java:8: error: package org.apache.hadoop.io does not exist

import org.apache.hadoop.io.Text;

try adjusting the CLASSPATH environment variable:

sudo vim /etc/profile

# Add the following lines

export HADOOP_HOME=/usr/local/hadoop	# if not already set; use your Hadoop installation directory

export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH	# if not already set

export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH

source /etc/profile

Then re-run the compile command above.

Packaging

Compilation produces three class files:

mint@lenovo ~/Desktop/hadoop $ ls | grep class

Score.class

Score$Map.class

Score$Reduce.class

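Package the class files with the JDK's jar tool (hadoop jar expects a JAR, i.e. a ZIP-format archive, which tar does not produce).

jar -cvf Score.jar ./Score*.class

This produces Score.jar.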
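Before submitting, it can be worth checking that the archive really opens as a JAR and contains the three class files. The following is a small sketch using the standard java.util.jar API; the class name JarCheck and its helper methods are hypothetical, and the demo archive it builds contains placeholder bytes rather than real bytecode.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

// Hypothetical helper, not part of Hadoop or of the original program.
final class JarCheck {

    // Returns the sorted entry names of a JAR; fails if the file is not ZIP-format.
    static List<String> listEntries(Path jar) {
        List<String> names = new ArrayList<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException("not a readable JAR: " + jar, e);
        }
        Collections.sort(names);
        return names;
    }

    // Builds a throwaway JAR with placeholder entries so the sketch runs on its own.
    static Path writeDemoJar(String... entryNames) {
        try {
            Path jar = Files.createTempFile("demo", ".jar");
            try (OutputStream out = Files.newOutputStream(jar);
                 JarOutputStream jos = new JarOutputStream(out)) {
                for (String name : entryNames) {
                    jos.putNextEntry(new JarEntry(name));
                    jos.write(0); // placeholder byte, not real bytecode
                    jos.closeEntry();
                }
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // With an argument, inspect that file (e.g. Score.jar); otherwise use the demo JAR.
        Path jar = args.length > 0
                ? Paths.get(args[0])
                : writeDemoJar("Score.class", "Score$Map.class", "Score$Reduce.class");
        for (String name : listEntries(jar)) {
            System.out.println(name);
        }
    }
}
```

Running it as `java JarCheck Score.jar` lists the entries of the real archive; an archive that is not in ZIP format makes listEntries throw instead.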

Submitting and running

Sample input

mint@lenovo ~/Desktop/hadoop $ ls | grep txt

chinese.txt

english.txt

math.txt

mint@lenovo ~/Desktop/hadoop $ cat chinese.txt

Zhao 98

Qian 9

Sun 67

Li 23

mint@lenovo ~/Desktop/hadoop $ cat english.txt

Zhao 93

Qian 42

Sun 87

Li 54

mint@lenovo ~/Desktop/hadoop $ cat math.txt

Zhao 38

Qian 45

Sun 23

Li 43

Uploading to HDFS

# If input/score does not exist yet, create it first: hdfs dfs -mkdir -p input/score

hdfs dfs -put ./*.txt input/score

mint@lenovo ~/Desktop/hadoop $ hdfs dfs -ls input/score

Found 3 items

-rw-r--r--   1 mint supergroup         28 2017-01-11 23:25 input/score/chinese.txt

-rw-r--r--   1 mint supergroup         29 2017-01-11 23:25 input/score/english.txt

-rw-r--r--   1 mint supergroup         29 2017-01-11 23:25 input/score/math.txt

Running

mint@lenovo ~/Desktop/hadoop $ hadoop jar Score.jar Score input/score output

17/01/11 23:26:26 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032

17/01/11 23:26:27 INFO input.FileInputFormat: Total input paths to process : 3

17/01/11 23:26:27 INFO mapreduce.JobSubmitter: number of splits:3

17/01/11 23:26:27 INFO Configuration.deprecation: mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address

17/01/11 23:26:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1484147224423_0006

17/01/11 23:26:27 INFO impl.YarnClientImpl: Submitted application application_1484147224423_0006

17/01/11 23:26:27 INFO mapreduce.Job: The url to track the job: http://lenovo:8088/proxy/application_1484147224423_0006/

17/01/11 23:26:27 INFO mapreduce.Job: Running job: job_1484147224423_0006

17/01/11 23:26:33 INFO mapreduce.Job: Job job_1484147224423_0006 running in uber mode : false

17/01/11 23:26:33 INFO mapreduce.Job:  map 0% reduce 0%

17/01/11 23:26:40 INFO mapreduce.Job:  map 67% reduce 0%

17/01/11 23:26:41 INFO mapreduce.Job:  map 100% reduce 0%

17/01/11 23:26:46 INFO mapreduce.Job:  map 100% reduce 100%

17/01/11 23:26:46 INFO mapreduce.Job: Job job_1484147224423_0006 completed successfully

17/01/11 23:26:47 INFO mapreduce.Job: Counters: 49

	File System Counters

		FILE: Number of bytes read=129

		FILE: Number of bytes written=471147

		FILE: Number of read operations=0

		FILE: Number of large read operations=0

		FILE: Number of write operations=0

		HDFS: Number of bytes read=443

		HDFS: Number of bytes written=29

		HDFS: Number of read operations=12

		HDFS: Number of large read operations=0

		HDFS: Number of write operations=2

	Job Counters

		Launched map tasks=3

		Launched reduce tasks=1

		Data-local map tasks=3

		Total time spent by all maps in occupied slots (ms)=15538

		Total time spent by all reduces in occupied slots (ms)=2551

		Total time spent by all map tasks (ms)=15538

		Total time spent by all reduce tasks (ms)=2551

		Total vcore-milliseconds taken by all map tasks=15538

		Total vcore-milliseconds taken by all reduce tasks=2551

		Total megabyte-milliseconds taken by all map tasks=15910912

		Total megabyte-milliseconds taken by all reduce tasks=2612224

	Map-Reduce Framework

		Map input records=12

		Map output records=12

		Map output bytes=99

		Map output materialized bytes=141

		Input split bytes=357

		Combine input records=12

		Combine output records=12

		Reduce input groups=4

		Reduce shuffle bytes=141

		Reduce input records=12

		Reduce output records=4

		Spilled Records=24

		Shuffled Maps =3

		Failed Shuffles=0

		Merged Map outputs=3

		GC time elapsed (ms)=462

		CPU time spent (ms)=2940

		Physical memory (bytes) snapshot=992215040

		Virtual memory (bytes) snapshot=7659905024

		Total committed heap usage (bytes)=732430336

	Shuffle Errors

		BAD_ID=0

		CONNECTION=0

		IO_ERROR=0

		WRONG_LENGTH=0

		WRONG_MAP=0

		WRONG_REDUCE=0

	File Input Format Counters

		Bytes Read=86

	File Output Format Counters

		Bytes Written=29

Output

mint@lenovo ~/Desktop/hadoop $ hdfs dfs -ls output

Found 2 items

-rw-r--r--   1 mint supergroup          0 2017-01-11 23:26 output/_SUCCESS

-rw-r--r--   1 mint supergroup         29 2017-01-11 23:26 output/part-r-00000

mint@lenovo ~/Desktop/hadoop $ hdfs dfs -cat output/part-r-00000

Li	40

Qian	32

Sun	59

Zhao	76
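The result can be cross-checked locally without a cluster. The sketch below applies the same logic as the job (split each line on whitespace, sum per name, truncating integer division) to the twelve sample lines. The class name LocalAverageCheck and its averages method are hypothetical and only serve this check.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local check; mirrors the map and reduce logic in plain Java.
final class LocalAverageCheck {

    // Returns name -> truncated average, sorted by name like the job output.
    static Map<String, Integer> averages(String... lines) {
        Map<String, int[]> acc = new TreeMap<>(); // name -> {sum, count}
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            if (tokens.countTokens() < 2) {
                continue; // skip blank or malformed lines
            }
            String name = tokens.nextToken();
            int score = Integer.parseInt(tokens.nextToken());
            int[] sumCount = acc.computeIfAbsent(name, k -> new int[2]);
            sumCount[0] += score;
            sumCount[1]++;
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, int[]> e : acc.entrySet()) {
            result.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> result = averages(
                "Zhao 98", "Qian 9", "Sun 67", "Li 23",   // chinese.txt
                "Zhao 93", "Qian 42", "Sun 87", "Li 54",  // english.txt
                "Zhao 38", "Qian 45", "Sun 23", "Li 43"); // math.txt
        for (Map.Entry<String, Integer> e : result.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

It prints Li 40, Qian 32, Sun 59 and Zhao 76, in agreement with part-r-00000. Zhao's exact average is 229 / 3 ≈ 76.33, which integer division truncates to 76.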
