Compiling and Running the Hadoop WordCount Example Program
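First, make sure Hadoop is correctly installed and running.
Copy WordCount.java out of the Hadoop source tree (the relative source path below assumes the current directory is the Hadoop installation directory):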
$ cp ./src/examples/org/apache/hadoop/examples/WordCount.java /home/hadoop/
In the current directory, create a folder to hold the compiled WordCount class files:
$ mkdir class
Compile WordCount.java:
$ javac -classpath /usr/local/hadoop/hadoop-core-0.20.203.0.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar WordCount.java -d class
After compilation, an org folder appears under the class folder:
$ ls class
org
Package the compiled classes into a jar:
$ cd class
$ jar cvf WordCount.jar *
added manifest
adding: org/(in = 0) (out= 0)(stored 0%)
adding: org/apache/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/examples/(in = 0) (out= 0)(stored 0%)
adding: org/apache/hadoop/examples/WordCount$TokenizerMapper.class(in = 1790) (out= 765)(deflated 57%)
adding: org/apache/hadoop/examples/WordCount$IntSumReducer.class(in = 1793) (out= 746)(deflated 58%)
adding: org/apache/hadoop/examples/WordCount.class(in = 1911) (out= 996)(deflated 47%)
The Java source is now compiled and packaged.
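The compile and package steps above can be collected into one shell function. This is only a sketch and not part of the original walkthrough: the name `build_wordcount_jar` is hypothetical, and the default `HADOOP_HOME` and jar versions simply repeat the paths used above, so adjust them to your installation. It differs from the manual steps in two small ways: it uses `jar cf` (no verbose listing), and it uses `jar -C` instead of `cd class` followed by `*`, so a second run does not pack an existing WordCount.jar into itself.

```shell
# Sketch: compile WordCount.java and package it in one step.
# Usage: build_wordcount_jar [source-file] [output-dir]
# Assumption: HADOOP_HOME points at a Hadoop 0.20.203.0 installation.
build_wordcount_jar() (
    hadoop_home="${HADOOP_HOME:-/usr/local/hadoop}"
    src="${1:-WordCount.java}"
    out_dir="${2:-class}"
    classpath="$hadoop_home/hadoop-core-0.20.203.0.jar:$hadoop_home/lib/commons-cli-1.2.jar"

    # Each step runs only if the previous one succeeded.
    mkdir -p "$out_dir" &&
    javac -classpath "$classpath" -d "$out_dir" "$src" &&
    # -C changes into $out_dir before adding org/, so the jar itself
    # is never among the files being archived.
    jar cf "$out_dir/WordCount.jar" -C "$out_dir" org
)
```

Calling `build_wordcount_jar` with no arguments from the directory that holds WordCount.java should leave the jar at class/WordCount.jar, the same location the later steps expect.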
Prepare the test file and start Hadoop.
In this setup the job reads its input from HDFS, so the test file must first be copied from the local file system into HDFS.
$ hadoop fs -mkdir input
$ hadoop fs -ls
Found 1 items
drwxr-xr-x - hadoop supergroup 0 2014-03-26 10:39 /user/hadoop/input
$ hadoop fs -put file input
$ hadoop fs -ls input
Found 1 items
-rw-r--r-- 2 hadoop supergroup 75 2014-03-26 10:40 /user/hadoop/input/file
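The walkthrough does not show what the local test file `file` contains. The sketch below writes a hypothetical one-line input; the text is an assumption chosen so that it yields the word counts reported later in this walkthrough, and it is not the author's actual file. The function name `make_sample_input` is also made up for illustration.

```shell
# Hypothetical test input: the real contents of `file` are not shown in
# the walkthrough, so this line is an assumption for illustration only.
# Usage: make_sample_input [path]   (defaults to ./file)
make_sample_input() {
    printf '%s\n' 'Hello World Bye World Hello World Bye Hello Bye Word bye hello hello world' > "${1:-file}"
}

# Typical use (needs a running Hadoop, so shown as comments):
#   make_sample_input file
#   hadoop fs -mkdir input
#   hadoop fs -put file input
```

Any small text file works equally well here; WordCount splits on whitespace and is case-sensitive, so `Hello` and `hello` are counted separately.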
Run the program:
$ cd class
$ ls
org WordCount.jar
$ hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
14/03/26 10:57:39 INFO input.FileInputFormat: Total input paths to process : 1
14/03/26 10:57:40 INFO mapred.JobClient: Running job: job_201403261015_0001
14/03/26 10:57:41 INFO mapred.JobClient: map 0% reduce 0%
14/03/26 10:57:54 INFO mapred.JobClient: map 100% reduce 0%
14/03/26 10:58:06 INFO mapred.JobClient: map 100% reduce 100%
14/03/26 10:58:11 INFO mapred.JobClient: Job complete: job_201403261015_0001
14/03/26 10:58:11 INFO mapred.JobClient: Counters: 25
14/03/26 10:58:11 INFO mapred.JobClient: Job Counters
14/03/26 10:58:11 INFO mapred.JobClient: Launched reduce tasks=1
14/03/26 10:58:11 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=12321
14/03/26 10:58:11 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/26 10:58:11 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/03/26 10:58:11 INFO mapred.JobClient: Launched map tasks=1
14/03/26 10:58:11 INFO mapred.JobClient: Data-local map tasks=1
14/03/26 10:58:11 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10303
14/03/26 10:58:11 INFO mapred.JobClient: File Output Format Counters
14/03/26 10:58:11 INFO mapred.JobClient: Bytes Written=51
14/03/26 10:58:11 INFO mapred.JobClient: FileSystemCounters
14/03/26 10:58:11 INFO mapred.JobClient: FILE_BYTES_READ=85
14/03/26 10:58:11 INFO mapred.JobClient: HDFS_BYTES_READ=184
14/03/26 10:58:11 INFO mapred.JobClient: FILE_BYTES_WRITTEN=42541
14/03/26 10:58:11 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=51
14/03/26 10:58:11 INFO mapred.JobClient: File Input Format Counters
14/03/26 10:58:11 INFO mapred.JobClient: Bytes Read=75
14/03/26 10:58:11 INFO mapred.JobClient: Map-Reduce Framework
14/03/26 10:58:11 INFO mapred.JobClient: Reduce input groups=7
14/03/26 10:58:11 INFO mapred.JobClient: Map output materialized bytes=85
14/03/26 10:58:11 INFO mapred.JobClient: Combine output records=7
14/03/26 10:58:11 INFO mapred.JobClient: Map input records=1
14/03/26 10:58:11 INFO mapred.JobClient: Reduce shuffle bytes=0
14/03/26 10:58:11 INFO mapred.JobClient: Reduce output records=7
14/03/26 10:58:11 INFO mapred.JobClient: Spilled Records=14
14/03/26 10:58:11 INFO mapred.JobClient: Map output bytes=131
14/03/26 10:58:11 INFO mapred.JobClient: Combine input records=14
14/03/26 10:58:11 INFO mapred.JobClient: Map output records=14
14/03/26 10:58:11 INFO mapred.JobClient: SPLIT_RAW_BYTES=109
14/03/26 10:58:11 INFO mapred.JobClient: Reduce input records=7
Check the results:
$ hadoop fs -ls
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2014-03-26 10:40 /user/hadoop/input
drwxr-xr-x - hadoop supergroup 0 2014-03-26 10:58 /user/hadoop/output
HDFS now contains a new output directory. List the files inside it:
$ hadoop fs -ls output
Found 3 items
-rw-r--r-- 2 hadoop supergroup 0 2014-03-26 11:04 /user/hadoop/output/_SUCCESS
drwxr-xr-x - hadoop supergroup 0 2014-03-26 11:04 /user/hadoop/output/_logs
-rw-r--r-- 2 hadoop supergroup 65 2014-03-26 11:04 /user/hadoop/output/part-r-00000
View the job output:
$ hadoop fs -cat output/part-r-00000
Bye 3
Hello 3
Word 1
World 3
bye 1
hello 2
world 1
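For a small input, the result above can be cross-checked locally with standard Unix tools. This is a rough sketch rather than a reimplementation of the Hadoop job: `local_wordcount` is a hypothetical name, and it assumes that splitting on whitespace and sorting in byte order (`LC_ALL=C`) matches the job's tokenizing and key order closely enough for plain ASCII text.

```shell
# Sketch: local cross-check of WordCount output.
# Reads text on stdin and prints "word<TAB>count", one word per line.
local_wordcount() {
    # 1. Put every whitespace-separated token on its own line.
    # 2. Drop empty lines (e.g. from leading whitespace).
    # 3. Sort in byte order, so uppercase words come before lowercase.
    # 4. Count duplicates, then reorder "count word" to "word<TAB>count".
    tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Typical comparison (needs a running Hadoop, so shown as comments):
#   local_wordcount < file > expected.txt
#   hadoop fs -cat output/part-r-00000 | diff - expected.txt
```

An empty `diff` output would indicate that the job and the local pipeline agree on every count.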
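This completes the WordCount example run on Hadoop.
To run the job again, delete the output directory first; otherwise the job fails with the following exception: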
14/03/26 11:41:30 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201403261015_0003
Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory output already exists
at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:134)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:830)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:791)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:791)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:465)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:494)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Delete the output directory as follows:
$ hadoop fs -rmr output
Deleted hdfs://localhost:9000/user/hadoop/output
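The delete-then-rerun routine can be wrapped in a small function so that a repeated run does not hit the FileAlreadyExistsException. This is a sketch under two assumptions: the `hadoop` command is on the PATH, and `hadoop fs -test -e` reports existence through its exit status as described in the FS shell guide. The name `run_wordcount` is hypothetical.

```shell
# Sketch: rerun-safe job submission.
# Usage: run_wordcount [input-dir] [output-dir]
# Run it from the directory that contains WordCount.jar.
run_wordcount() (
    input="${1:-input}"
    output="${2:-output}"

    # Remove a leftover output directory from a previous run, if any.
    if hadoop fs -test -e "$output"; then
        hadoop fs -rmr "$output" || exit 1
    fi

    hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount "$input" "$output"
)
```

Because the wrapper deletes data without asking, it is best kept to scratch directories such as the `output` directory used here.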
You can also run the precompiled examples jar that ships with Hadoop directly:
$ hadoop jar /usr/local/hadoop/hadoop-examples-0.20.203.0.jar wordcount input output
14/03/28 17:02:33 INFO input.FileInputFormat: Total input paths to process : 2
14/03/28 17:02:33 INFO mapred.JobClient: Running job: job_201403281439_0004
14/03/28 17:02:34 INFO mapred.JobClient: map 0% reduce 0%
14/03/28 17:02:49 INFO mapred.JobClient: map 100% reduce 0%
14/03/28 17:03:01 INFO mapred.JobClient: map 100% reduce 100%
14/03/28 17:03:06 INFO mapred.JobClient: Job complete: job_201403281439_0004
14/03/28 17:03:06 INFO mapred.JobClient: Counters: 25
14/03/28 17:03:06 INFO mapred.JobClient: Job Counters
14/03/28 17:03:06 INFO mapred.JobClient: Launched reduce tasks=1
14/03/28 17:03:06 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=17219
14/03/28 17:03:06 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/28 17:03:06 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/03/28 17:03:06 INFO mapred.JobClient: Launched map tasks=2
14/03/28 17:03:06 INFO mapred.JobClient: Data-local map tasks=2
14/03/28 17:03:06 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10398
14/03/28 17:03:06 INFO mapred.JobClient: File Output Format Counters
14/03/28 17:03:06 INFO mapred.JobClient: Bytes Written=65
14/03/28 17:03:06 INFO mapred.JobClient: FileSystemCounters
14/03/28 17:03:06 INFO mapred.JobClient: FILE_BYTES_READ=131
14/03/28 17:03:06 INFO mapred.JobClient: HDFS_BYTES_READ=343
14/03/28 17:03:06 INFO mapred.JobClient: FILE_BYTES_WRITTEN=63840
14/03/28 17:03:06 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=65
14/03/28 17:03:06 INFO mapred.JobClient: File Input Format Counters
14/03/28 17:03:06 INFO mapred.JobClient: Bytes Read=124
14/03/28 17:03:06 INFO mapred.JobClient: Map-Reduce Framework
14/03/28 17:03:06 INFO mapred.JobClient: Reduce input groups=9
14/03/28 17:03:06 INFO mapred.JobClient: Map output materialized bytes=137
14/03/28 17:03:06 INFO mapred.JobClient: Combine output records=11
14/03/28 17:03:06 INFO mapred.JobClient: Map input records=2
14/03/28 17:03:06 INFO mapred.JobClient: Reduce shuffle bytes=85
14/03/28 17:03:06 INFO mapred.JobClient: Reduce output records=9
14/03/28 17:03:06 INFO mapred.JobClient: Spilled Records=22
14/03/28 17:03:06 INFO mapred.JobClient: Map output bytes=216
14/03/28 17:03:06 INFO mapred.JobClient: Combine input records=23
14/03/28 17:03:06 INFO mapred.JobClient: Map output records=23
14/03/28 17:03:06 INFO mapred.JobClient: SPLIT_RAW_BYTES=219
14/03/28 17:03:06 INFO mapred.JobClient: Reduce input records=11
References: http://www.cnblogs.com/aukle/p/3214984.html
http://blog.csdn.net/turkeyzhou/article/details/8121601
http://www.cnblogs.com/xia520pi/archive/2012/05/16/2504205.html