运行Hadoop的示例程序WordCount-Running Hadoop Example
In the Hadoop directory (which you should find at /opt/hadoop/2.2.0) you can find a JAR containing some examples: the exact path is $HADOOP_COMMON_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar .
This JAR contains different examples of mapreduce programs. We'll launch the WordCount program, which is the equivalent of "Hello, world" for MapReduce. This programs just count the occurrences of every single word of the file given as the input.
To run this example we need to prepare something. We assume that we have the HDFS service running; if we didn't create a user directory, we have to do it now (assuming the hadoop user we're using is mapred):
$ hadoop fs -mkdir -p /user/mapred
When we pass "fs" as the first argument to the hadoop command, we're telling hadoop to work on HDFS filesystem; in this case, we used the mkdir command as a switch to create a new directory on HDFS.
Now that our user has a home directory, we can create a directory that we'll use lo load the input file for the mapreduce programs:
$ hadoop fs -mkdir inputdir
We can check the result issuing a "ls" command on HDFS:
$ hadoop fs -ls
Found 1 items
drwxr-xr-x - mapred mrusers 0 2014-02-11 22:54 inputdir
Now we can decide which file we'll count the words of; in this example, I'll use the text of the novella Flatland by Edwin Abbot, which is freely available on gutemberg project for download:
$ wget http://www.gutenberg.org/cache/epub/201/pg201.txt
Now we can put this file onto the HDFS, more precisely into the inputdir dir we created a moment ago:
$ hadoop fs -put pg201.txt inputdir
The switch "-put" tells Hadoop to get the file from the machine's file system and to put it onto the HDFS filesystem. We can check that the file is really there:
$ hadoop fs -ls inputdir
Found 1 items
drwxr-xr-x - mapred mrusers 227368 2014-02-11 22:59 inputdir/pg201.txt
Now we're ready to execute the MapReduce program. Hadoop tarball comes with a JAR containing the WordCount example; we can launch Hadoop with these parameters:
- jar: we're telling Hadoop we want to execute a mapreduce program contained in a JAR
- /opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar: this is the absolute path and filename of the JAR
- wordcount: tells Hadoop which of the many examples contained in the JAR to run
- inputdir: the directory on HDFS in which Hadoop can find the input file(s)
- outputdir: the directory on HDFS in which Hadoop must write the result of the program
$ hadoop jar /opt/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount inputdir outputdir
and the output is:
14/02/11 23:16:19 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
14/02/11 23:16:20 INFO input.FileInputFormat: Total input paths to process : 1
14/02/11 23:16:20 INFO mapreduce.JobSubmitter: number of splits:1
14/02/11 23:16:21 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/02/11 23:16:21 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/02/11 23:16:21 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/02/11 23:16:21 INFO Configuration.deprecation: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class
14/02/11 23:16:21 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/02/11 23:16:21 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/02/11 23:16:21 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/02/11 23:16:21 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/02/11 23:16:21 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/02/11 23:16:21 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/02/11 23:16:21 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/02/11 23:16:21 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/02/11 23:16:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1392155226604_0001
14/02/11 23:16:22 INFO impl.YarnClientImpl: Submitted application application_1392155226604_0001 to ResourceManager at /0.0.0.0:8032
14/02/11 23:16:23 INFO mapreduce.Job: The url to track the job: http://hadoop-VirtualBox:8088/proxy/application_1392155226604_0001/
14/02/11 23:16:23 INFO mapreduce.Job: Running job: job_1392155226604_0001
14/02/11 23:16:38 INFO mapreduce.Job: Job job_1392155226604_0001 running in uber mode : false
14/02/11 23:16:38 INFO mapreduce.Job: map 0% reduce 0%
14/02/11 23:16:47 INFO mapreduce.Job: map 100% reduce 0%
14/02/11 23:16:57 INFO mapreduce.Job: map 100% reduce 100%
14/02/11 23:16:58 INFO mapreduce.Job: Job job_1392155226604_0001 completed successfully
14/02/11 23:16:58 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=121375
FILE: Number of bytes written=401139
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=227485
HDFS: Number of bytes written=88461
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=7693
Total time spent by all reduces in occupied slots (ms)=7383
Map-Reduce Framework
Map input records=4239
Map output records=37680
Map output bytes=366902
Map output materialized bytes=121375
Input split bytes=117
Combine input records=37680
Combine output records=8341
Reduce input groups=8341
Reduce shuffle bytes=121375
Reduce input records=8341
Reduce output records=8341
Spilled Records=16682
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=150
CPU time spent (ms)=5490
Physical memory (bytes) snapshot=399077376
Virtual memory (bytes) snapshot=1674149888
Total committed heap usage (bytes)=314048512
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=227368
File Output Format Counters
Bytes Written=88461
The last part of the output is a summary of the execution of the mapreduce program; just before this, we can spot the "Job job_1392155226604_0001 completed successfully" line, which tells us the mapreduce program has been executed successfully. As told, Hadoop wrote the output onto the outputdir on HDFS; let's see what's inside this dir:
$ hadoop fs -ls outputdir
Found 2 items
-rw-r--r-- 1 mapred mrusers 0 2014-02-11 23:16 outputdir/_SUCCESS
-rw-r--r-- 1 mapred mrusers 88461 2014-02-11 23:16 outputdir/part-r-00000
The presence of the _SUCCESS file confirms us the successful execution of the job; in the part-r-00000 Hadoop wrote the result of the execution. We can bring the file up to the filesystem of our machine using the "get" switch:
$ hadoop fs -get outputdir/part-r-00000 .
Now we can see the content of the file (this is a small subset of the whole file):
...
leading 2
leagues 1
leaning 1
leap 1
leaped 1
learn 7
learned 1
least 23
least. 1
leave 3
leaves 3
leaving 2
lecture 1
led 4
left 9
...
The wordcount program just count the occurrences of every single word and outputs it.
Well, we've successfully run our first mapreduce job on our Hadoop installation!
运行Hadoop的示例程序WordCount-Running Hadoop Example的更多相关文章
- hadoop第一个程序WordCount
hadoop第一个程序WordCount package test; import org.apache.hadoop.mapreduce.Job; import java.io.IOExceptio ...
- Hadoop示例程序WordCount编译运行
首先确保Hadoop已正确安装及运行. 将WordCount.java拷贝出来 $ cp ./src/examples/org/apache/hadoop/examples/WordCount.jav ...
- Hadoop Map/Reduce 示例程序WordCount
#进入hadoop安装目录 cd /usr/local/hadoop #创建示例文件:input #在里面输入以下内容: #Hello world, Bye world! vim input #在hd ...
- (转载)Hadoop示例程序WordCount详解
最近在学习云计算,研究Haddop框架,费了一整天时间将Hadoop在Linux下完全运行起来,看到官方的map-reduce的demo程序WordCount,仔细研究了一下,算做入门了. 其实Wor ...
- Hadoop示例程序WordCount详解及实例(转)
1.图解MapReduce 2.简历过程: Input: Hello World Bye World Hello Hadoop Bye Hadoop Bye Hadoop Hello Hadoop M ...
- [MapReduce_1] 运行 Word Count 示例程序
0. 说明 MapReduce 实现 Word Count 示意图 && Word Count 代码编写 1. MapReduce 实现 Word Count 示意图 1. Map:预 ...
- CC2650LaunchPad 运行contiki hello-world示例程序
最近做毕设,开始接触contiki. 下载并运行Instant Contiki 3.0 这是官方制作的虚拟机镜像,直接用vmware等工具就可以运行. 从这里下载. 下载并解压后,用vmware运行. ...
- 用Python语言写Hadoop MapReduce程序Writing an Hadoop MapReduce Program in Python
In this tutorial I will describe how to write a simple MapReduce program for Hadoop in the Python pr ...
- IDEA Maven Hadoop调试hdfs程序
IDEA 远程调试 Hadoop 两大特色:一是采用maven的pom配置:二是直接连接hdfs:9000端口,无须另外在服务端配置参数. 其实内容包含了两种方式:本地与远程调试.这里仅仅只是使用远程 ...
随机推荐
- Eclipse的SVN插件与本地SVN客户端关联不上
问题:当我们用SVN客户端把代码更新到本地,并导入到eclipse之后,却发现我们的SVN插件并没有起作用(没有代码入库.修改等小图标的显示,也没有check in,update等功能菜单).如果我们 ...
- WMI技术介绍和应用——WMI概述
https://blog.csdn.net/breaksoftware/article/details/8424317
- [实战]MVC5+EF6+MySql企业网盘实战(5)——ajax方式注册
写在前面 今天贴合到实际的客户需求仔细的想了下,其实在userInfo这个类里面很多字段都不是必须的.有很多的事业单位根本就不能上网,填写的邮箱也是exchange的,个人的详细信息都在ad里面可以取 ...
- LoadRunner常用函数汇总
LoadRunner命令汇总 . 命令行分析函数 (1)lr_get_attrib_double() 检索脚本命令行中使用的double类型变量 (2)lr_get_attrib_string() 检 ...
- 【ghost初级教程】 怎么搭建一个免费的ghost博客
ghost博客系统无疑是这个月最火热的话题之一,这个号称”只为博客“的系统,早在项目开始之初就受到了众人的关注.它使用了当前最火热node.js技术,10月14日发布了V0.3.3版本.江湖传言它将是 ...
- 关于如何在 Unity 的 UI 菜单中默认创建出的控件 Raycast Target 属性默认为 false
关于如何在 Unity 的 UI 菜单中默认创建出的控件 Raycast Target 属性默认为 false 我们在 Unity 中通过 UI 菜单创建的各种控件,比如 Text, Image 等, ...
- AM335x内核模块驱动之LED
在Ubuntu的任意可操作的文件才建立text目录 在text中建立zyr-hello.c: #include<linux/kernel.h> #include<linux/modu ...
- android 网络
韩梦飞沙 韩亚飞 313134555@qq.com yue31313 han_meng_fei_sha android - async - http 安卓 异步 超文本传输协议 xUtil a ...
- CodeForces 602C The Two Routes(最短路)
Description In Absurdistan, there are n towns (numbered 1 through n) and m bidirectional railways. T ...
- CF1051D Bicolorings dp
水题一道 $f[i][j][S]$表示$2 * i$的矩形,有$j$个联通块,某尾状态为$S$ 然后转移就行了... #include <vector> #include <cstd ...