Notes on Running the Built-in WordCount Example on Hadoop 2.6.3
Platform: Hadoop 2.6.3
Mode: fully distributed
1. Prepare the text to be counted. Here a short passage is used, saved as eg.txt:
The Project Gutenberg EBook of War and Peace, by Leo Tolstoy This eBook is for the use of anyone anywhere at no cost and with almost
no restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
eBook or online at www.gutenberg.org Title: War and Peace Author: Leo Tolstoy
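As a convenience (not part of the original steps), the sample file can be recreated locally before uploading, using only standard Unix tools:

```shell
# Recreate eg.txt locally with the passage above
cat > eg.txt <<'EOF'
The Project Gutenberg EBook of War and Peace, by Leo Tolstoy This eBook is for the use of anyone anywhere at no cost and with almost
no restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
eBook or online at www.gutenberg.org Title: War and Peace Author: Leo Tolstoy
EOF

# Sanity check: wc -w splits on whitespace, like WordCount's StringTokenizer,
# so the total here should match the job's "Map output records" counter (62)
wc -w eg.txt
```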
2. Upload the file to HDFS from the shell:
hadoop fs -put ./eg.txt /
3. Change into the share/hadoop/mapreduce directory and start the word-count job. Note that the output directory (/out here) must not already exist, or the job will fail:
hadoop jar hadoop-mapreduce-examples-2.6.3.jar wordcount /eg.txt /out
4. The console output looks like this:
16/03/29 21:30:26 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/03/29 21:30:30 INFO input.FileInputFormat: Total input paths to process : 1
16/03/29 21:30:30 INFO mapreduce.JobSubmitter: number of splits:1
16/03/29 21:30:31 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1459233715960_0004
16/03/29 21:30:31 INFO impl.YarnClientImpl: Submitted application application_1459233715960_0004
16/03/29 21:30:31 INFO mapreduce.Job: The url to track the job: http://m1.fredlab.org:8088/proxy/application_1459233715960_0004/
16/03/29 21:30:31 INFO mapreduce.Job: Running job: job_1459233715960_0004
16/03/29 21:30:47 INFO mapreduce.Job: Job job_1459233715960_0004 running in uber mode : false
16/03/29 21:30:47 INFO mapreduce.Job: map 0% reduce 0%
16/03/29 21:30:57 INFO mapreduce.Job: map 100% reduce 0%
16/03/29 21:31:09 INFO mapreduce.Job: map 100% reduce 100%
16/03/29 21:31:10 INFO mapreduce.Job: Job job_1459233715960_0004 completed successfully
16/03/29 21:31:11 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=547
FILE: Number of bytes written=213761
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=453
HDFS: Number of bytes written=361
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=7594
Total time spent by all reduces in occupied slots (ms)=9087
Total time spent by all map tasks (ms)=7594
Total time spent by all reduce tasks (ms)=9087
Total vcore-milliseconds taken by all map tasks=7594
Total vcore-milliseconds taken by all reduce tasks=9087
Total megabyte-milliseconds taken by all map tasks=7776256
Total megabyte-milliseconds taken by all reduce tasks=9305088
Map-Reduce Framework
Map input records=11
Map output records=62
Map output bytes=598
Map output materialized bytes=547
Input split bytes=98
Combine input records=62
Combine output records=45
Reduce input groups=45
Reduce shuffle bytes=547
Reduce input records=45
Reduce output records=45
Spilled Records=90
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=310
CPU time spent (ms)=2010
Physical memory (bytes) snapshot=273182720
Virtual memory (bytes) snapshot=4122341376
Total committed heap usage (bytes)=137498624
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=355
File Output Format Counters
Bytes Written=361
5. The results are written to the /out directory on the HDFS cluster. If the job succeeded, a _SUCCESS marker file appears there and the word counts are stored in part-r-00000 (viewable with hadoop fs -cat /out/part-r-00000):
Author: 1
EBook 1
Gutenberg 2
Leo 2
License 1
Peace 1
Peace, 1
Project 2
The 1
This 1
Title: 1
Tolstoy 2
War 2
You 1
almost 1
and 3
anyone 1
anywhere 1
at 2
away 1
by 1
copy 1
cost 1
eBook 2
for 1
give 1
included 1
is 1
it 2
it, 1
may 1
no 2
of 3
online 1
or 2
re-use 1
restrictions 1
terms 1
the 3
this 1
under 1
use 1
whatsoever. 1
with 2
www.gutenberg.org 1
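The counts above can be cross-checked locally with standard shell tools. This pipeline is not part of Hadoop; it simply mimics WordCount's behavior (whitespace tokenization, byte-order key sorting, counting per distinct token) on the same input:

```shell
# Recreate the input locally (same passage as eg.txt above)
cat > eg.txt <<'EOF'
The Project Gutenberg EBook of War and Peace, by Leo Tolstoy This eBook is for the use of anyone anywhere at no cost and with almost
no restrictions whatsoever. You may copy it, give it away or re-use it
under the terms of the Project Gutenberg License included with this
eBook or online at www.gutenberg.org Title: War and Peace Author: Leo Tolstoy
EOF

# Tokenize on whitespace, drop empty lines, sort in byte order (LC_ALL=C,
# matching Hadoop's Text key ordering), and count each distinct token
tr -s '[:space:]' '\n' < eg.txt | sed '/^$/d' | LC_ALL=C sort | uniq -c
```

The 45 distinct tokens and their counts should line up with part-r-00000. As in the job output, punctuation sticks to words, so `Peace,` and `Peace` are counted separately.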
More .txt books for practice can be downloaded from http://www.gutenberg.org/.