Purpose:

Get a first feel for Hadoop MapReduce.

Environment:

hadoop 2.6.4

1 Prepare the input file

paper.txt is just some English text; put whatever you like in it.
hadoop@ssmaster:~$ hadoop fs -mkdir /input
hadoop@ssmaster:~$ ls
Desktop Documents Downloads examples.desktop hadoop-2.6.4.tar.gz Music paper.txt Pictures Public Templates Videos
hadoop@ssmaster:~$ hadoop fs -put paper.txt /input
hadoop@ssmaster:~$ hadoop fs -ls /input
Found 1 items
-rw-r--r--   hadoop supergroup   /input/paper.txt

Note: the output directory /output does not need to be created in advance; the job creates it automatically.
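
For reference, the same preparation can also be done from Java through the HDFS FileSystem API. A minimal sketch, assuming the Hadoop client libraries are on the classpath and core-site.xml points fs.defaultFS at this cluster (the class name is just for illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class UploadInput {
    public static void main(String[] args) throws Exception {
        // Reads fs.defaultFS (and the rest of the cluster config) from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Same effect as: hadoop fs -mkdir /input
        fs.mkdirs(new Path("/input"));

        // Same effect as: hadoop fs -put paper.txt /input
        fs.copyFromLocalFile(new Path("paper.txt"), new Path("/input/paper.txt"));

        fs.close();
    }
}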

2 Run the job

hadoop@ssmaster:~$ hadoop jar /opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar wordcount /input /output
INFO client.RMProxy: Connecting to ResourceManager at ssmaster/192.168.249.144
INFO input.FileInputFormat: Total input paths to process : 1
INFO mapreduce.JobSubmitter: number of splits:1
INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477208120905_0001
INFO impl.YarnClientImpl: Submitted application application_1477208120905_0001
INFO mapreduce.Job: The url to track the job: http://ssmaster:8088/proxy/application_1477208120905_0001/
INFO mapreduce.Job: Running job: job_1477208120905_0001
INFO mapreduce.Job: Job job_1477208120905_0001 running in uber mode : false
16/10/23 00:51:38 INFO mapreduce.Job: map 0% reduce 0%
16/10/23 00:52:17 INFO mapreduce.Job: map 100% reduce 0%
16/10/23 00:52:39 INFO mapreduce.Job: map 100% reduce 100%
16/10/23 00:52:41 INFO mapreduce.Job: Job job_1477208120905_0001 completed successfully
16/10/23 00:52:41 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=2061
FILE: Number of bytes written=217797
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=1863
HDFS: Number of bytes written=1425
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=35792
Total time spent by all reduces in occupied slots (ms)=18540
Total time spent by all map tasks (ms)=35792
Total time spent by all reduce tasks (ms)=18540
Total vcore-milliseconds taken by all map tasks=35792
Total vcore-milliseconds taken by all reduce tasks=18540
Total megabyte-milliseconds taken by all map tasks=36651008
Total megabyte-milliseconds taken by all reduce tasks=18984960
Map-Reduce Framework
Map input records=11
Map output records=303
Map output bytes=2969
Map output materialized bytes=2061
Input split bytes=101
Combine input records=303
Combine output records=158
Reduce input groups=158
Reduce shuffle bytes=2061
Reduce input records=158
Reduce output records=158
Spilled Records=316
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=1093
CPU time spent (ms)=5550
Physical memory (bytes) snapshot=442781696
Virtual memory (bytes) snapshot=1448112128
Total committed heap usage (bytes)=276299776
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1762
File Output Format Counters
Bytes Written=1425
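
For context, the wordcount example invoked above boils down to a map step that splits each line into words and emits (word, 1) pairs, a combiner/reduce step that sums the counts per word, and a small driver that wires the job together. Below is a minimal sketch of an equivalent program, following the standard WordCount from the MapReduce tutorial; it is illustrative and not necessarily identical to the class shipped in hadoop-mapreduce-examples-2.6.4.jar.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map: split each input line into tokens and emit (word, 1)
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reduce: sum the counts for each word (also used as the combiner,
    // which is why "Combine input/output records" appear in the counters)
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Driver: configure and submit the job, e.g. with args /input /output
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, it would be submitted the same way as the bundled example, for instance hadoop jar wordcount.jar WordCount /input /output (jar and class names are whatever you choose).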

The job's execution status can be checked from the web monitoring page:

http://ssmaster:8088/cluster

Cluster Metrics

Apps Submitted: 1   Apps Pending: 0   Apps Running: 1   Apps Completed: 0   Containers Running: 2
Memory Used: 3 GB   Memory Total: 8 GB   Memory Reserved: 0 B
VCores Used: 2   VCores Total: 8   VCores Reserved: 0
Active Nodes: 1   Decommissioned Nodes: 0   Lost Nodes: 0   Unhealthy Nodes: 0   Rebooted Nodes: 0
Running application:

ID: application_1477208120905_0001   User: hadoop   Name: word count   Application Type: MAPREDUCE   Queue: default
StartTime: Sun, 23 Oct 2016 07:51:13 GMT   FinishTime: N/A   State: RUNNING   FinalStatus: UNDEFINED
Tracking UI: ApplicationMaster   Blacklisted Nodes: 0
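
The same information shown in the web UI can also be fetched from the ResourceManager REST API, which Hadoop 2.x exposes under /ws/v1/cluster/ on the same 8088 port. A minimal sketch, assuming the ssmaster:8088 address used above; it just dumps the raw JSON application list:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ListYarnApps {
    public static void main(String[] args) throws Exception {
        // Application list endpoint; /ws/v1/cluster/metrics returns the cluster metrics block
        URL url = new URL("http://ssmaster:8088/ws/v1/cluster/apps");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestProperty("Accept", "application/json");

        // Print the raw JSON response describing each application
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            conn.disconnect();
        }
    }
}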

3 View the output

hadoop@ssmaster:~$ hadoop fs -ls /output
Found 2 items
-rw-r--r--   hadoop supergroup   /output/_SUCCESS
-rw-r--r--   hadoop supergroup   /output/part-r-00000
hadoop@ssmaster:~$ hadoop fs -cat /output/part-r-00000
Always
Dream
There
a
all
along
always
...........
...........
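
Each line of part-r-00000 is a word and its count separated by a tab. For reference, the result can also be read back programmatically through the FileSystem API; a minimal sketch, assuming the same single-reducer output path:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReadWordCounts {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());

        // Same effect as: hadoop fs -cat /output/part-r-00000
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(new Path("/output/part-r-00000")), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                // Each line is: word <TAB> count
                String[] parts = line.split("\t");
                System.out.println(parts[0] + " -> " + parts[1]);
            }
        }
        fs.close();
    }
}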

4 Summary

Very simple; nothing much to take away yet.

Follow-up:

  • Write my own MapReduce wordcount program
  • Set up a fully distributed cluster, run the same program on a large file, and observe the speed
