Running the official wordcount example with Hadoop 2.7.3 in local mode

Basic environment:

Host OS: Windows 7

Virtualization: VirtualBox

Guest OS: CentOS 7

Hadoop version: 2.7.3

This article runs Hadoop in standalone (local) mode.


1 Installing Hadoop

Install a Java environment:

yum install java-1.8.0-openjdk

Download the Hadoop tarball and unpack it:

mkdir ~/hadoop/
cd ~/hadoop/
# http://apache.fayea.com/hadoop/common/hadoop-2.7.3/
curl http://apache.fayea.com/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz -O
# If the download is interrupted, resume it with the -C option:
ls -l
# -rw-rw-r--. 1 jungle jungle 165297920 Jan 6 13:10 hadoop-2.7.3.tar.gz
curl http://apache.fayea.com/apache/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz -C 165297920 -O
# ** Resuming transfer from byte position 165297920 ……
# Download the checksum file
curl http://apache.fayea.com/hadoop/core/hadoop-2.7.3/hadoop-2.7.3.tar.gz.mds -O
# Check the archive against it
cat hadoop-2.7.3.tar.gz.mds
md5sum hadoop-2.7.3.tar.gz
sha256sum hadoop-2.7.3.tar.gz
tar -zxf hadoop-2.7.3.tar.gz
mv hadoop-2.7.3 hadoop-local
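The checksum comparison above is done by eye (reading the .mds file next to the md5sum output). As a minimal sketch, the comparison can be scripted; here a throwaway file stands in for the real tarball, and in practice `expected` would be the digest copied out of hadoop-2.7.3.tar.gz.mds:

```shell
# Sketch only: verify a file against a known checksum.
# A throwaway file keeps the example self-contained.
tmpdir=$(mktemp -d)
printf 'dummy tarball contents\n' > "$tmpdir/archive.tar.gz"
# In practice this value comes from the downloaded .mds file:
expected=$(md5sum "$tmpdir/archive.tar.gz" | awk '{print $1}')

actual=$(md5sum "$tmpdir/archive.tar.gz" | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
    result="checksum OK"
else
    result="checksum MISMATCH"
fi
echo "$result"
rm -rf "$tmpdir"
```

The same pattern works for sha256sum; only the digest command changes.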

2 Configuring the environment

Because this is local mode, very little needs configuring: only a few environment variables.

# java path
whereis java
# java: /usr/bin/java /usr/lib/java /etc/java /usr/share/java
ls -l /usr/bin/java
# lrwxrwxrwx. 1 root root 22 Dec 30 12:26 /usr/bin/java -> /etc/alternatives/java
ls -l /etc/alternatives/java
# lrwxrwxrwx. 1 root root 73 Dec 30 12:26 /etc/alternatives/java -> /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre/bin/java
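The two `ls -l` hops above walk the symlink chain by hand; `readlink -f` resolves the whole chain at once. A self-contained sketch with throwaway names (the real chain is /usr/bin/java -> /etc/alternatives/java -> the JVM binary):

```shell
# Build a two-hop symlink chain like the java alternatives setup,
# then resolve it in one step with readlink -f.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/jvm"
touch "$tmpdir/jvm/java"                     # stand-in for the real JVM binary
ln -s "$tmpdir/jvm/java" "$tmpdir/alt-java"  # plays the role of /etc/alternatives/java
ln -s "$tmpdir/alt-java" "$tmpdir/java"      # plays the role of /usr/bin/java
resolved=$(readlink -f "$tmpdir/java")
echo "$resolved"
rm -rf "$tmpdir"
```

On the real system, `readlink -f /usr/bin/java` prints the JVM path used for JAVA_HOME below (minus the trailing /bin/java).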

Add the following three lines to ~/.bashrc:

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.111-2.b15.el7_3.x86_64/jre
export HADOOP_INSTALL=/home/jungle/hadoop/hadoop-local
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
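After sourcing ~/.bashrc, you can verify that the new entries actually landed on PATH. A small sketch, reusing the HADOOP_INSTALL value from above (adjust to your own path):

```shell
# Reproduce the exports from ~/.bashrc, then check that PATH
# now contains the hadoop bin directory.
export HADOOP_INSTALL=/home/jungle/hadoop/hadoop-local
export PATH=$PATH:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
case ":$PATH:" in
    *":$HADOOP_INSTALL/bin:"*) path_check="hadoop bin is on PATH" ;;
    *)                         path_check="hadoop bin is missing" ;;
esac
echo "$path_check"
```

The `:$PATH:` wrapping is the usual trick to match a whole PATH component rather than a substring of one.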

Confirm that hadoop is available:

hadoop version
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /home/jungle/hadoop/hadoop-local/share/hadoop/common/hadoop-common-2.7.3.jar

3 Testing with the Linux file system

We test directly against the Linux file system, i.e. without using any of the hadoop fs commands.

3.1 wordcount

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar
An example program must be given as the first argument.
Valid program names are:
aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
dbcount: An example job that count the pageview counts from a database.
# ...
wordcount: A map/reduce program that counts the words in the input files.
# ...
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount
# Usage: wordcount <in> [<in>...] <out>

3.2 Preparing the data

mkdir -p dataLocal/input/
cd dataLocal/input/
echo "hello world, I am jungle. bye world" > file1.txt
echo "hello hadoop. hello jungle. bye hadoop." > file2.txt
echo "the great software is hadoop." >> file2.txt

3.3 Running the job

cd /home/jungle/hadoop/hadoop-local/

hadoop jar /home/jungle/hadoop/hadoop-local/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount  dataLocal/input/ dataLocal/outout
# dataLocal/outout does not exist beforehand; the job creates it
echo $?
ls -la dataLocal/outout/
total 12
drwxrwxr-x. 2 jungle jungle 84 Jan 6 16:53 .
drwxrwxr-x. 4 jungle jungle 31 Jan 6 16:53 ..
-rw-r--r--. 1 jungle jungle 82 Jan 6 16:53 part-r-00000
-rw-r--r--. 1 jungle jungle 12 Jan 6 16:53 .part-r-00000.crc
-rw-r--r--. 1 jungle jungle 0 Jan 6 16:53 _SUCCESS
-rw-r--r--. 1 jungle jungle 8 Jan 6 16:53 ._SUCCESS.crc
# The result:
cat dataLocal/outout//part-r-00000
I 1
am 1
bye 2
great 1
hadoop. 3
hello 3
is 1
jungle. 2
software 1
the 1
world. 2
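For input this small, the counts can be cross-checked without Hadoop at all: wordcount tokenizes on whitespace, so a plain sort | uniq -c pipeline over the same files should produce matching numbers (with punctuation glued to the words, just as in the output above). A sketch using the sample data from the data-preparation step:

```shell
# Recreate the two sample files, then count whitespace-separated tokens.
tmpdir=$(mktemp -d)
echo "hello world, I am jungle. bye world" > "$tmpdir/file1.txt"
echo "hello hadoop. hello jungle. bye hadoop." > "$tmpdir/file2.txt"
echo "the great software is hadoop." >> "$tmpdir/file2.txt"
# tr splits words onto their own lines, uniq -c counts runs of the
# sorted tokens, awk reorders each line to "word count".
counts=$(cat "$tmpdir"/file*.txt | tr -s ' ' '\n' | sort | uniq -c | awk '{print $2, $1}')
echo "$counts"
rm -rf "$tmpdir"
```

This mirrors what the map (tokenize), shuffle (sort) and reduce (count) phases do for this tiny input.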

3.4 The run log

Finally, the run log. It shows a number of the common runtime parameters and settings.

17/01/06 16:53:26 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
17/01/06 16:53:26 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
17/01/06 16:53:26 INFO input.FileInputFormat: Total input paths to process : 2
17/01/06 16:53:26 INFO mapreduce.JobSubmitter: number of splits:2
17/01/06 16:53:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1147390429_0001
17/01/06 16:53:27 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
17/01/06 16:53:27 INFO mapreduce.Job: Running job: job_local1147390429_0001
17/01/06 16:53:27 INFO mapred.LocalJobRunner: OutputCommitter set in config null
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Waiting for map tasks
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1147390429_0001_m_000000_0
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/01/06 16:53:27 INFO mapred.MapTask: Processing split: file:/home/jungle/hadoop/hadoop-local/dataLocal/input/file2.txt:0+70
17/01/06 16:53:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/01/06 16:53:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/01/06 16:53:27 INFO mapred.MapTask: soft limit at 83886080
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/01/06 16:53:27 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/01/06 16:53:27 INFO mapred.LocalJobRunner:
17/01/06 16:53:27 INFO mapred.MapTask: Starting flush of map output
17/01/06 16:53:27 INFO mapred.MapTask: Spilling map output
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufend = 114; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214356(104857424); length = 41/6553600
17/01/06 16:53:27 INFO mapred.MapTask: Finished spill 0
17/01/06 16:53:27 INFO mapred.Task: Task:attempt_local1147390429_0001_m_000000_0 is done. And is in the process of committing
17/01/06 16:53:27 INFO mapred.LocalJobRunner: map
17/01/06 16:53:27 INFO mapred.Task: Task 'attempt_local1147390429_0001_m_000000_0' done.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local1147390429_0001_m_000000_0
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1147390429_0001_m_000001_0
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/01/06 16:53:27 INFO mapred.MapTask: Processing split: file:/home/jungle/hadoop/hadoop-local/dataLocal/input/file1.txt:0+37
17/01/06 16:53:27 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
17/01/06 16:53:27 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
17/01/06 16:53:27 INFO mapred.MapTask: soft limit at 83886080
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
17/01/06 16:53:27 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
17/01/06 16:53:27 INFO mapred.LocalJobRunner:
17/01/06 16:53:27 INFO mapred.MapTask: Starting flush of map output
17/01/06 16:53:27 INFO mapred.MapTask: Spilling map output
17/01/06 16:53:27 INFO mapred.MapTask: bufstart = 0; bufend = 65; bufvoid = 104857600
17/01/06 16:53:27 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214372(104857488); length = 25/6553600
17/01/06 16:53:27 INFO mapred.MapTask: Finished spill 0
17/01/06 16:53:27 INFO mapred.Task: Task:attempt_local1147390429_0001_m_000001_0 is done. And is in the process of committing
17/01/06 16:53:27 INFO mapred.LocalJobRunner: map
17/01/06 16:53:27 INFO mapred.Task: Task 'attempt_local1147390429_0001_m_000001_0' done.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local1147390429_0001_m_000001_0
17/01/06 16:53:27 INFO mapred.LocalJobRunner: map task executor complete.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Waiting for reduce tasks
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Starting task: attempt_local1147390429_0001_r_000000_0
17/01/06 16:53:27 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
17/01/06 16:53:27 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
17/01/06 16:53:27 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@2aa26fdb
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=363285696, maxSingleShuffleLimit=90821424, mergeThreshold=239768576, ioSortFactor=10, memToMemMergeOutputsThreshold=10
17/01/06 16:53:27 INFO reduce.EventFetcher: attempt_local1147390429_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
17/01/06 16:53:27 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1147390429_0001_m_000000_0 decomp: 98 len: 102 to MEMORY
17/01/06 16:53:27 INFO reduce.InMemoryMapOutput: Read 98 bytes from map-output for attempt_local1147390429_0001_m_000000_0
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 98, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->98
17/01/06 16:53:27 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1147390429_0001_m_000001_0 decomp: 68 len: 72 to MEMORY
17/01/06 16:53:27 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/01/06 16:53:27 INFO reduce.InMemoryMapOutput: Read 68 bytes from map-output for attempt_local1147390429_0001_m_000001_0
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 68, inMemoryMapOutputs.size() -> 2, commitMemory -> 98, usedMemory ->166
17/01/06 16:53:27 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
17/01/06 16:53:27 WARN io.ReadaheadPool: Failed readahead on ifile
EBADF: Bad file descriptor
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posix_fadvise(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX.posixFadviseIfPossible(NativeIO.java:267)
at org.apache.hadoop.io.nativeio.NativeIO$POSIX$CacheManipulator.posixFadviseIfPossible(NativeIO.java:146)
at org.apache.hadoop.io.ReadaheadPool$ReadaheadRequestImpl.run(ReadaheadPool.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/01/06 16:53:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: finalMerge called with 2 in-memory map-outputs and 0 on-disk map-outputs
17/01/06 16:53:27 INFO mapred.Merger: Merging 2 sorted segments
17/01/06 16:53:27 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 156 bytes
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: Merged 2 segments, 166 bytes to disk to satisfy reduce memory limit
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: Merging 1 files, 168 bytes from disk
17/01/06 16:53:27 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
17/01/06 16:53:27 INFO mapred.Merger: Merging 1 sorted segments
17/01/06 16:53:27 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 160 bytes
17/01/06 16:53:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/01/06 16:53:27 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
17/01/06 16:53:27 INFO mapred.Task: Task:attempt_local1147390429_0001_r_000000_0 is done. And is in the process of committing
17/01/06 16:53:27 INFO mapred.LocalJobRunner: 2 / 2 copied.
17/01/06 16:53:27 INFO mapred.Task: Task attempt_local1147390429_0001_r_000000_0 is allowed to commit now
17/01/06 16:53:27 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1147390429_0001_r_000000_0' to file:/home/jungle/hadoop/hadoop-local/dataLocal/outout/_temporary/0/task_local1147390429_0001_r_000000
17/01/06 16:53:27 INFO mapred.LocalJobRunner: reduce > reduce
17/01/06 16:53:27 INFO mapred.Task: Task 'attempt_local1147390429_0001_r_000000_0' done.
17/01/06 16:53:27 INFO mapred.LocalJobRunner: Finishing task: attempt_local1147390429_0001_r_000000_0
17/01/06 16:53:27 INFO mapred.LocalJobRunner: reduce task executor complete.
17/01/06 16:53:28 INFO mapreduce.Job: Job job_local1147390429_0001 running in uber mode : false
17/01/06 16:53:28 INFO mapreduce.Job: map 100% reduce 100%
17/01/06 16:53:28 INFO mapreduce.Job: Job job_local1147390429_0001 completed successfully
17/01/06 16:53:28 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=889648
FILE: Number of bytes written=1748828
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=3
Map output records=18
Map output bytes=179
Map output materialized bytes=174
Input split bytes=256
Combine input records=18
Combine output records=14
Reduce input groups=11
Reduce shuffle bytes=174
Reduce input records=14
Reduce output records=11
Spilled Records=28
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=43
Total committed heap usage (bytes)=457912320
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=107
File Output Format Counters
Bytes Written=94

The two EBADF: Bad file descriptor warnings in the log can, according to various reports online, be safely ignored.
