Running the Hadoop WordCount example, and simple HDFS operations

1. Check the Hadoop version

[hadoop@ltt1 sbin]$ hadoop version

Hadoop 2.6.0-cdh5.12.0

Subversion http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd

Compiled by jenkins on --29T11:33Z

Compiled with protoc 2.5.0

From source with checksum 7c45ae7a4592ce5af86bc4598c5b4

This command was run using /home/hadoop/hadoop260/share/hadoop/common/hadoop-common-2.6.0-cdh5.12.0.jar
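When you script against a cluster like this one, it can help to pull just the version string out of the first line of hadoop version. Below is a minimal bash sketch. hadoop_version is a hypothetical helper name, and the sketch assumes the first output line has the form "Hadoop <version>", as shown above:

```shell
# Print only the version string reported by `hadoop version`, e.g. 2.6.0-cdh5.12.0.
# Assumes the first output line has the form "Hadoop <version>", as in step 1.
hadoop_version() {
  hadoop version | awk 'NR == 1 && $1 == "Hadoop" { print $2 }'
}

# Example:
#   echo "Running on Hadoop $(hadoop_version)"
```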

2. The example jar that ships with Hadoop makes it easy to try out some features.

Original content from Tijun's blog (http://www.cnblogs.com/tijun/)

List the MapReduce programs supported by hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar. Running the jar without a program name prints the list:

[hadoop@ltt1 sbin]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar

An example program must be given as the first argument.

Valid program names are:

aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.

aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.

bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.

dbcount: An example job that count the pageview counts from a database.

distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.

grep: A map/reduce program that counts the matches of a regex in the input.

join: A job that effects a join over sorted, equally partitioned datasets

multifilewc: A job that counts words from several files.

pentomino: A map/reduce tile laying program to find solutions to pentomino problems.

pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.

randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.

randomwriter: A map/reduce program that writes 10GB of random data per node.

secondarysort: An example defining a secondary sort to the reduce.

sort: A map/reduce program that sorts the data written by the random writer.

sudoku: A sudoku solver.

teragen: Generate data for the terasort

terasort: Run the terasort

teravalidate: Checking results of terasort

wordcount: A map/reduce program that counts the words in the input files.

wordmean: A map/reduce program that counts the average length of the words in the input files.

wordmedian: A map/reduce program that counts the median length of the words in the input files.

wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
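To run any program from this list, pass its name after the jar path, followed by that program's own arguments. To avoid hard-coding the version in the jar name, a small bash helper can glob for it. find_examples_jar is a hypothetical name, and the helper assumes the $HADOOP_HOME/share/hadoop/mapreduce layout used in this post:

```shell
# Locate the MapReduce examples jar without hard-coding its version.
# Assumes the layout $HADOOP_HOME/share/hadoop/mapreduce used in this post;
# an optional first argument overrides $HADOOP_HOME.
find_examples_jar() {
  local home="${1:-${HADOOP_HOME:-}}" jar
  for jar in "$home"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
    [ -e "$jar" ] || continue                  # the glob matched nothing
    case "$jar" in
      *-sources.jar|*-tests.jar) continue ;;   # skip sources/tests jars, if present
    esac
    printf '%s\n' "$jar"
    return 0
  done
  echo "examples jar not found under $home" >&2
  return 1
}

# Example: estimate Pi with 10 map tasks and 100 samples per map.
#   EXAMPLES_JAR=$(find_examples_jar) && hadoop jar "$EXAMPLES_JAR" pi 10 100
```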

3. Create a directory on HDFS

hadoop fs -mkdir /input
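Running step 3 a second time fails, because the directory already exists. The bash sketch below makes the step repeatable. ensure_hdfs_dir is a hypothetical helper name. hadoop fs -test -d succeeds when the path is an existing directory, and -mkdir -p also creates any missing parent directories:

```shell
# Create an HDFS directory unless it already exists.
ensure_hdfs_dir() {
  local dir="$1"
  if hadoop fs -test -d "$dir"; then
    echo "exists: $dir"
  else
    # -p also creates missing parent directories
    hadoop fs -mkdir -p "$dir" && echo "created: $dir"
  fi
}

# Example:
#   ensure_hdfs_dir /input
```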

4. List the HDFS root directory

[hadoop@ltt1 ~]$ hadoop fs -ls /
Found 2 items
drwxr-xr-x - hadoop supergroup 0 2017-09-17 08:11 /input
drwx------ - hadoop supergroup 0 2017-09-17 08:07 /tmp

5. Upload local files to HDFS

hadoop fs -put $HADOOP_HOME/*.txt /input
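The local shell expands $HADOOP_HOME/*.txt, so this command uploads the local LICENSE.txt, NOTICE.txt and README.txt. A plain -put refuses to overwrite files that already exist in HDFS; adding -f overwrites them. Below is a bash sketch that also stops early when the glob matched nothing. upload_to_hdfs is a hypothetical name:

```shell
# Upload local files to an HDFS directory, overwriting existing copies (-f).
# Usage: upload_to_hdfs HDFS_DIR LOCAL_FILE...
upload_to_hdfs() {
  if [ "$#" -lt 2 ]; then
    echo "usage: upload_to_hdfs HDFS_DIR LOCAL_FILE..." >&2
    return 2
  fi
  local dest="$1" f
  shift
  # An unmatched glob is passed through literally, so check each file exists locally
  for f in "$@"; do
    [ -e "$f" ] || { echo "no such local file: $f" >&2; return 1; }
  done
  hadoop fs -put -f "$@" "$dest"
}

# Example (same files as step 5):
#   upload_to_hdfs /input "$HADOOP_HOME"/*.txt
```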

6. List the files in the /input directory on HDFS

[hadoop@ltt1 ~]$ hadoop fs -ls /input
Found 3 items
-rw-r--r--    hadoop supergroup       -- : /input/LICENSE.txt
-rw-r--r--    hadoop supergroup       -- : /input/NOTICE.txt
-rw-r--r--    hadoop supergroup        -- : /input/README.txt
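To check the upload from a script instead of by eye, use hadoop fs -count. It prints the directory count, the file count, the total size in bytes and the path. Below is a small bash sketch that extracts the file count; hdfs_file_count is a hypothetical helper name. For the listing above it should report 3:

```shell
# Print the number of files under an HDFS path (counted recursively).
# `hadoop fs -count PATH` prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME
hdfs_file_count() {
  hadoop fs -count "$1" | awk '{ print $2 }'
}

# Example:
#   [ "$(hdfs_file_count /input)" -eq 3 ] && echo "all three files uploaded"
```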

7. Run a simple WordCount test

[hadoop@ltt1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount /input /output

// :: INFO input.FileInputFormat: Total input paths to process : 3

// :: INFO mapreduce.JobSubmitter: number of splits:3

// :: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505605169997_0002

// :: INFO impl.YarnClientImpl: Submitted application application_1505605169997_0002

// :: INFO mapreduce.Job: The url to track the job: http://ltt1.bg.cn:9180/proxy/application_1505605169997_0002/

// :: INFO mapreduce.Job: Running job: job_1505605169997_0002

// :: INFO mapreduce.Job: Job job_1505605169997_0002 running in uber mode : false

// :: INFO mapreduce.Job:  map % reduce %

// :: INFO mapreduce.Job:  map % reduce %

// :: INFO mapreduce.Job:  map % reduce %

// :: INFO mapreduce.Job:  map % reduce %

// :: INFO mapreduce.Job: Job job_1505605169997_0002 completed successfully

// :: INFO mapreduce.Job: Counters: 50

    File System Counters

        FILE: Number of bytes read=

        FILE: Number of bytes written=

        FILE: Number of read operations=

        FILE: Number of large read operations=

        FILE: Number of write operations=

        HDFS: Number of bytes read=

        HDFS: Number of bytes written=

        HDFS: Number of read operations=

        HDFS: Number of large read operations=

        HDFS: Number of write operations=

    Job Counters

        Launched map tasks=

        Launched reduce tasks=

        Data-local map tasks=

        Rack-local map tasks=

        Total time spent by all maps in occupied slots (ms)=

        Total time spent by all reduces in occupied slots (ms)=

        Total time spent by all map tasks (ms)=

        Total time spent by all reduce tasks (ms)=

        Total vcore-milliseconds taken by all map tasks=

        Total vcore-milliseconds taken by all reduce tasks=

        Total megabyte-milliseconds taken by all map tasks=

        Total megabyte-milliseconds taken by all reduce tasks=

    Map-Reduce Framework

        Map input records=

        Map output records=

        Map output bytes=

        Map output materialized bytes=

        Input split bytes=

        Combine input records=

        Combine output records=

        Reduce input groups=

        Reduce shuffle bytes=

        Reduce input records=

        Reduce output records=

        Spilled Records=

        Shuffled Maps =

        Failed Shuffles=

        Merged Map outputs=

        GC time elapsed (ms)=

        CPU time spent (ms)=

        Physical memory (bytes) snapshot=

        Virtual memory (bytes) snapshot=

        Total committed heap usage (bytes)=

    Shuffle Errors

        BAD_ID=

        CONNECTION=

        IO_ERROR=

        WRONG_LENGTH=

        WRONG_MAP=

        WRONG_REDUCE=

    File Input Format Counters

        Bytes Read=

    File Output Format Counters

        Bytes Written=
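MapReduce will not start a job whose output directory already exists. Running the step 7 command a second time therefore fails until /output is removed. Below is a bash sketch that clears the output directory before each run. run_wordcount is a hypothetical helper, and EXAMPLES_JAR is assumed to hold the path of the examples jar used above:

```shell
# Re-runnable WordCount: delete the previous output, then submit the job.
# Assumes EXAMPLES_JAR points at hadoop-mapreduce-examples-*.jar.
run_wordcount() {
  local input="$1" output="$2"
  if [ -z "${EXAMPLES_JAR:-}" ]; then
    echo "set EXAMPLES_JAR to the MapReduce examples jar" >&2
    return 2
  fi
  # -r: recursive; -f: no error if the directory does not exist.
  # With the HDFS trash enabled, the old output is moved to trash rather than deleted.
  hadoop fs -rm -r -f "$output" || return
  hadoop jar "$EXAMPLES_JAR" wordcount "$input" "$output"
}

# Example:
#   EXAMPLES_JAR=$HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar
#   run_wordcount /input /output
```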

8. View the WordCount results (the output is long, so only part of it is shown)

[hadoop@ltt1 ~]$ hadoop fs -cat /output/*

worldwide,    4

would    1

writing    2

writing,    4

written    19

xmlenc    1

year    1

you    12

your    5

zlib    1

 252.227-7014(a)(1))    1

§    1

“AS    1

“Contributor    1

“Contributor”    1

“Covered    1

“Executable”    1

“Initial    1

“Larger    1

“Licensable”    1

“License”    1

“Modifications”    1

“Original    1

“Participant”)    1

“Patent    1

“Source    1

“Your”)    1

“You”    2

“commercial    3

“control”    1
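The example's WordCount mapper splits each line with java.util.StringTokenizer's default delimiters (space, tab, newline, carriage return, form feed). Punctuation therefore stays attached to words, which is why "writing," is counted separately from "writing". Keys are sorted by their raw bytes. That would explain why, in the excerpt, zlib is followed by tokens that begin with non-ASCII characters such as § and “. Below is a rough local equivalent built from standard Unix tools, useful for cross-checking small inputs, plus a helper that lists the most frequent words. Both function names are hypothetical, and LC_ALL=C is used to approximate Hadoop's byte-order key sorting:

```shell
# Local approximation of the WordCount example, for small inputs.
# Tokenizes on StringTokenizer's default delimiters (space, tab, newline, CR, form feed),
# counts each token, and prints "word<TAB>count" sorted by raw bytes.
local_wordcount() (
  export LC_ALL=C
  tr -s ' \t\r\f' '\n\n\n\n' \
    | grep -v '^$' \
    | sort \
    | uniq -c \
    | awk '{ printf "%s\t%s\n", $2, $1 }'
)

# Print the N most frequent words (default 10) from "word<TAB>count" lines.
top_words() {
  local n="${1:-10}"
  LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1 | head -n "$n"
}

# Examples:
#   hadoop fs -cat /output/part-r-* | top_words 20
#   cat "$HADOOP_HOME"/*.txt | local_wordcount | top_words 20
```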

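That wraps up this small WordCount example. Along the way it covered the basic HDFS operations: creating a directory, uploading files, listing directories, and running the WordCount job.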