1. JDK Installation
Download URL:
http://www.oracle.com/technetwork/java/javase/downloads/jdk-6u29-download-513648.html
If you already have the installer locally, connect to the Linux machine with SecureCRT and upload the file with the rz command.

The download gives you jdk-6u29-linux-i586-rpm.bin. Install it with sh jdk-6u29-linux-i586-rpm.bin
and wait for the installation to finish; by default Java is installed under /usr/java.

On the command line, run vi /etc/profile and add the following lines:
export JAVA_HOME=/usr/java/jdk1.6.0_29
export JAVA_BIN=/usr/java/jdk1.6.0_29/bin
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN PATH CLASSPATH

Go to /usr/bin and point the java and javac symlinks at the new JDK:
cd /usr/bin
ln -s -f /usr/java/jdk1.6.0_29/jre/bin/java
ln -s -f /usr/java/jdk1.6.0_29/bin/javac

Run java -version on the command line. The output should report the version you just installed, along these lines:
java version "1.6.0_29"
Java(TM) SE Runtime Environment (build 1.6.0_29-...)
Java HotSpot(TM) Client VM (build ..., mixed mode)
If the reported version matches, JDK 1.6 is installed.

2. Hadoop Installation
Download URL: http://www.apache.org/dyn/closer.cgi/hadoop/common/
If you already have the package locally, connect to the Linux machine with SecureCRT and upload it with the rz command.

The download gives you hadoop-0.21.0.tar.gz.

Extract it: tar zxvf hadoop-0.21.0.tar.gz
(For reference, the matching compress command is: tar zcvf hadoop-0.21.0.tar.gz <directory>)

On the command line, run vi /etc/profile and add the following lines:
export HADOOP_HOME=/usr/george/dev/install/hadoop-0.21.0
export JAVA_HOME=/usr/java/jdk1.6.0_29
export JAVA_BIN=/usr/java/jdk1.6.0_29/bin
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export JAVA_HOME JAVA_BIN PATH CLASSPATH

Log out and back in (or reboot the VM) for the profile changes to take effect; after that you can run the hadoop command directly.
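Alternatively (assuming a Bourne-compatible shell such as bash), you can apply the new profile to the current session and verify both installs without logging out:
source /etc/profile
java -version
hadoop version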
3. WordCount Example
3.1 Java code:

import java.io.IOException; 
import java.util.Iterator; 
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path; 
import org.apache.hadoop.io.IntWritable; 
import org.apache.hadoop.io.LongWritable; 
import org.apache.hadoop.io.Text; 
import org.apache.hadoop.mapred.FileInputFormat; 
import org.apache.hadoop.mapred.FileOutputFormat; 
import org.apache.hadoop.mapred.JobClient; 
import org.apache.hadoop.mapred.JobConf; 
import org.apache.hadoop.mapred.MapReduceBase; 
import org.apache.hadoop.mapred.Mapper; 
import org.apache.hadoop.mapred.OutputCollector; 
import org.apache.hadoop.mapred.Reducer; 
import org.apache.hadoop.mapred.Reporter; 
import org.apache.hadoop.mapred.TextInputFormat; 
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    // Mapper: splits each input line into whitespace-delimited tokens
    // and emits a (word, 1) pair for each token.
    public static class Map extends MapReduceBase implements
            Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sums the counts for each word.
    public static class Reduce extends MapReduceBase implements
            Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
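To see how the pieces fit together, trace input1.txt (created in section 3.4 below) through the job: the mapper emits (i,1), (love,1), (china,1), (are,1), (you,1), (ok?,1). The combiner runs over each map's output but has nothing to merge here, because every token within a single file is unique; the reducer then sums counts across both files, which is why love and are end up with a count of 2 in the final output.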

3.2 Compile:
javac -classpath /usr/george/dev/install/hadoop-0.21.0/hadoop-hdfs-0.21.0.jar:/usr/george/dev/install/hadoop-0.21.0/hadoop-mapred-0.21.0.jar:/usr/george/dev/install/hadoop-0.21.0/hadoop-common-0.21.0.jar WordCount.java -d /usr/george/dev/wkspace/hadoop/wordcount/classes 
On Windows, multiple classpath entries are separated with ';'; on Linux, with ':'.
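For comparison, a minimal sketch of the same compile on Windows (hypothetical local paths; note the ';' separators):
javac -classpath C:\hadoop-0.21.0\hadoop-hdfs-0.21.0.jar;C:\hadoop-0.21.0\hadoop-mapred-0.21.0.jar;C:\hadoop-0.21.0\hadoop-common-0.21.0.jar -d classes WordCount.java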

After compiling, three class files are generated under /usr/george/dev/wkspace/hadoop/wordcount/classes:
WordCount.class  WordCount$Map.class  WordCount$Reduce.class

3.3 Package the class files into a jar
Go to the /usr/george/dev/wkspace/hadoop/wordcount/classes directory and run jar cvf WordCount.jar *.class. The directory then contains:
WordCount.class  WordCount.jar  WordCount$Map.class  WordCount$Reduce.class
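Optionally, the JDK 6 jar tool can record the main class in the jar's manifest at creation time (the e option), which would let you omit the class name from the hadoop jar command later:
jar cvfe WordCount.jar WordCount *.class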

3.4 Create the input data:
Create the directory /usr/george/dev/wkspace/hadoop/wordcount/datas and create input1.txt and input2.txt in it:
touch input1.txt
vi input1.txt

The file content is as follows:
i love china
are you ok?

Create input2.txt the same way, with the following content:
hello,i love word
You are ok

Once created, you can check the contents with cat input1.txt and cat input2.txt.

3.5 Create the Hadoop input and output directories:
hadoop fs -mkdir wordcount/input
hadoop fs -mkdir wordcount/output
hadoop fs -put input1.txt wordcount/input/
hadoop fs -put input2.txt wordcount/input/

Note: you do not actually need to create the output directory; if the output path already exists, WordCount fails with an error reporting that it exists, which is why the command below uses output1 as the output directory.
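To re-run the job with the same output path, delete the old output first; in this generation of Hadoop the recursive-delete shell command is -rmr:
hadoop fs -rmr wordcount/output1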
3.6 Run
Go to the /usr/george/dev/wkspace/hadoop/wordcount/classes directory and run:
[root@localhost classes]# hadoop jar WordCount.jar WordCount wordcount/input wordcount/output1 
11/12/02 05:53:59 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000 
11/12/02 05:53:59 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 
11/12/02 05:53:59 WARN mapreduce.JobSubmitter: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 
11/12/02 05:53:59 INFO mapred.FileInputFormat: Total input paths to process : 2 
11/12/02 05:54:00 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps 
11/12/02 05:54:00 INFO mapreduce.JobSubmitter: number of splits:2 
11/12/02 05:54:00 INFO mapreduce.JobSubmitter: adding the following namenodes' delegation tokens:null 
11/12/02 05:54:00 INFO mapreduce.Job: Running job: job_201112020429_0003 
11/12/02 05:54:01 INFO mapreduce.Job:  map 0% reduce 0% 
11/12/02 05:54:20 INFO mapreduce.Job:  map 50% reduce 0% 
11/12/02 05:54:23 INFO mapreduce.Job:  map 100% reduce 0% 
11/12/02 05:54:29 INFO mapreduce.Job:  map 100% reduce 100% 
11/12/02 05:54:32 INFO mapreduce.Job: Job complete: job_201112020429_0003 
11/12/02 05:54:32 INFO mapreduce.Job: Counters: 33 
        FileInputFormatCounters 
                BYTES_READ=54 
        FileSystemCounters 
                FILE_BYTES_READ=132 
                FILE_BYTES_WRITTEN=334 
                HDFS_BYTES_READ=274 
                HDFS_BYTES_WRITTEN=65 
        Shuffle Errors 
                BAD_ID=0 
                CONNECTION=0 
                IO_ERROR=0 
                WRONG_LENGTH=0 
                WRONG_MAP=0 
                WRONG_REDUCE=0 
        Job Counters 
                Data-local map tasks=2 
                Total time spent by all maps waiting after reserving slots (ms)=0 
                Total time spent by all reduces waiting after reserving slots (ms)=0 
                SLOTS_MILLIS_MAPS=24824 
                SLOTS_MILLIS_REDUCES=6870 
                Launched map tasks=2 
                Launched reduce tasks=1 
        Map-Reduce Framework 
                Combine input records=12 
                Combine output records=12 
                Failed Shuffles=0 
                GC time elapsed (ms)=291 
                Map input records=4 
                Map output bytes=102 
                Map output records=12 
                Merged Map outputs=2 
                Reduce input groups=10 
                Reduce input records=12 
                Reduce output records=10 
                Reduce shuffle bytes=138 
                Shuffled Maps =2 
                Spilled Records=24 
                SPLIT_RAW_BYTES=220
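A few of these counters are worth reading against the input: Map input records=4 matches the four lines across the two files, and Map output records=12 is the total token count. Combine input records=12 with Combine output records=12 shows the combiner had nothing to merge, since every token within a single file is unique. Reduce input groups=10 is smaller than the 12 reduce input records because love and are each arrive from both maps.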

3.7 View the output directory
[root@localhost classes]# hadoop fs -ls wordcount/output1 
11/12/02 05:54:59 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000 
11/12/02 05:55:00 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 
Found 2 items 
-rw-r--r--   1 root supergroup          0 2011-12-02 05:54 /user/root/wordcount/output1/_SUCCESS 
-rw-r--r--   1 root supergroup         65 2011-12-02 05:54 /user/root/wordcount/output1/part-00000

[root@localhost classes]# hadoop fs -cat /user/root/wordcount/output1/part-00000 
11/12/02 05:56:05 INFO security.Groups: Group mapping impl=org.apache.hadoop.security.ShellBasedUnixGroupsMapping; cacheTimeout=300000 
11/12/02 05:56:05 WARN conf.Configuration: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 
You     1 
are     2 
china   1 
hello,i 1 
i       1 
love    2 
ok      1 
ok?     1 
word    1 
you     1
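Note that You and you, and ok and ok?, are counted separately: StringTokenizer splits on whitespace only, and the mapper does no lowercasing or punctuation stripping, so tokens differing in case or trailing punctuation are distinct keys.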
