【hadoop】在eclipse上运行WordCount的操作过程

序：本以为今天花点时间将WordCount例子完全理解到，但高估自己了，更别说我只是在大学选修一学期的java，之后再也没碰过java语言了

总的来说，从宏观上能理解具体的程序思路，但具体到每个代码有什么作用，什么原理，那还需要花点时间，毕竟需要一点java基础和hadoop的运行机制的知识

首先启动hadoop；

[hadoop@hadoop01 eclipse]$ cd ~/hadoop-3.2.0

[hadoop@hadoop01 hadoop-3.2.0]$ sbin/start-all.sh

WARNING: Attempting to start all Apache Hadoop daemons as hadoop in 10 seconds.

WARNING: This is not a recommended production deployment configuration.

WARNING: Use CTRL-C to abort.

Starting namenodes on [hadoop01]

Starting datanodes

Starting secondary namenodes [hadoop01]

Starting resourcemanager

Starting nodemanagers

[hadoop@hadoop01 hadoop-3.2.0]$ jps

8497 NameNode

9121 ResourceManager

8868 SecondaryNameNode

9268 NodeManager

9630 Jps

然后，进入root权限打开eclipse；

[hadoop@hadoop01 hadoop-3.2.0]$ su root

Password:

[root@hadoop01 hadoop-3.2.0]# cd ..

[root@hadoop01 hadoop]# cd eclipse

[root@hadoop01 eclipse]# ./eclipse

在eclipse的window里面show view打开terminal；

在eclipse中点击打开open a terminal，在终端中输入命令：gedit input.txt；

在文档中任意输入内容；

在终端中输入命令：hadoop fs -put /home/hadoop/input.txt /test/；

最后，file--new--project--MapReduce project并取项目名“Wordcount”，再从创建的文件下src中new--package并为包取名“com.hadoop”，又在src下new--class并为类取名“Wordcount”，然后将下面的代码粘贴进去。

然后可以run as hadoop，成功运行得到计算结果

注：若package下无log4j.properties，会报错，需在该文件下手动添加该文件。
内容如下：

# Configure logging for testing: optionally with log file

#log4j.rootLogger=debug,appender

log4j.rootLogger=info,appender

#log4j.rootLogger=error,appender

#\u8F93\u51FA\u5230\u63A7\u5236\u53F0

log4j.appender.appender=org.apache.log4j.ConsoleAppender

#\u6837\u5F0F\u4E3ATTCCLayout

log4j.appender.appender.layout=org.apache.log4j.TTCCLayout

附代码：

/**

 * Licensed to the Apache Software Foundation (ASF) under one

 * or more contributor license agreements.  See the NOTICE file

 * distributed with this work for additional information

 * regarding copyright ownership.  The ASF licenses this file

 * to you under the Apache License, Version 2.0 (the

 * "License"); you may not use this file except in compliance

 * with the License.  You may obtain a copy of the License at

 *

 *     http://www.apache.org/licenses/LICENSE-2.0

 *

 * Unless required by applicable law or agreed to in writing, software

 * distributed under the License is distributed on an "AS IS" BASIS,

 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

 * See the License for the specific language governing permissions and

 * limitations under the License.

 */

package org.apache.hadoop.examples;

import java.io.IOException;

import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper

       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);

    private Text word = new Text();

    public void map(Object key, Text value, Context context

                    ) throws IOException, InterruptedException {

      StringTokenizer itr = new StringTokenizer(value.toString());

      while (itr.hasMoreTokens()) {

        word.set(itr.nextToken());

        context.write(word, one);

      }

    }

  }

  public static class IntSumReducer

       extends Reducer<Text,IntWritable,Text,IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,

                       Context context

                       ) throws IOException, InterruptedException {

      int sum = 0;

      for (IntWritable val : values) {

        sum += val.get();

      }

      result.set(sum);

      context.write(key, result);

    }

  }

  public static void main(String[] args) throws Exception {

    Configuration conf = new Configuration();

    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();

    if (otherArgs.length < 2) {

      System.err.println("Usage: wordcount <in> [<in>...] <out>");

      System.exit(2);

    }

    Job job = Job.getInstance(conf, "word count");

    job.setJarByClass(WordCount.class);

    job.setMapperClass(TokenizerMapper.class);

    job.setCombinerClass(IntSumReducer.class);

    job.setReducerClass(IntSumReducer.class);

    job.setOutputKeyClass(Text.class);

    job.setOutputValueClass(IntWritable.class);

    for (int i = 0; i < otherArgs.length - 1; ++i) {

      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));

    }

    FileOutputFormat.setOutputPath(job,

      new Path(otherArgs[otherArgs.length - 1]));

    System.exit(job.waitForCompletion(true) ? 0 : 1);

  }

}

【hadoop】在eclipse上运行WordCount的操作过程的更多相关文章

linux下在eclipse上运行hadoop自带例子wordcount
启动eclipse:打开windows->open perspective->other->map/reduce 可以看到map/reduce开发视图.设置Hadoop locati ...
MapReduce编程入门实例之WordCount：分别在Eclipse和Hadoop集群上运行
上一篇博文如何在Eclipse下搭建Hadoop开发环境,今天给大家介绍一下如何分别分别在Eclipse和Hadoop集群上运行我们的MapReduce程序! 1. 在Eclipse环境下运行MapR ...
在Eclipse上运行Spark(Standalone,Yarn-Client)
欢迎转载,且请注明出处,在文章页面明显位置给出原文连接. 原文链接:http://www.cnblogs.com/zdfjf/p/5175566.html 我们知道有eclipse的Hadoop插件, ...
Spark源码编译并在YARN上运行WordCount实例
在学习一门新语言时,想必我们都是"Hello World"程序开始,类似地,分布式计算框架的一个典型实例就是WordCount程序,接触过Hadoop的人肯定都知道用MapRedu ...
mac上eclipse上运行word count
1.打开eclipse之后,建立wordcount项目 package wordcount; import java.io.IOException; import java.util.StringTo ...
解决在windows的eclipse上面运行WordCount程序出现的一系列问题详解
一．简介要在Windows下的 Eclipse上调试Hadoop2代码,所以我们在windows下的Eclipse配置hadoop-eclipse-plugin- 2.6.0.jar插件,并在运行H ...
在Hadoop 2.3上运行C++程序各种疑难杂症（Hadoop Pipes选择、错误集锦、Hadoop2.3编译等）
首记感觉Hadoop是一个坑,打着大数据最佳解决方案的旗帜到处坑害良民.记得以前看过一篇文章,说1TB以下的数据就不要用Hadoop了,体现不出太大的优势,有时候反而会成为累赘.因此Hadoop的 ...
hadoop 把mapreduce任务从本地提交到hadoop集群上运行
MapReduce任务有三种运行方式: 1.windows(linux)本地调试运行,需要本地hadoop环境支持 2.本地编译成jar包,手动发送到hadoop集群上用hadoop jar或者yar ...
Hadoop在window上运行 user=Administrator, access=WRITE, inode="hadoop"
win7下eclipse中错误的详细描述如下: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.securit ...

随机推荐

Redis自定义fastJson Serializer
public class FastJsonRedisSerializer<T> implements RedisSerializer<T> { public static fi ...
rabbitmq - 消息接收，解析xml格式数据时异常：ERROR not well-formed (invalid token): line 4, column 46
ERROR alsv odoo.addons.cus_alsv.utils.alsv_about_mq.get_data_from_mq: parse_xml_data_from_mq: not we ...
echo * 和ls *之间的区别？
背景描述: 今天一同事做入职考试,涉及到1题目,echo * 和ls *之间的区别,没有用过这个用法,再次记录下. 操作过程: 1.执行echo * [root@localhost ~]# echo ...
Linux服务器连接不上的几种解决办法
Linux远程服务器连接不上,或连接超时解决办法:1.测试网络是否通: ping 远程IP 2.如果能ping通则表示与服务器网络连接是正常,接下来测试端口:telnet 远程ip 端口 3.如 ...
史上最全的 DB2 错误代码大全
1 前言作为一个程序员,数据库是我们必须掌握的知识,经常操作数据库不可避免,but,在写 SQL 语句的时候,难免遇到各种问题.例如,当我们看着数据库报出的一大堆错误时,是否有种两眼发蒙的感觉呢?咳 ...
CefSharp 提示 flash player is out of date 运行此插件等问题解决办法
CefSharp 提示 flash player is out of date 或者需要手动右键点运行此插件脚本等问题解决办法因为中国版FlashPlayer变得Ad模式之后,只好用旧版本的 ...
搭建SpringCloud微服务
建立spring父模块删除不必要的src目录父模块中的pom.xml中添加相应的依赖以及插件.远程仓库地址 <!-- 项目的打包类型, 即项目的发布形式, 默认为 jar. 对于聚合项目的父 ...
Python3.7安装（解决ssl问题）
摘自:https://blog.csdn.net/love_cjiajia/article/details/82254371 python3.7安装(解决ssl的问题) 1) 安装准备 yum -y ...
[LeetCode] 150. Evaluate Reverse Polish Notation 计算逆波兰表达式
Evaluate the value of an arithmetic expression in Reverse Polish Notation. Valid operators are +, -, ...
django：下拉框二级联动实现
注意:只列举核心部分代码前台模板: 第一级下拉菜单: <div class="col-sm-4"> <select data-placeholder=" ...

【hadoop】在eclipse上运行WordCount的操作过程

【hadoop】在eclipse上运行WordCount的操作过程的更多相关文章

随机推荐

热门专题