转载自官方文档,最新版请见:http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.html

 

补充:建议添加如下环境变量

#hadoop configuration
export PATH=$PATH:/home/jediael/hadoop-2.4.1/bin:/home/jediael/hadoop-2.4.1/sbin
export HADOOP_HOME=/home/jediael/hadoop-2.4.1
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"

 

Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.

Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Prerequisites

Supported Platforms

  • GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
  • Windows is also a supported platform but the followings steps are for Linux only. To set up Hadoop on Windows, see wiki page.

Required Software

Required software for Linux include:

  1. Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
  2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.

Installing Software

If your cluster doesn't have the requisite software you will need to install it.

For example on Ubuntu Linux:

  $ sudo apt-get install ssh
$ sudo apt-get install rsync

Download

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

  # set to the root of your Java installation
export JAVA_HOME=/usr/java/latest # Assuming your installation directory is /usr/local/hadoop
export HADOOP_PREFIX=/usr/local/hadoop

第二步不做好像没影响。

Try the following command:

  $ bin/hadoop

This will display the usage documentation for the hadoop script.

Now you are ready to start your Hadoop cluster in one of the three supported modes:

Standalone Operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

  $ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
$ cat output/*

Pseudo-Distributed Operation

Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following:

etc/hadoop/core-site.xml:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

  $ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

  $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Execution

The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.

  1. Format the filesystem:

      $ bin/hdfs namenode -format
  2. Start NameNode daemon and DataNode daemon:此步骤报很多警告,但不影响执行结果。
      $ sbin/start-dfs.sh

    The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

  3. Browse the web interface for the NameNode; by default it is available at:
    • NameNode - http://localhost:50070/
  4. Make the HDFS directories required to execute MapReduce jobs:
      $ bin/hdfs dfs -mkdir /user
    $ bin/hdfs dfs -mkdir /user/<username>
  5. Copy the input files into the distributed filesystem:
      $ bin/hdfs dfs -put etc/hadoop input
  6. Run some of the examples provided:
      $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'
  7. Examine the output files:

    Copy the output files from the distributed filesystem to the local filesystem and examine them:

      $ bin/hdfs dfs -get output output
    $ cat output/*

    or

    View the output files on the distributed filesystem:

      $ bin/hdfs dfs -cat output/*
  8. When you're done, stop the daemons with:
      $ sbin/stop-dfs.sh

YARN on Single Node

You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and running ResourceManager daemon and NodeManager daemon in addition.

The following instructions assume that 1. ~ 4. steps of the above instructions are already executed.

  1. Configure parameters as follows:

    etc/hadoop/mapred-site.xml:

    <configuration>
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>
    </configuration>

    etc/hadoop/yarn-site.xml:

    <configuration>
    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    </configuration>
  2. Start ResourceManager daemon and NodeManager daemon:
      $ sbin/start-yarn.sh
  3. Browse the web interface for the ResourceManager; by default it is available at:
    • ResourceManager - http://localhost:8088/
  4. Run a MapReduce job.
  5. When you're done, stop the daemons with:
      $ sbin/stop-yarn.sh
 

 

单机/伪分布式Hadoop2.4.1安装文档 2014-07-08 21:16 2275人阅读 评论(0) 收藏的更多相关文章

  1. 单机/伪分布式Hadoop2.4.1安装文档

    转载自官方文档,最新版请见:http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SingleCluster.h ...

  2. opnet安装及安装中出现问题的解决办法 分类: opnet 2014-04-06 21:50 397人阅读 评论(0) 收藏

    我使用的opnet14.5  win7 64位系统的http://pan.baidu.com/s/1qWyfxnu,电脑先刷了win7 64位原版系统. 选择了VS2013+opnet14.5的安装方 ...

  3. CocoaPods安装和使用教程 分类: ios技术 ios相关 2015-03-11 21:53 48人阅读 评论(0) 收藏

    目录 CocoaPods是什么? 如何下载和安装CocoaPods? 如何使用CocoaPods? 场景1:利用CocoaPods,在项目中导入AFNetworking类库 场景2:如何正确编译运行一 ...

  4. VS2010下安装Opencv 分类: Opencv 2014-11-02 13:51 778人阅读 评论(0) 收藏

    Opencv作为一种跨平台计算机视觉库,在图像处理领域得到广泛的应用,下面介绍如何在VS2010中安装配置Opencv 一.下载Opencv 下载网址:http://sourceforge.net/p ...

  5. JAVA代码规范 标签: java文档工作 2016-06-12 21:50 277人阅读 评论(5) 收藏

    开始做java的ITOO了,近期的工作内容就是按照代码规范来改自己负责的代码,之前做机房收费系统的时候,也是经常验收的,甚至于我们上次验收的时候,老师也去了.对于我们的代码规范,老师其实是很重视的,他 ...

  6. 安装hadoop2.6.0伪分布式环境 分类: A1_HADOOP 2015-04-27 18:59 409人阅读 评论(0) 收藏

    集群环境搭建请见:http://blog.csdn.net/jediael_lu/article/details/45145767 一.环境准备 1.安装linux.jdk 2.下载hadoop2.6 ...

  7. 安装spark1.3.1单机环境 分类: B8_SPARK 2015-04-27 14:52 1873人阅读 评论(0) 收藏

    本文介绍安装spark单机环境的方法,可用于测试及开发.主要分成以下4部分: (1)环境准备 (2)安装scala (3)安装spark (4)验证安装情况 1.环境准备 (1)配套软件版本要求:Sp ...

  8. 搭建hadoop2.6.0集群环境 分类: A1_HADOOP 2015-04-20 07:21 459人阅读 评论(0) 收藏

    一.规划 (一)硬件资源 10.171.29.191 master 10.171.94.155  slave1 10.251.0.197 slave3 (二)基本资料 用户:  jediael 目录: ...

  9. Hadoop1.2.1伪分布模式安装指南 分类: A1_HADOOP 2014-08-17 10:52 1346人阅读 评论(0) 收藏

    一.前置条件 1.操作系统准备 (1)Linux可以用作开发平台及产品平台. (2)win32只可用作开发平台,且需要cygwin的支持. 2.安装jdk 1.6或以上 3.安装ssh,并配置免密码登 ...

随机推荐

  1. ABAP调用外部WebService

    TCode:se80 选择 Package,输入我们自己的开发包,后回车 右击 开发包名称,选择菜单 出现创建向导窗体 选择"Service Consumer",点击 继续 选择& ...

  2. 漫话C++之string字符串类的使用(有汇编分析)

    C++中并不提倡继续使用C风格的字符串,而是为字符串定义了专门的类,名为string. 使用前的准备工作 在使用string类型时,需要包含string头文件,且string位于std命名空间内: # ...

  3. 类名引用static变量好处

    不仅强调了变量static的结构,而且在有些情况下他还为编译器进行优化提供了更好的机会.

  4. 蚂蚁金服入股36Kr给我的一点警示:应该相信自己的理性分析,不能盲目迷信权威

    最近3年,关注互联网和创业投资比较多,每周都会关注下本周发生的创业投融资大事件. 我注意到,一些自媒体作者经常会发布一些有"前瞻性"的文章,比如"美团大众要合并了&quo ...

  5. Maven搭建hadoop环境报Missing artifact jdk.tools:jdk.tools:jar:1.7(5种办法,2种正解)

    刚刚写的那一篇,是网上比较主流的解决办法. 鉴于实际情况,有伙伴的机器上没有遇到这个问题,我们再探究原因,最终还有4种情况需要说明. 先说,另外一种"正解". <depend ...

  6. 如何把excel同一个单元格内的文字和数字分别提取出来?

    平台:excel 2010 目的:把excel同一个单元格内的文字和数字分别提取出来 操作: 假设数据在A1单元格:如果文字在前,B1=left(A1,lenb(A1)-len(A1))可得文字,C1 ...

  7. ArcGIS教程:地理处理服务演示样例(河流网络)(三)

    设置输出符号系统 步骤: 展开 StoweStreamNet.tbx 并双击创建河流网络模型. 接受默认的 45 公顷并单击确定以运行模型. StreamNet 图层将加入至 ArcMap. 右键单击 ...

  8. 硬件——nrf51822第三篇,按键控制小灯

    现象是按键按下,小灯亮,按键抬起,小灯灭. 从这一节我们细致剖析gpio口的设置: nrf51822片上一共有32个数字引脚,分为4个port,如下: port 0 pin 0-7 port 1 pi ...

  9. 使用javascript实现图片上下切换效果并且实现顺序循环播放

    <!doctype html><html lang="en"><head> <meta charset="UTF-8" ...

  10. 【2017 ACM-ICPC 亚洲区(乌鲁木齐赛区)网络赛 G】Query on a string

    [链接]h在这里写链接 [题意] 让你维护字符串的一段区间内T子串的个数. [题解] 因为t不大,所以. 暴力维护一下a[i]就好. a[i]表示的是S串从i位置开始,能和T串匹配几个字符. 用树状数 ...