Hadoop: Setting up a Single Node Cluster.

Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Prerequisites

Supported Platforms

  • GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.

  • Windows is also a supported platform but the followings steps are for Linux only. To set up Hadoop on Windows, see wiki page.

Required Software

Required software for Linux include:

  1. Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.

  2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.

Installing Software

If your cluster doesn’t have the requisite software you will need to install it.

For example on Ubuntu Linux:

  $ sudo apt-get install ssh
$ sudo apt-get install rsync

Download

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:

  # set to the root of your Java installation
export JAVA_HOME=/usr/java/latest

Try the following command:

  $ bin/hadoop

This will display the usage documentation for the hadoop script.

Now you are ready to start your Hadoop cluster in one of the three supported modes:

Standalone Operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.

  $ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
$ cat output/*

Pseudo-Distributed Operation

Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following:

etc/hadoop/core-site.xml:

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>

etc/hadoop/hdfs-site.xml:

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

Setup passphraseless ssh

Now check that you can ssh to the localhost without a passphrase:

  $ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:

  $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys

Execution

The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.

  1. Format the filesystem:

      $ bin/hdfs namenode -format
    
  2. Start NameNode daemon and DataNode daemon:

      $ sbin/start-dfs.sh
    

    The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).

  3. Browse the web interface for the NameNode; by default it is available at:

    • NameNode - http://localhost:50070/
  4. Make the HDFS directories required to execute MapReduce jobs:

      $ bin/hdfs dfs -mkdir /user
    $ bin/hdfs dfs -mkdir /user/<username>
  5. Copy the input files into the distributed filesystem:

      $ bin/hdfs dfs -put etc/hadoop input
    
  6. Run some of the examples provided:

      $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
    
  7. Examine the output files: Copy the output files from the distributed filesystem to the local filesystem and examine them:

      $ bin/hdfs dfs -get output output
    $ cat output/*

    or

    View the output files on the distributed filesystem:

      $ bin/hdfs dfs -cat output/*
    
  8. When you’re done, stop the daemons with:

      $ sbin/stop-dfs.sh
    

YARN on a Single Node

You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and running ResourceManager daemon and NodeManager daemon in addition.

The following instructions assume that 1. ~ 4. steps of the above instructions are already executed.

  1. Configure parameters as follows:etc/hadoop/mapred-site.xml:

    <configuration>
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>
    </configuration>

    etc/hadoop/yarn-site.xml:

    <configuration>
    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
    </configuration>
  2. Start ResourceManager daemon and NodeManager daemon:

      $ sbin/start-yarn.sh
    
  3. Browse the web interface for the ResourceManager; by default it is available at:

    • ResourceManager - http://localhost:8088/
  4. Run a MapReduce job.

  5. When you’re done, stop the daemons with:

      $ sbin/stop-yarn.sh
    

Fully-Distributed Operation

For information on setting up fully-distributed, non-trivial clusters see Cluster Setup.

Setting up a Single Node Cluster Hadoop on Ubuntu/Debian的更多相关文章

  1. Hadoop MapReduce Next Generation - Setting up a Single Node Cluster

    Hadoop MapReduce Next Generation - Setting up a Single Node Cluster. Purpose This document describes ...

  2. CentOS6.4安装Hadoop2.0.5 alpha - Single Node Cluster

    1.安装JDK7 rpm到/usr/java/jdk1.7.0_40,并建立软链接/usr/java/default到/usr/java/jdk1.7.0_40 [root@server-308 ~] ...

  3. Hadoop Single Node Setup(hadoop本地模式和伪分布式模式安装-官方文档翻译 2.7.3)

    Purpose(目标) This document describes how to set up and configure a single-node Hadoop installation so ...

  4. Installing Apache Hadoop Single Node

    转载请注明出处:http://www.cnblogs.com/wubdut/p/4681286.html platform: Ubuntu 14.04 LTS hadoop 1.2.1 1. inst ...

  5. all rows from client_id can grow infinitely compared to a single node when hashing by client_id

    all rows from client_id can grow infinitely compared to a single node when hashing by client_id Re: ...

  6. Kubernetes - Launch Single Node Kubernetes Cluster

    Minikube is a tool that makes it easy to run Kubernetes locally. Minikube runs a single-node Kuberne ...

  7. Hadoop:部署Hadoop Single Node

    一.环境准备 1.系统环境 CentOS 7 2.软件环境 OpenJDK # 查询可安装的OpenJDK软件包[root@server1] yum search java | grep jdk... ...

  8. Node.cluster

    nodejs是一个单进程单线程的引擎,只能利用到单个cpu进行计算,面对当今服务器性能的提高,cpu的利用率显然对node应有的性能大打折扣,面对这个问题,cluster应运而生. cluster介绍 ...

  9. node cluster模块的使用和测试

    首先安装async包 用到的有http.cluster包 http和cluster都会node自带的包,无需安装 1:创建cluster.js,代码如下,更具cpu创建多个进程 var cluster ...

随机推荐

  1. 最新版chrome浏览器如何离线安装crx插件?(转载)

    原文链接:https://newsn.net/say/chrome-crx-offline.html mac新版chrome开启离线插件安装 对于mac新版chrome,注意,大家一定要按照顺序来.m ...

  2. 【Alpha】任务分解与分配

    Alpha阶段总体任务规划 Alpha阶段我们的任务主要是恢复原先项目的代码运行,并增加一部分物理实验(二)的内容以及完善之前项目未完成的功能,例如后台管理及用户管理界面.在恢复项目部分的主要工作是将 ...

  3. matplotlib画图无法显示图例 报错No handles with labels found to put in legend.

    很久没有matplotlib了,今天画图的时候发现了一个很小的问题....明明加了legend(),图表会出来,却无法正常显示图例.最后发现只要在plt.plot()加label图例就可以正常显示了.

  4. 三、OPENERP 中的对象关系类型

    OE中的对象关系一共分四种,one2one,one2many,many2one,many2many.他们的意思分别是一对一,一对多,多对一以及多对多. 我们新建一个模块来测试这四种类型 1.one2o ...

  5. JDBC处理Transaction

    package com.ayang.jdbc; import java.sql.*; /** * transaction的构成,随便写一句insenrt,一执行executeUpdate(),它自动提 ...

  6. HUE配置文件hue.ini 的impala模块详解(图文详解)(分HA集群)

    不多说,直接上干货! 我的集群机器情况是 bigdatamaster(192.168.80.10).bigdataslave1(192.168.80.11)和bigdataslave2(192.168 ...

  7. Android开源项目xUtils HttpUtils模块分析(转)

    xUtils是github上的一个Android开源工具项目,其中HttpUtils模块是处理网络连接部分,刚好最近想整理下Android网络编程知识,今天学习下xUtils中HttpUtils. x ...

  8. js关闭当前页面

    <a href="javascript:window.opener=null;window.open('','_self');window.close();">关闭&l ...

  9. Thread中断线程的方法

    转载:https://www.cnblogs.com/l2rf/p/5566895.html 线程对象属于一次性消耗品,一般线程执行完run方法之后,线程就正常结束了,线程结束之后就报废了,不能再次s ...

  10. python模块学习第 0000 题

    将你的 QQ 头像(或者微博头像)右上角加上红色的数字,类似于微信未读信息数量那种提示效果. 类似于图中效果: 好可爱>%<! 题目来源:https://github.com/Yixiao ...