Setting up a Single-Node Hadoop Cluster on Ubuntu/Debian
Hadoop: Setting up a Single Node Cluster.
Purpose
This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
Prerequisites
Supported Platforms
GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
Windows is also a supported platform, but the following steps are for Linux only. To set up Hadoop on Windows, see the wiki page.
Required Software
Required software for Linux includes:
Java™ must be installed. Recommended Java versions are described at HadoopJavaVersions.
ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
Installing Software
If your cluster doesn’t have the requisite software you will need to install it.
For example on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
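If you want to confirm that sshd is running after installation (on Ubuntu/Debian the service is named ssh; the exact invocation may vary with your init system):
$ sudo service ssh status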
Download
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.
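For example, release 2.7.2, the version used in the examples below, can be fetched from the Apache archive; substitute whichever mirror and stable release you chose:
$ wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.2/hadoop-2.7.2.tar.gz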
Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the distribution, edit the file etc/hadoop/hadoop-env.sh to define some parameters as follows:
# set to the root of your Java installation
export JAVA_HOME=/usr/java/latest
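On Ubuntu/Debian, a JDK installed through apt usually lives under /usr/lib/jvm rather than /usr/java; the path below is an example for the OpenJDK 8 package, so verify it on your machine (readlink -f $(which java) shows the real location):
# example for Ubuntu's openjdk-8-jdk package; adjust to your installed JDK
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64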
Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
Now you are ready to start your Hadoop cluster in one of the three supported modes: Local (Standalone) Mode, Pseudo-Distributed Mode, or Fully-Distributed Mode.
Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp etc/hadoop/*.xml input
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
$ cat output/*
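With the stock configuration files this expression usually matches only once, so the cat typically prints a single line along the lines of:
1	dfsadmin
The exact matches depend on the *.xml files you copied into input.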
Pseudo-Distributed Operation
Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following:
etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
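You can verify that Hadoop picks up these settings with the getconf tool (run from the distribution root, like the other commands here), which should echo hdfs://localhost:9000:
$ bin/hdfs getconf -confKey fs.defaultFS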
Set up passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
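Note that recent OpenSSH releases disable DSA keys by default. If the DSA key above is rejected, an RSA key set up the same way should work:
$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys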
Execution
The following instructions are to run a MapReduce job locally. If you want to execute a job on YARN, see YARN on Single Node.
Format the filesystem:
$ bin/hdfs namenode -format
Start NameNode daemon and DataNode daemon:
$ sbin/start-dfs.sh
The hadoop daemon log output is written to the $HADOOP_LOG_DIR directory (defaults to $HADOOP_HOME/logs).
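A quick way to confirm that the daemons came up is the jps tool that ships with the JDK (process IDs will differ on your machine):
$ jps
The listing should include NameNode, DataNode, and (normally) SecondaryNameNode.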
Browse the web interface for the NameNode; by default it is available at:
- NameNode - http://localhost:50070/
Make the HDFS directories required to execute MapReduce jobs:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/<username>
Copy the input files into the distributed filesystem:
$ bin/hdfs dfs -put etc/hadoop input
Run some of the examples provided:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
Examine the output files: Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hdfs dfs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hdfs dfs -cat output/*
When you’re done, stop the daemons with:
$ sbin/stop-dfs.sh
YARN on a Single Node
You can run a MapReduce job on YARN in a pseudo-distributed mode by setting a few parameters and running ResourceManager daemon and NodeManager daemon in addition.
The following instructions assume that the first four steps of the instructions above (formatting the filesystem through creating the HDFS user directories) have already been executed.
Configure parameters as follows:
etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
Start ResourceManager daemon and NodeManager daemon:
$ sbin/start-yarn.sh
Browse the web interface for the ResourceManager; by default it is available at:
- ResourceManager - http://localhost:8088/
Run a MapReduce job.
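As an illustration, the grep job from the pseudo-distributed steps can be resubmitted and will now be scheduled by YARN. The output directory must not already exist, so remove the previous one first (or pick a new name):
$ bin/hdfs dfs -rm -r output
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'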
When you’re done, stop the daemons with:
$ sbin/stop-yarn.sh
Fully-Distributed Operation
For information on setting up fully-distributed, non-trivial clusters see Cluster Setup.