1. Environment

  • Installation overview
    IP              Host Name  Software  Nodes
    192.168.23.128  ae01       JDK 1.7   NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker
    192.168.23.129  ae02       JDK 1.7   DataNode, TaskTracker
    192.168.23.130  ae03       JDK 1.7   DataNode, TaskTracker
    If you install on virtual machines, samba and smbfs make it easier to move files around.
  • OS: ubuntu-12.04.2-server-amd64
  • Install directory: /usr/local/ae
  • JDK install directory: export JAVA_HOME=/usr/local/ae/jdk1.7.0_51
  • Hadoop version: hadoop-1.2.1
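
    Before starting, it is worth confirming the JDK is in place on every node; a minimal check against the path above:

    user@ae01:~$ /usr/local/ae/jdk1.7.0_51/bin/java -version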

2. Passwordless Login Between Servers

Log in to each server, install SSH, and generate an ssh key. The steps below only show installing SSH and generating the key on ae01; repeat them on ae02 and ae03.

  • Install SSH

    user@ae01:~$ sudo apt-get install openssh-server
  • Generate the SSH key
    user@ae01:~# ssh-keygen -t rsa -P ""
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa):
    Created directory '/root/.ssh'.
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    (key fingerprint and randomart image omitted)
  • Configure passwordless SSH login
    If you want server a1 to log in to server a2 over SSH without a password, append the public key generated on a1 to the ~/.ssh/authorized_keys file on a2.
    In this installation ae01 is the NameNode and must log in to the DataNode servers (ae02, ae03) without a password, so we append the public key generated on ae01 to the authorized_keys file on each of ae02 and ae03.
    Copy ae01's public key to a distinct name, id_rsa_ae01.pub:
    user@ae01:~/.ssh$ sudo cp id_rsa.pub id_rsa_ae01.pub

    Copy id_rsa_ae01.pub to server ae02:

    user@ae01:~/.ssh$ scp ./id_rsa_ae01.pub user@192.168.23.129:~/.ssh/

    Log in to ae02 and append id_rsa_ae01.pub to authorized_keys:

    user@ae02:~/.ssh$ cat id_rsa_ae01.pub >> authorized_keys

    Log back in to ae01 and try to reach ae02 without a password:

    user@ae01:~/.ssh$ ssh ae02
    Welcome to Ubuntu 12.04 LTS (GNU/Linux x86_64)
    (system information banner omitted)
    Last login: ... from 192.168.23.128

    Repeat these steps on all of the machines so that any two of them can ssh to each other without a password.
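
    As a shortcut for the copy-and-append steps above, ssh-copy-id (part of the OpenSSH client tools on Ubuntu) installs the local public key on a remote host in one step. A minimal sketch, assuming the same user account exists on every host:

    user@ae01:~$ for host in ae02 ae03; do ssh-copy-id user@$host; done

    Running the equivalent loop on ae02 and ae03 gives the full pairwise passwordless setup.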

3. Install Hadoop

  • Edit the hosts file and add entries for the 3 servers

    user@ae01:/usr/local/ae$ sudo vim /etc/hosts
    127.0.0.1         localhost
    192.168.23.128 ae01
    192.168.23.129 ae02
    192.168.23.130 ae03
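
    A quick way to confirm the entries resolve (repeat from each server):

    user@ae01:~$ ping -c 1 ae02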
  • Unpack Hadoop
    Copy hadoop-1.2.1.tar.gz to /usr/local/ae and unpack it:
    user@ae01:/usr/local/ae$ sudo tar -zxvf hadoop-1.2.1.tar.gz
  • Add the Hadoop environment variables (e.g. in ~/.bashrc or /etc/profile)
    export HADOOP_HOME=/usr/local/ae/hadoop-1.2.1
    export HADOOP_HOME_WARN_SUPPRESS=1
    export PATH=$PATH:$HADOOP_HOME/bin
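
    After reloading the shell profile, the hadoop command should be on the PATH; a quick sanity check, assuming the exports above went into ~/.bashrc (it should report version 1.2.1):

    user@ae01:~$ source ~/.bashrc
    user@ae01:~$ hadoop version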
  • Configure Hadoop
    core-site.xml holds the cluster-wide configuration; hdfs-site.xml and mapred-site.xml hold the HDFS-specific and MapReduce-specific settings respectively.

    Edit $HADOOP_HOME/conf/hadoop-env.sh and add JAVA_HOME:

    export JAVA_HOME=/usr/local/ae/jdk1.7.0_51

    Edit $HADOOP_HOME/conf/core-site.xml and add the following properties inside the <configuration> element:

    <property>
      <name>hadoop.tmp.dir</name>
      <value>/usr/local/ae/storage/hadoop/temp</value>
      <description>A base for other temporary directories.</description>
    </property>
    <property>
      <name>fs.default.name</name>
      <value>hdfs://ae01:9000</value>
      <description>
        The name of the default file system. A URI whose scheme and authority
        determine the FileSystem implementation. The uri's scheme determines
        the config property (fs.SCHEME.impl) naming the FileSystem
        implementation class. The uri's authority is used to determine the
        host, port, etc. for a filesystem.
      </description>
    </property>
    <property>
      <name>fs.checkpoint.period</name>
      <value>3600</value>
      <description>The number of seconds between two periodic checkpoints.</description>
    </property>
    <property>
      <name>fs.checkpoint.size</name>
      <value>67108864</value>
      <description>
        The size of the current edit log (in bytes) that triggers a periodic
        checkpoint even if the fs.checkpoint.period hasn't expired.
      </description>
    </property>

    Edit $HADOOP_HOME/conf/hdfs-site.xml and add the following properties inside the <configuration> element:

    <property>
      <name>dfs.name.dir</name>
      <value>/usr/local/ae/storage/hadoop/name</value>
      <description>
        Determines where on the local filesystem the DFS name node should
        store the name table (fsimage). If this is a comma-delimited list of
        directories then the name table is replicated in all of the
        directories, for redundancy.
      </description>
    </property>
    <property>
      <name>dfs.data.dir</name>
      <value>/usr/local/ae/storage/hadoop/data</value>
      <description>
        Determines where on the local filesystem a DFS data node should store
        its blocks. If this is a comma-delimited list of directories, then
        data will be stored in all named directories, typically on different
        devices.
      </description>
    </property>
    <property>
      <name>dfs.http.address</name>
      <value>ae01:50070</value>
      <description>
        The address and the base port where the dfs namenode web ui will
        listen on. If the port is 0 then the server will start on a free port.
      </description>
    </property>
    <property>
      <name>dfs.permissions</name>
      <value>false</value>
      <description>
        If "true", enable permission checking in HDFS. If "false", permission
        checking is turned off, but all other behavior is unchanged. Switching
        from one parameter value to the other does not change the mode, owner
        or group of files or directories.
      </description>
    </property>
    <property>
      <name>dfs.replication</name>
      <value>1</value>
      <description>
        Default block replication. The actual number of replications can be
        specified when the file is created. The default is used if replication
        is not specified in create time.
      </description>
    </property>

    Edit $HADOOP_HOME/conf/mapred-site.xml and add the following property inside the <configuration> element:

    <property>
      <name>mapred.job.tracker</name>
      <value>ae01:9001</value>
      <description>
        The host and port that the MapReduce job tracker runs at. If "local",
        then jobs are run in-process as a single map and reduce task.
      </description>
    </property>

    Edit $HADOOP_HOME/conf/masters:

    ae01

    Edit $HADOOP_HOME/conf/slaves:

    ae01
    ae02
    ae03

    In the configuration above:
    the value of fs.default.name (hdfs://ae01:9000) determines the NameNode;
    the value of mapred.job.tracker (ae01:9001) determines the JobTracker;
    the contents of the masters file determine the SecondaryNameNode;
    the contents of the slaves file determine the DataNodes and TaskTrackers.

    Create the directory /usr/local/ae/storage/hadoop and give the hadoop folder sufficient permissions:

    user@ae01:/usr/local/ae$ sudo mkdir -p ./storage/hadoop/
    user@ae01:/usr/local/ae$ sudo chmod 777 ./storage/hadoop/

   Copy the configured Hadoop to ae02 and ae03, and create the same /usr/local/ae/storage/hadoop directory on ae02 and ae03 as well, for example as sketched below.
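
    A minimal sketch of the copy step, assuming user can write to /usr/local/ae on the remote hosts (otherwise copy to a temporary directory and move it into place with sudo):

    user@ae01:/usr/local/ae$ scp -r hadoop-1.2.1 user@ae02:/usr/local/ae/
    user@ae01:/usr/local/ae$ scp -r hadoop-1.2.1 user@ae03:/usr/local/ae/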

  • Format and start Hadoop

    Log in to ae01 and format the NameNode:

    user@ae01:~$ hadoop namenode -format

    Start Hadoop:

    user@ae01:~$ start-all.sh

    Use jps to check the Java processes on each machine.
    ae01

    user@ae01:/usr/local/ae$ jps
    26239 JobTracker
    26158 SecondaryNameNode
    36052 Jps
    26468 TaskTracker
    25687 NameNode
    25926 DataNode

    ae02

    user@ae02:~$ jps
    25021 Jps
    18999 TaskTracker
    18791 DataNode

    ae03

    user@ae03:~$ jps
    3901 DataNode
    9485 Jps
    4106 TaskTracker
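
    With all the daemons up, the cluster can also be verified from ae01; a standard check (a sketch, using the ports configured above):

    user@ae01:~$ hadoop dfsadmin -report

    The NameNode web UI listens at http://ae01:50070 (dfs.http.address above), and the JobTracker web UI on its default port at http://ae01:50030.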
