Hadoop 2.x Cluster Setup

For details repeated from before, refer to the Hadoop 1.x fully distributed cluster deployment write-up.

1 HADOOP Cluster Setup

1.1 Cluster Overview

A HADOOP cluster is really two clusters: an HDFS cluster and a YARN cluster. The two are logically separate, but in practice they usually run on the same physical machines.

  • HDFS cluster: stores the massive data; its main roles are NameNode / DataNode
  • YARN cluster: schedules resources for computation over that data; its main roles are ResourceManager / NodeManager

This walkthrough builds a 5-node cluster with the following role assignment:

Node    Roles                          IP
node1   NameNode, SecondaryNameNode    192.168.33.200
node2   ResourceManager                192.168.33.201
node3   DataNode, NodeManager          192.168.33.202
node4   DataNode, NodeManager          192.168.33.203
node5   DataNode, NodeManager          192.168.33.204

Deployment diagram: (figure omitted)

1.2 Server Preparation

This walkthrough builds the HADOOP cluster on virtual machine servers. Software and versions used:

★ Parallels Desktop 12

★ CentOS 6.5 64-bit

1.3 Network Setup

  • Use NAT networking
  • Gateway address: 192.168.33.1
  • IP addresses of the 5 server nodes:
    • 192.168.33.200
    • 192.168.33.201
    • 192.168.33.202
    • 192.168.33.203
    • 192.168.33.204
  • Subnet mask: 255.255.255.0 (a sample interface configuration is sketched after this list)
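On CentOS 6, one way to apply these settings is a static interface configuration on each node. Below is a minimal sketch for node1, assuming the interface is named eth0 (adjust IPADDR to .201-.204 on node2-node5):

vi /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE=eth0
TYPE=Ethernet
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.33.200
NETMASK=255.255.255.0
GATEWAY=192.168.33.1

Apply the change with: service network restart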

1.4 Server System Settings

  • Add a hadoop user
  • Grant the hadoop user sudoer privileges
  • Set the hostnames:
    • node1
    • node2
    • node3
    • node4
    • node5
  • Configure internal hostname mapping:
    • 192.168.33.200  node1
    • 192.168.33.201  node2
    • 192.168.33.202  node3
    • 192.168.33.203  node4
    • 192.168.33.204  node5
  • Configure passwordless SSH login
  • Configure the firewall (example commands for these steps follow the list)
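A sketch of these steps on CentOS 6, run as root on every node unless noted (one common way, not the only one):

useradd hadoop
passwd hadoop
visudo                        # add the line: hadoop ALL=(ALL) ALL
vi /etc/sysconfig/network     # set HOSTNAME=node1 (node2..node5 accordingly)
vi /etc/hosts                 # append the five IP-to-hostname lines above
service iptables stop         # stop the firewall
chkconfig iptables off        # keep it off across reboots

Then, as the hadoop user on node1, set up passwordless SSH to all nodes:

ssh-keygen -t rsa
ssh-copy-id hadoop@node1      # repeat for node2, node3, node4, node5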

1.5 Environment Installation

  • Upload the JDK package
  • Plan the install directory: /home/hadoop/apps/jdk_1.7.65
  • Extract the package
  • Configure environment variables in /etc/profile (a sketch follows this list)
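A sketch of these steps, assuming the uploaded package is named jdk-7u65-linux-x64.tar.gz (adjust the file name, and point JAVA_HOME at wherever the JDK actually lands; the hadoop-env.sh below uses /usr/local/jdk1.7.0_65):

tar -zxvf jdk-7u65-linux-x64.tar.gz -C /home/hadoop/apps/

vi /etc/profile, appending:

export JAVA_HOME=/home/hadoop/apps/jdk1.7.0_65
export PATH=$PATH:$JAVA_HOME/bin

source /etc/profile
java -version                 # verify the installation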

1.6 HADOOP Installation and Deployment

  • Upload the HADOOP package
  • Plan the install directory: /home/hadoop/apps/hadoop-2.6.1
  • Extract the package
  • Edit the configuration files under $HADOOP_HOME/etc/hadoop/

The minimal configuration is as follows:

vi hadoop-env.sh

Note: the /home/hd2/tmp directory must be created in advance (see the mkdir sketch after hdfs-site.xml below).

# The java implementation to use.

export JAVA_HOME=/usr/local/jdk1.7.0_65

vi core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hd2/tmp</value>
  </property>
  <property>
    <name>hadoop.logfile.size</name>
    <value>10000000</value>
    <description>The max size of each log file</description>
  </property>
  <property>
    <name>hadoop.logfile.count</name>
    <value>10</value>
    <description>The max number of log files</description>
  </property>
</configuration>

vi hdfs-site.xml

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hd2/data/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hd2/data/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <!-- Hadoop 2.x name; the old dfs.secondary.http.address is deprecated -->
    <name>dfs.namenode.secondary.http-address</name>
    <value>node1:50090</value>
  </property>
</configuration>
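The local directories referenced in core-site.xml and hdfs-site.xml must exist before the first start; a sketch (run on the nodes indicated):

mkdir -p /home/hd2/tmp          # every node (hadoop.tmp.dir)
mkdir -p /home/hd2/data/name    # node1 (dfs.namenode.name.dir)
mkdir -p /home/hd2/data/data    # DataNode hosts (dfs.datanode.data.dir)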

If mapred-site.xml does not exist yet, create it from the bundled template first (cp mapred-site.xml.template mapred-site.xml), then:

vi mapred-site.xml

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

vi yarn-site.xml

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>

vi slaves

This file lists the hosts that run the DataNode and NodeManager daemons; per the role assignment in 1.1, those are node3 through node5:

node3
node4
node5
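The configured Hadoop directory (and the /etc/profile changes) must be identical on every node. One way is to copy it out from node1:

for host in node2 node3 node4 node5; do
  scp -r /home/hadoop/apps/hadoop-2.6.1 hadoop@$host:/home/hadoop/apps/
done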

1.7 Starting the Cluster

Format HDFS (run once, on node1 only; the older bin/hadoop namenode -format form still works in 2.x but is deprecated):

bin/hdfs namenode -format

Start HDFS (on node1):

sbin/start-dfs.sh

Start YARN (on the ResourceManager node):

sbin/start-yarn.sh
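To confirm the daemons came up, run jps on each node; given the role assignment above, expect roughly:

jps
# node1: NameNode, SecondaryNameNode (plus ResourceManager, since yarn-site.xml points it at node1)
# node3, node4, node5: DataNode, NodeManager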

1.8 Verifying the Cluster

Open http://192.168.33.200:50070 in a browser to reach the NameNode web UI.
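The same can be checked from the command line: the first report should list the live DataNodes, the second the registered NodeManagers.

hdfs dfsadmin -report
yarn node -list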



1.9 Testing the Cluster with the wordcount Program

1. Create a test directory

[hd2@node1 hadoop-2.4.1]$ hadoop fs -mkdir input

2. Verify that the input directory was created

[hd2@node1 hadoop-2.4.1]$ hadoop fs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2014-08-18 09:02 input

3. Create a test file

[hd2@node1 hadoop-2.4.1]$ vi test.txt

hello hadoop
hello World
Hello Java
Hey man
i am a programmer

4. Put the test file into the test directory

[hd2@node1 hadoop-2.4.1]$ hadoop fs -put test.txt input/

5. Verify that test.txt was uploaded

[hd2@node1 hadoop-2.4.1]$ hadoop fs -ls input/
Found 1 items
-rw-r--r-- 1 root supergroup 62 2014-08-18 09:03 input/test.txt

6. Run the wordcount program

[hd2@node1 hadoop-2.4.1]$ hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount input/ output/

Execution log:

17/04/19 21:07:19 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.33.200:8032
17/04/19 21:07:19 INFO input.FileInputFormat: Total input paths to process : 2
17/04/19 21:07:20 INFO mapreduce.JobSubmitter: number of splits:2
17/04/19 21:07:20 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1492605823444_0003
17/04/19 21:07:20 INFO impl.YarnClientImpl: Submitted application application_1492605823444_0003
17/04/19 21:07:20 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1492605823444_0003/
17/04/19 21:07:20 INFO mapreduce.Job: Running job: job_1492605823444_0003
17/04/19 21:07:26 INFO mapreduce.Job: Job job_1492605823444_0003 running in uber mode : false
17/04/19 21:07:26 INFO mapreduce.Job: map 0% reduce 0%
17/04/19 21:07:33 INFO mapreduce.Job: map 100% reduce 0%
17/04/19 21:07:40 INFO mapreduce.Job: map 100% reduce 100%
17/04/19 21:07:42 INFO mapreduce.Job: Job job_1492605823444_0003 completed successfully
17/04/19 21:07:42 INFO mapreduce.Job: Counters: 50
File System Counters
FILE: Number of bytes read=68
FILE: Number of bytes written=279333
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=246
HDFS: Number of bytes written=25
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=8579
Total time spent by all reduces in occupied slots (ms)=5101
Total time spent by all map tasks (ms)=8579
Total time spent by all reduce tasks (ms)=5101
Total vcore-seconds taken by all map tasks=8579
Total vcore-seconds taken by all reduce tasks=5101
Total megabyte-seconds taken by all map tasks=8784896
Total megabyte-seconds taken by all reduce tasks=5223424
Map-Reduce Framework
Map input records=2
Map output records=6
Map output bytes=62
Map output materialized bytes=74
Input split bytes=208
Combine input records=6
Combine output records=5
Reduce input groups=3
Reduce shuffle bytes=74
Reduce input records=5
Reduce output records=3
Spilled Records=10
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=430
CPU time spent (ms)=1550
Physical memory (bytes) snapshot=339206144
Virtual memory (bytes) snapshot=1087791104
Total committed heap usage (bytes)=242552832
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=38
File Output Format Counters
Bytes Written=25

Execution result:

[hd2@node1 hadoop-2.4.1]$ hadoop fs -ls /user/hd2/out/
Found 2 items
-rw-r--r-- 3 hd2 supergroup 0 2017-04-19 21:07 /user/hd2/out/_SUCCESS
-rw-r--r-- 3 hd2 supergroup 25 2017-04-19 21:07 /user/hd2/out/part-r-00000
[hd2@node1 hadoop-2.4.1]$ hadoop fs -cat /user/hd2/out/part-r-00000
hadoop 2
hello 3
world 1
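One practical note: MapReduce refuses to start a job whose output directory already exists, so remove it before re-running wordcount:

hadoop fs -rm -r output/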
