Hadoop - [04] Distributed Deployment
Zookeeper distributed deployment >> Hadoop distributed deployment
I. Cluster Planning
| Component | node01 | node02 | node03 |
| --- | --- | --- | --- |
| JDK | ○ | ○ | ○ |
| Zookeeper | ○ | ○ | ○ |
| NameNode | ○ | ○ | |
| JournalNode | ○ | ○ | ○ |
| DataNode | ○ | ○ | ○ |
| ResourceManager | ○ | ○ | |
| NodeManager | ○ | ○ | ○ |
II. Installation and Deployment
1. Upload hadoop-2.5.2.tar.gz to the /opt/software directory on node01, node02, and node03.
2. Extract hadoop-2.5.2.tar.gz into the /opt/module directory.
[root@node01 software]# tar -zxvf hadoop-2.5.2.tar.gz -C /opt/module/
hadoop-2.5.2/
hadoop-2.5.2/bin/
hadoop-2.5.2/bin/hadoop
hadoop-2.5.2/bin/hdfs
hadoop-2.5.2/bin/mapred
hadoop-2.5.2/bin/yarn.cmd
hadoop-2.5.2/bin/hadoop.cmd
hadoop-2.5.2/bin/hdfs.cmd
hadoop-2.5.2/bin/mapred.cmd
......
[root@node01 software]#
3. Edit the configuration file /opt/module/hadoop-2.5.2/etc/hadoop/hadoop-env.sh
......
# Configure the JDK
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.372.b07-1.el7_9.x86_64/
......
# Define the users that run each HDFS/YARN daemon
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
export HDFS_JOURNALNODE_USER=root
export HDFS_ZKFC_USER=root
Note: hadoop-env.sh must set JAVA_HOME to an explicit value; otherwise start-dfs.sh and stop-dfs.sh will not work later.
4. Edit the configuration file /opt/module/hadoop-2.5.2/etc/hadoop/core-site.xml
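A minimal core-site.xml sketch for this HA setup: fs.defaultFS follows the nameservice1 name defined in hdfs-site.xml below, ha.zookeeper.quorum matches the ZooKeeper list used in yarn-site.xml, and the hadoop.tmp.dir path is an assumption based on the /data/hadoop/data directory used for metadata later.
<configuration>
    <!-- Default filesystem URI: points at the HA nameservice rather than a single NameNode -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://nameservice1</value>
    </property>
    <!-- Base directory for HDFS metadata and data (assumed path) -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/data</value>
    </property>
    <!-- ZooKeeper quorum used by the ZKFC for automatic NameNode failover -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>node01:2181,node02:2181,node03:2181</value>
    </property>
</configuration>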

5. Edit the configuration file /opt/module/hadoop-2.5.2/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Replication factor (defaults to 3; set to 2 here) -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Disable permission checking -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<!-- For reference, the storage directories both derive from hadoop.tmp.dir:
dfs.namenode.name.dir (NameNode metadata) defaults to file://${hadoop.tmp.dir}/dfs/name
dfs.datanode.data.dir (DataNode block storage) defaults to file://${hadoop.tmp.dir}/dfs/data
-->
<!-- Logical name (alias) for the nameservice -->
<property>
<name>dfs.nameservices</name>
<value>nameservice1</value>
</property>
<property>
<name>dfs.ha.namenodes.nameservice1</name>
<value>nn1,nn2</value>
</property>
<!-- Active NN -->
<property>
<name>dfs.namenode.rpc-address.nameservice1.nn1</name>
<value>node01:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservice1.nn1</name>
<value>node01:9870</value>
</property>
<!-- Standby NN -->
<property>
<name>dfs.namenode.rpc-address.nameservice1.nn2</name>
<value>node02:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.nameservice1.nn2</name>
<value>node02:9870</value>
</property>
<!-- JournalNode quorum: shares edit logs between the active and standby NameNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node01:8485;node02:8485;node03:8485/nameservice1</value>
</property>
<!-- Proxy provider used by HDFS clients to locate the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.nameservice1</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Local directory where JournalNodes store edits -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/data/hadoop/data/journal/</value>
</property>
<!-- Enable automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Fencing: SSH into the old active NameNode during failover (requires passwordless login) -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
</configuration>
dfs.replication: number of block replicas.
dfs.permissions.enabled: permission checking; true enables permission checks in HDFS, false disables them.
dfs.nameservices: comma-separated list of nameservices (the NameNode group).
dfs.namenode.rpc-address.nameservice1.nn1: RPC address on which nn1 handles all client requests.
dfs.namenode.http-address.nameservice1.nn1: address and base port on which the nn1 web UI listens.
dfs.namenode.rpc-address.nameservice1.nn2: RPC address on which nn2 handles all client requests.
dfs.namenode.http-address.nameservice1.nn2: address and base port on which the nn2 web UI listens.
dfs.namenode.shared.edits.dir: shared storage directory for the NameNodes in an HA cluster; the active NameNode writes to it and the standby reads from it to keep the namespace in sync. Leave empty in a non-HA cluster.
dfs.client.failover.proxy.provider.nameservice1: class name of the configured failover proxy provider for the nameservice (the property name carries the nameservice ID). See the "Configuration Details" section of the HDFS High Availability documentation.
dfs.journalnode.edits.dir: directory where JournalNode edit files are stored.
dfs.ha.automatic-failover.enabled: whether automatic failover is enabled.
dfs.ha.fencing.methods: fencing method(s) used during failover (sshfence here).
dfs.ha.fencing.ssh.private-key-files: SSH private key file used by sshfence (/root/.ssh/id_rsa).
6. Edit the configuration file /opt/module/hadoop-2.5.2/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- <property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property> -->
<!-- Once jobs run on YARN, the task memory limits should be set explicitly -->
<property>
<name>mapreduce.map.memory.mb</name>
<value>200</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx200M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>200</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx200M</value>
</property>
</configuration>
7. Edit the configuration file /opt/module/hadoop-2.5.2/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Auxiliary service required for the MapReduce shuffle -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
<!-- Enable ResourceManager high availability -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Logical name of the YARN cluster -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarn1</value>
</property>
<!-- List of ResourceManager IDs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostname and webapp address for each ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node01</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>node01:8088</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node02</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>node02:8088</value>
</property>
<!-- ZooKeeper ensemble that YARN HA depends on -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node01:2181,node02:2181,node03:2181</value>
</property>
</configuration>
Note: if this property is set as hadoop.zk.address, sbin/start-yarn.sh and sbin/stop-yarn.sh will not work, so it must be set as yarn.resourcemanager.zk-address.
8. Distribute the extracted hadoop-2.5.2 directory to /opt/module on node02 and node03.
[root@node01 module]# scp -r -p hadoop-2.5.2/ root@node02:$PWD
[root@node01 module]# scp -r -p hadoop-2.5.2/ root@node03:$PWD
III. First-Time Startup and Initialization
1. Start ZooKeeper (run on node01, node02, and node03)
cd /opt/module/zookeeper-3.4.5/bin
./zkServer.sh restart
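To confirm the ensemble has formed, the standard status subcommand reports each node's role (leader or follower):
# Run on each node; exactly one node should report Mode: leader
./zkServer.sh status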
2. Start the JournalNodes (run on node01, node02, and node03)
sbin/hadoop-daemon.sh start journalnode
Script path:
/opt/module/hadoop-2.5.2/sbin/hadoop-daemon.sh
3. Format the NameNode
[root@node01 hadoop-2.5.2]# bin/hdfs namenode -format
23/06/06 01:02:22 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = node01/192.168.56.121
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.5.2
......
Script path:
/opt/module/hadoop-2.5.2/bin/hdfs
4. Copy the formatted metadata to the /data/hadoop/data directory on node02 (the standby NameNode), as sketched below.
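A sketch of this copy step, assuming the metadata was written under /data/hadoop/data on node01 (the exact subdirectory depends on hadoop.tmp.dir):
# Copy the freshly formatted NameNode metadata from node01 to the standby (paths assumed)
[root@node01 ~]# scp -r /data/hadoop/data/dfs root@node02:/data/hadoop/data/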

5. Start the NameNode on node01, as shown below.
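The command mirrors the node02 invocation shown in step 7:
[root@node01 hadoop-2.5.2]# sbin/hadoop-daemon.sh start namenode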

6. On node02, run bin/hdfs namenode -bootstrapStandby

Note: the command will prompt for confirmation; enter Y. (This is probably because the hadoop-root directory was distributed from node01 earlier.)

7. Start the NameNode on node02 (standby)
# start the NameNode
[root@node02 hadoop-2.5.2]# sbin/hadoop-daemon.sh start namenode
starting namenode, logging to /opt/module/hadoop-2.5.2/logs/hadoop-root-namenode-node02.out
[root@node02 hadoop-2.5.2]# jps
2080 Jps
1795 JournalNode
2009 NameNode
1647 QuorumPeerMain
[root@node02 hadoop-2.5.2]#
8. Initialize ZKFC on one of the ZooKeeper nodes (note: ZooKeeper must be running when this is executed); see the command below.
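The standard initialization command, run here on node01 as an example:
# Creates the HA state znode in ZooKeeper that the failover controllers coordinate through
[root@node01 hadoop-2.5.2]# bin/hdfs zkfc -formatZK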

9. Start/stop the NameNodes, JournalNodes, and DataNodes across the cluster
[root@node01 hadoop-2.5.2]# sbin/start-dfs.sh
Starting namenodes on [node01 node02]
node01: namenode running as process 4059. Stop it first.
node02: namenode running as process 2752. Stop it first.
localhost: datanode running as process 3905. Stop it first.
Starting journal nodes [node01 node02 node03]
node03: journalnode running as process 1786. Stop it first.
node01: journalnode running as process 2032. Stop it first.
node02: journalnode running as process 1795. Stop it first.
Starting ZK Failover Controllers on NN hosts [node01 node02]
node01: starting zkfc, logging to /opt/module/hadoop-2.5.2/logs/hadoop-root-zkfc-node01.out
node02: starting zkfc, logging to /opt/module/hadoop-2.5.2/logs/hadoop-root-zkfc-node02.out
[root@node01 hadoop-2.5.2]#
[root@node01 hadoop-2.5.2]# sbin/stop-dfs.sh
Stopping namenodes on [node01 node02]
node01: no namenode to stop
node02: stopping namenode
localhost: stopping datanode
Stopping journal nodes [node01 node02 node03]
node01: stopping journalnode
node02: stopping journalnode
node03: stopping journalnode
Stopping ZK Failover Controllers on NN hosts [node01 node02]
node01: stopping zkfc
node02: stopping zkfc
[root@node01 hadoop-2.5.2]#
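A quick sanity check after start-dfs.sh: the standard haadmin subcommand reports whether each NameNode (nn1/nn2, as defined in hdfs-site.xml) is active or standby.
[root@node01 hadoop-2.5.2]# bin/hdfs haadmin -getServiceState nn1
[root@node01 hadoop-2.5.2]# bin/hdfs haadmin -getServiceState nn2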
10. Start/stop YARN (these scripts only start the YARN components on the current node: ResourceManager and NodeManager)
[root@node01 hadoop-2.5.2]# sbin/stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop
[root@node01 hadoop-2.5.2]# jps
6579 DataNode
6933 DFSZKFailoverController
1943 QuorumPeerMain
6487 NameNode
6759 JournalNode
8314 Jps
[root@node01 hadoop-2.5.2]#
[root@node01 hadoop-2.5.2]# sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/module/hadoop-2.5.2/logs/yarn-root-resourcemanager-node01.out
localhost: starting nodemanager, logging to /opt/module/hadoop-2.5.2/logs/yarn-root-nodemanager-node01.out
[root@node01 hadoop-2.5.2]# jps
6579 DataNode
6933 DFSZKFailoverController
8470 NodeManager
1943 QuorumPeerMain
6487 NameNode
6759 JournalNode
8366 ResourceManager
8590 Jps
[root@node01 hadoop-2.5.2]#
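Similarly, the standard rmadmin subcommand reports which ResourceManager (rm1/rm2, as defined in yarn-site.xml) is currently active:
[root@node01 hadoop-2.5.2]# bin/yarn rmadmin -getServiceState rm1
[root@node01 hadoop-2.5.2]# bin/yarn rmadmin -getServiceState rm2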
11. Scripts to start/stop HDFS and YARN together
sbin/start-all.sh = sbin/start-dfs.sh + sbin/start-yarn.sh
IV. Problems Encountered
1. After running start-dfs.sh and start-yarn.sh, neither the NodeManager nor the DataNode came up.
Fix: the hostnames of the three cluster nodes had not been configured in /opt/module/hadoop-3.3.6/etc/hadoop/workers (expected contents shown below).
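The workers file lists one worker hostname per line; for this cluster it should contain:
node01
node02
node03
(In Hadoop 2.x the equivalent file is etc/hadoop/slaves; it was renamed to workers in 3.x.)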
V. Web UI pages


— Make lifelong learning a habit —
1.devops不是简单的工具,是思想. (1)devops核心在于快速编译构建.自动测试化.自动部署发布 (2)工具只是辅助手段,无论是Jenkins.腾讯蓝盾等等,甚至是手动bat+bash搭建, ...