Setting Up a Hadoop Cluster on Linux
I. Environment
IP Address | Hostname | Role | OS |
---|---|---|---|
192.168.92.11 | hserver1 | namenode | Ubuntu 16.04 |
192.168.92.12 | hserver2 | datanode | Ubuntu 16.04 |
192.168.92.13 | hserver3 | datanode | Ubuntu 16.04 |
II. Environment Initialization
1. Disable the Firewall
On CentOS, the firewall must be disabled before building the cluster. Since this guide uses Ubuntu, this step can be skipped.
2. Configure Hostnames
Set the hostnames of the three machines to hserver1, hserver2, and hserver3 respectively:
hostnamectl set-hostname hserver1
After setting the hostnames, append the name-to-IP mappings to the hosts file on each machine:
root@hserver1:~# cat >> /etc/hosts << EOF
192.168.92.11 hserver1
192.168.92.12 hserver2
192.168.92.13 hserver3
EOF
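To confirm the mappings are in place, the names can be resolved with getent (a minimal check; run it on each machine):
root@hserver1:~# for host in hserver{1..3}; do getent hosts $host; done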
3. Generate SSH Keys and Configure Passwordless Login
First, generate a key pair on each of the three machines:
root@hserver1:~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''
Then push the public key to all three machines so they can reach each other without a password:
root@hserver1:~# apt-get install sshpass -y
root@hserver1:~# for host in 192.168.92.{11..13} hserver{1..3}; do ssh-keyscan $host >>~/.ssh/known_hosts 2>/dev/null; done
root@hserver1:~# for host in 192.168.92.{11..13}; do sshpass -p'123456' ssh-copy-id root@$host &>/dev/null; done
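As a sanity check, passwordless login from hserver1 to every node can be verified with a short loop (repeat from the other two machines as needed):
root@hserver1:~# for host in hserver{1..3}; do ssh root@$host hostname; done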
III. Install the JDK and Hadoop
1. Install the JDK
Install openjdk-8-jdk-headless on each of the three machines:
root@hserver1:~# apt-get install openjdk-8-jdk-headless -y
Check the Java version:
root@hserver1:~# java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~16.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)
Configure the environment variables (in /etc/profile):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin
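After saving the file, reload it and make sure the variables are visible (a quick check):
root@hserver1:~# source /etc/profile
root@hserver1:~# echo $JAVA_HOME
/usr/lib/jvm/java-8-openjdk-amd64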
2. Download Hadoop
[Note] All of the following steps must be performed on all three machines.
Hadoop download link (from the official site): https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
Version 2.9.2 (the binary package) is used here. Upload the downloaded tarball to each server and extract it into /opt/hadoop:
root@hserver1:~# mkdir /opt/hadoop
root@hserver1:~# tar zxf hadoop-2.9.2.tar.gz -C /opt/hadoop/
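Alternatively, the tarball could be fetched directly on each node; a sketch assuming the Apache archive mirror is reachable from the servers:
root@hserver1:~# wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz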
After extraction, create the following directories on each server:
mkdir /usr/local/hadoop
mkdir /usr/local/hadoop/tmp
mkdir /usr/local/hadoop/var
mkdir /usr/local/hadoop/dfs
mkdir /usr/local/hadoop/dfs/name
mkdir /usr/local/hadoop/dfs/data
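The same layout can be created in a single command with mkdir -p and brace expansion:
mkdir -p /usr/local/hadoop/{tmp,var,dfs/name,dfs/data}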
3. Configure Hadoop
First, modify the following files under /opt/hadoop/hadoop-2.9.2/etc/hadoop:
- Edit core-site.xml and add the following properties inside the <configuration> block:
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hserver1:9000</value>
</property>
- Edit hdfs-site.xml and add the following properties inside the <configuration> block:
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
<description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>need not permissions</description>
</property>
[Note] With dfs.permissions.enabled set to false, files can be created on HDFS without permission checks. To guard against accidental deletion, set it to true instead, or simply remove this property, since the default is already true.
- Copy mapred-site.xml.template to mapred-site.xml, then edit the copy and add the following properties inside the <configuration> block:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hserver1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hserver1:19888</value>
</property>
- Edit the slaves file, remove the localhost entry, and add the following lines (a one-step rewrite is sketched after the list):
hserver2
hserver3
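For example, the slaves file can be rewritten in one step (a minimal sketch, assuming the default etc/hadoop location):
root@hserver1:~# cat > /opt/hadoop/hadoop-2.9.2/etc/hadoop/slaves << EOF
hserver2
hserver3
EOF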
- Edit yarn-site.xml and add the following properties inside the <configuration> block:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hserver1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
<description>The address of the applications manager interface in the RM.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
<description>The address of the scheduler interface.</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hserver1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hserver1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
<description>The http address of the RM web application.</description>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>1024</value>
<description>Maximum memory, in MB, that can be allocated to a single container; the default is 8192 MB.</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
[Note] Setting yarn.nodemanager.vmem-check-enabled to false disables the virtual-memory check. This is especially useful when the cluster runs on virtual machines and avoids problems in later steps. On physical machines with plenty of memory, this property can be removed.
- Edit the hadoop-env.sh script and set the JAVA_HOME variable as follows:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
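Since the same configuration is needed on all three machines, one option is to finish editing on hserver1 and then copy the configuration directory to the other nodes (a hedged sketch using scp; adjust the paths if your layout differs):
root@hserver1:~# for host in hserver2 hserver3; do scp -r /opt/hadoop/hadoop-2.9.2/etc/hadoop root@$host:/opt/hadoop/hadoop-2.9.2/etc/; done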
4. Initialize Hadoop
Because hserver1 is the namenode and hserver2 and hserver3 are datanodes, only hserver1 needs to be initialized, i.e. HDFS needs to be formatted. On hserver1, change into /opt/hadoop/hadoop-2.9.2/bin and run:
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
20/05/06 16:19:26 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hserver1/192.168.92.11
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.9.2
STARTUP_MSG: classpath = /opt/hadoop/hadoop-2.9.2/etc/hadoop:/opt/hadoop/hadoop-
...
...
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 826afbeae31ca687bc2f8471dc841b66ed2c6704; compiled by 'ajisaka' on 2018-11-13T12:42Z
STARTUP_MSG: java = 1.8.0_252
************************************************************/
20/05/06 16:19:26 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
20/05/06 16:19:26 INFO namenode.NameNode: createNameNode [-format]
20/05/06 16:19:27 WARN common.Util: Path /usr/local/hadoop/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
20/05/06 16:19:27 WARN common.Util: Path /usr/local/hadoop/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-18e78322-4eac-4cf8-8b79-737a015623ca
20/05/06 16:19:27 INFO namenode.FSEditLog: Edit logging is async:true
20/05/06 16:19:27 INFO namenode.FSNamesystem: KeyProvider: null
20/05/06 16:19:27 INFO namenode.FSNamesystem: fsLock is fair: true
20/05/06 16:19:27 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
20/05/06 16:19:27 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
20/05/06 16:19:27 INFO namenode.FSNamesystem: supergroup = supergroup
20/05/06 16:19:27 INFO namenode.FSNamesystem: isPermissionEnabled = false
20/05/06 16:19:27 INFO namenode.FSNamesystem: HA Enabled: false
20/05/06 16:19:27 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
20/05/06 16:19:27 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
20/05/06 16:19:27 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
20/05/06 16:19:27 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
20/05/06 16:19:27 INFO blockmanagement.BlockManager: The block deletion will start around 2020 May 06 16:19:27
20/05/06 16:19:27 INFO util.GSet: Computing capacity for map BlocksMap
20/05/06 16:19:27 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:27 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
20/05/06 16:19:27 INFO util.GSet: capacity = 2^21 = 2097152 entries
20/05/06 16:19:27 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
20/05/06 16:19:27 WARN conf.Configuration: No unit for dfs.heartbeat.interval(3) assuming SECONDS
20/05/06 16:19:27 WARN conf.Configuration: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
20/05/06 16:19:27 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
20/05/06 16:19:27 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
20/05/06 16:19:27 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
20/05/06 16:19:27 INFO blockmanagement.BlockManager: defaultReplication = 2
20/05/06 16:19:27 INFO blockmanagement.BlockManager: maxReplication = 512
20/05/06 16:19:27 INFO blockmanagement.BlockManager: minReplication = 1
20/05/06 16:19:27 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
20/05/06 16:19:27 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
20/05/06 16:19:27 INFO blockmanagement.BlockManager: encryptDataTransfer = false
20/05/06 16:19:27 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
20/05/06 16:19:27 INFO namenode.FSNamesystem: Append Enabled: true
20/05/06 16:19:28 INFO namenode.FSDirectory: GLOBAL serial map: bits=24 maxEntries=16777215
20/05/06 16:19:28 INFO util.GSet: Computing capacity for map INodeMap
20/05/06 16:19:28 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:28 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
20/05/06 16:19:28 INFO util.GSet: capacity = 2^20 = 1048576 entries
20/05/06 16:19:28 INFO namenode.FSDirectory: ACLs enabled? false
20/05/06 16:19:28 INFO namenode.FSDirectory: XAttrs enabled? true
20/05/06 16:19:28 INFO namenode.NameNode: Caching file names occurring more than 10 times
20/05/06 16:19:28 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: falseskipCaptureAccessTimeOnlyChange: false
20/05/06 16:19:28 INFO util.GSet: Computing capacity for map cachedBlocks
20/05/06 16:19:28 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:28 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
20/05/06 16:19:28 INFO util.GSet: capacity = 2^18 = 262144 entries
20/05/06 16:19:28 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
20/05/06 16:19:28 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
20/05/06 16:19:28 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
20/05/06 16:19:28 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
20/05/06 16:19:28 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
20/05/06 16:19:28 INFO util.GSet: Computing capacity for map NameNodeRetryCache
20/05/06 16:19:28 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:28 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
20/05/06 16:19:28 INFO util.GSet: capacity = 2^15 = 32768 entries
20/05/06 16:19:28 INFO namenode.FSImage: Allocated new BlockPoolId: BP-2038544107-192.168.92.11-1588753168340
20/05/06 16:19:28 INFO common.Storage: Storage directory /usr/local/hadoop/dfs/name has been successfully formatted.
20/05/06 16:19:28 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
20/05/06 16:19:28 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 322 bytes saved in 0 seconds .
20/05/06 16:19:28 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/05/06 16:19:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hserver1/192.168.92.11
************************************************************/
If no errors appear in the output above, the initialization succeeded. Afterwards, a new current directory and several files can be found under /usr/local/hadoop/dfs/name/:
root@hserver1:/usr/local/hadoop/dfs/name# tree /usr/local/hadoop/dfs/name/
/usr/local/hadoop/dfs/name/
└── current
├── fsimage_0000000000000000000
├── fsimage_0000000000000000000.md5
├── seen_txid
└── VERSION
1 directory, 4 files
5. Start Hadoop
Because hserver1 is the namenode and hserver2 and hserver3 are datanodes, the start command only needs to be executed on hserver1:
root@hserver1:/usr/local/hadoop/dfs/name# cd /opt/hadoop/hadoop-2.9.2/sbin/
root@hserver1:/opt/hadoop/hadoop-2.9.2/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hserver1]
hserver1: starting namenode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-namenode-hserver1.out
hserver2: starting datanode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-datanode-hserver2.out
hserver3: starting datanode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-datanode-hserver3.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:EluzQS5IRZaQAqRlc2O+h1rOS7jfaBSNlmgKqeknA6c.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-secondarynamenode-hserver1.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.9.2/logs/yarn-root-resourcemanager-hserver1.out
hserver3: starting nodemanager, logging to /opt/hadoop/hadoop-2.9.2/logs/yarn-root-nodemanager-hserver3.out
hserver2: starting nodemanager, logging to /opt/hadoop/hadoop-2.9.2/logs/yarn-root-nodemanager-hserver2.out
The first time the command is run it asks for confirmation when connecting to 0.0.0.0; just type yes.
6. Test Hadoop
After starting Hadoop, verify that everything came up correctly.
Open the namenode address 192.168.92.11:50070 in a browser to reach the Overview page.
Open 192.168.92.11:8088 to reach the YARN cluster page.
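The running daemons can also be checked from the command line with jps; on hserver1 you would expect NameNode, SecondaryNameNode, and ResourceManager, and on the other two nodes DataNode and NodeManager (a minimal sketch):
root@hserver1:~# for host in hserver{1..3}; do echo "== $host =="; ssh root@$host jps; done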
IV. Upload Local Files to HDFS
- First, create a directory to hold the uploaded files:
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -mkdir /upload
# Use the -p option to create nested directories (see the sketch below)
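For instance, a multi-level path could be created in one step with -p (the path here is purely illustrative):
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -mkdir -p /upload/2020/05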
The newly created /upload directory and its details are now visible on the namenode Overview page under Utilities → Browse the file system.
Upload the local file /home/test.log to HDFS:
# The first path is the local path on the server; the second is the destination path in HDFS
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -put /home/test.log /upload
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -ls /upload
Found 1 items
-rw-r--r-- 2 root supergroup 24 2020-05-07 14:47 /upload/test.log
- After the upload, the file and its details can also be viewed under the corresponding directory in the browser.
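To confirm the content round-trips, the file can be read back from HDFS (a minimal check):
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -cat /upload/test.log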