Setting Up a Hadoop Cluster on Linux
I. Environment Overview
| IP Address | Hostname | Role | Operating System |
|---|---|---|---|
| 192.168.92.11 | hserver1 | namenode | Ubuntu 16.04 |
| 192.168.92.12 | hserver2 | datanode | Ubuntu 16.04 |
| 192.168.92.13 | hserver3 | datanode | Ubuntu 16.04 |
II. Environment Initialization
1. Disable the Firewall
On a CentOS system the firewall must be disabled before building the cluster. This guide uses Ubuntu, so this step can be skipped.
2. Configure Hostnames
Set the hostnames of the three machines to hserver1, hserver2, and hserver3 respectively:
hostnamectl set-hostname hserver1
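Run the corresponding command on each of the other two machines:
hostnamectl set-hostname hserver2    # on hserver2
hostnamectl set-hostname hserver3    # on hserver3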
After setting the hostnames, write the hostname-to-IP mappings into the hosts file:
root@hserver1:~# cat >> /etc/hosts << EOF
192.168.92.11 hserver1
192.168.92.12 hserver2
192.168.92.13 hserver3
EOF
3. Generate SSH Keys and Configure Passwordless Login
First, generate a key pair on each of the three machines:
root@hserver1:~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''
Then configure passwordless login to each of the three machines:
root@hserver1:~# apt-get install sshpass -y
root@hserver1:~# for host in 192.168.92.{11..13} hserver{1..3}; do ssh-keyscan $host >>~/.ssh/known_hosts 2>/dev/null; done
root@hserver1:~# for host in 192.168.92.{11..13}; do sshpass -p'123456' ssh-copy-id root@$host &>/dev/null; done
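To confirm that the passwordless setup works, a quick check like the following should print each hostname without any password prompt:
root@hserver1:~# for host in hserver{1..3}; do ssh root@$host hostname; done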
III. Install JDK and Hadoop
1. Install the JDK
Install openjdk-8-jdk-headless on all three machines:
root@hserver1:~# apt-get install openjdk-8-jdk-headless -y
Check the Java version:
root@hserver1:~# java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~16.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)
Configure the environment variables (in /etc/profile):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
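Reload the profile and verify that the variables took effect:
root@hserver1:~# source /etc/profile
root@hserver1:~# echo $JAVA_HOME
/usr/lib/jvm/java-8-openjdk-amd64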
2. Download Hadoop
[Note] All of the following operations must be performed on all three machines.
Hadoop download page: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
This guide uses version 2.9.2 (the binary package). Upload the downloaded archive to the servers and extract it into /opt/hadoop:
root@hserver1:~# mkdir /opt/hadoop
root@hserver1:~# tar zxf hadoop-2.9.2.tar.gz -C /opt/hadoop/
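If the servers have direct Internet access, the archive can also be downloaded in place instead of uploaded; a sketch using the Apache archive (any mirror from the link above works as well):
root@hserver1:~# wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz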
After extraction, create the following directories on each server:
mkdir /usr/local/hadoop
mkdir /usr/local/hadoop/tmp
mkdir /usr/local/hadoop/var
mkdir /usr/local/hadoop/dfs
mkdir /usr/local/hadoop/dfs/name
mkdir /usr/local/hadoop/dfs/data
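The same layout can be created in a single command using mkdir -p with brace expansion:
mkdir -p /usr/local/hadoop/{tmp,var,dfs/{name,data}}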
3. Configure Hadoop
First, several files under /opt/hadoop/hadoop-2.9.2/etc/hadoop need to be modified:
- Edit core-site.xml and add the following properties inside the <configuration> block:
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hserver1:9000</value>
</property>
- Edit hdfs-site.xml and add the following properties inside the <configuration> block:
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
<description>Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>Disable permission checking.</description>
</property>
[Note] With dfs.permissions.enabled set to false, files can be created on DFS without permission checks. To guard against accidental deletion, set it to true instead, or simply remove this property node, since the default is already true.
- Copy mapred-site.xml.template to mapred-site.xml, then edit the new file and add the following properties inside the <configuration> block:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hserver1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hserver1:19888</value>
</property>
- Edit the slaves file: remove the localhost entry and add the following lines (a one-command way to write the file is sketched right after them):
hserver2
hserver3
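A one-command way to write the file (path taken from the configuration directory above):
cat > /opt/hadoop/hadoop-2.9.2/etc/hadoop/slaves << EOF
hserver2
hserver3
EOF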
- Edit yarn-site.xml and add the following properties inside the <configuration> block:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hserver1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
<description>The address of the applications manager interface in the RM.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
<description>The address of the scheduler interface.</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hserver1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hserver1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
<description>The http address of the RM web application.</description>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>1024</value>
<description>Maximum memory allocation per container request, in MB; the default is 8192 MB.</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
[Note] Setting yarn.nodemanager.vmem-check-enabled to false disables the virtual-memory check. This is very useful when the cluster runs on virtual machines and prevents common failures later on. On physical machines with plenty of memory, this property can be removed.
- Edit the hadoop-env.sh script and change the JAVA_HOME variable to the following:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
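Since every change above must end up on all three machines, one option is to finish the configuration on hserver1 and then copy the files to the other nodes; a sketch assuming the same directory layout everywhere:
root@hserver1:~# for host in hserver2 hserver3; do scp /opt/hadoop/hadoop-2.9.2/etc/hadoop/* root@$host:/opt/hadoop/hadoop-2.9.2/etc/hadoop/; done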
4. Initialize Hadoop
Because hserver1 is the namenode and hserver2 and hserver3 are datanodes, only hserver1 needs to be initialized, i.e., HDFS needs to be formatted. Change to the /opt/hadoop/hadoop-2.9.2/bin directory on hserver1 and run the following command:
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
20/05/06 16:19:26 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hserver1/192.168.92.11
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.9.2
STARTUP_MSG: classpath = /opt/hadoop/hadoop-2.9.2/etc/hadoop:/opt/hadoop/hadoop-
...
...
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 826afbeae31ca687bc2f8471dc841b66ed2c6704; compiled by 'ajisaka' on 2018-11-13T12:42Z
STARTUP_MSG: java = 1.8.0_252
************************************************************/
20/05/06 16:19:26 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
20/05/06 16:19:26 INFO namenode.NameNode: createNameNode [-format]
20/05/06 16:19:27 WARN common.Util: Path /usr/local/hadoop/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
20/05/06 16:19:27 WARN common.Util: Path /usr/local/hadoop/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-18e78322-4eac-4cf8-8b79-737a015623ca
20/05/06 16:19:27 INFO namenode.FSEditLog: Edit logging is async:true
20/05/06 16:19:27 INFO namenode.FSNamesystem: KeyProvider: null
20/05/06 16:19:27 INFO namenode.FSNamesystem: fsLock is fair: true
20/05/06 16:19:27 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
20/05/06 16:19:27 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
20/05/06 16:19:27 INFO namenode.FSNamesystem: supergroup = supergroup
20/05/06 16:19:27 INFO namenode.FSNamesystem: isPermissionEnabled = false
20/05/06 16:19:27 INFO namenode.FSNamesystem: HA Enabled: false
20/05/06 16:19:27 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
20/05/06 16:19:27 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
20/05/06 16:19:27 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
20/05/06 16:19:27 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
20/05/06 16:19:27 INFO blockmanagement.BlockManager: The block deletion will start around 2020 May 06 16:19:27
20/05/06 16:19:27 INFO util.GSet: Computing capacity for map BlocksMap
20/05/06 16:19:27 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:27 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
20/05/06 16:19:27 INFO util.GSet: capacity = 2^21 = 2097152 entries
20/05/06 16:19:27 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
20/05/06 16:19:27 WARN conf.Configuration: No unit for dfs.heartbeat.interval(3) assuming SECONDS
20/05/06 16:19:27 WARN conf.Configuration: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
20/05/06 16:19:27 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
20/05/06 16:19:27 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
20/05/06 16:19:27 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
20/05/06 16:19:27 INFO blockmanagement.BlockManager: defaultReplication = 2
20/05/06 16:19:27 INFO blockmanagement.BlockManager: maxReplication = 512
20/05/06 16:19:27 INFO blockmanagement.BlockManager: minReplication = 1
20/05/06 16:19:27 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
20/05/06 16:19:27 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
20/05/06 16:19:27 INFO blockmanagement.BlockManager: encryptDataTransfer = false
20/05/06 16:19:27 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
20/05/06 16:19:27 INFO namenode.FSNamesystem: Append Enabled: true
20/05/06 16:19:28 INFO namenode.FSDirectory: GLOBAL serial map: bits=24 maxEntries=16777215
20/05/06 16:19:28 INFO util.GSet: Computing capacity for map INodeMap
20/05/06 16:19:28 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:28 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
20/05/06 16:19:28 INFO util.GSet: capacity = 2^20 = 1048576 entries
20/05/06 16:19:28 INFO namenode.FSDirectory: ACLs enabled? false
20/05/06 16:19:28 INFO namenode.FSDirectory: XAttrs enabled? true
20/05/06 16:19:28 INFO namenode.NameNode: Caching file names occurring more than 10 times
20/05/06 16:19:28 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: falseskipCaptureAccessTimeOnlyChange: false
20/05/06 16:19:28 INFO util.GSet: Computing capacity for map cachedBlocks
20/05/06 16:19:28 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:28 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
20/05/06 16:19:28 INFO util.GSet: capacity = 2^18 = 262144 entries
20/05/06 16:19:28 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
20/05/06 16:19:28 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
20/05/06 16:19:28 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
20/05/06 16:19:28 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
20/05/06 16:19:28 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
20/05/06 16:19:28 INFO util.GSet: Computing capacity for map NameNodeRetryCache
20/05/06 16:19:28 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:28 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
20/05/06 16:19:28 INFO util.GSet: capacity = 2^15 = 32768 entries
20/05/06 16:19:28 INFO namenode.FSImage: Allocated new BlockPoolId: BP-2038544107-192.168.92.11-1588753168340
20/05/06 16:19:28 INFO common.Storage: Storage directory /usr/local/hadoop/dfs/name has been successfully formatted.
20/05/06 16:19:28 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
20/05/06 16:19:28 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 322 bytes saved in 0 seconds .
20/05/06 16:19:28 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/05/06 16:19:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hserver1/192.168.92.11
************************************************************/
If no errors appear in the output above, the initialization succeeded. Afterwards, a new current directory with several files can be seen under /usr/local/hadoop/dfs/name/:
root@hserver1:/usr/local/hadoop/dfs/name# tree /usr/local/hadoop/dfs/name/
/usr/local/hadoop/dfs/name/
└── current
├── fsimage_0000000000000000000
├── fsimage_0000000000000000000.md5
├── seen_txid
└── VERSION
1 directory, 4 files
5. Start Hadoop
Because hserver1 is the namenode and hserver2 and hserver3 are datanodes, the start command only needs to be run on hserver1:
root@hserver1:/usr/local/hadoop/dfs/name# cd /opt/hadoop/hadoop-2.9.2/sbin/
root@hserver1:/opt/hadoop/hadoop-2.9.2/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hserver1]
hserver1: starting namenode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-namenode-hserver1.out
hserver2: starting datanode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-datanode-hserver2.out
hserver3: starting datanode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-datanode-hserver3.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:EluzQS5IRZaQAqRlc2O+h1rOS7jfaBSNlmgKqeknA6c.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-secondarynamenode-hserver1.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.9.2/logs/yarn-root-resourcemanager-hserver1.out
hserver3: starting nodemanager, logging to /opt/hadoop/hadoop-2.9.2/logs/yarn-root-nodemanager-hserver3.out
hserver2: starting nodemanager, logging to /opt/hadoop/hadoop-2.9.2/logs/yarn-root-nodemanager-hserver2.out
The first run of the command above prompts for confirmation; type yes to continue.
6. Test Hadoop
After starting Hadoop, verify that it came up correctly.
Open the namenode address 192.168.92.11:50070 in a browser to reach the HDFS overview page,
and 192.168.92.11:8088 to reach the YARN cluster page.
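The cluster state can also be checked from the command line; a quick sketch on hserver1:
root@hserver1:~# jps    # should list NameNode, SecondaryNameNode and ResourceManager
root@hserver1:~# /opt/hadoop/hadoop-2.9.2/bin/hdfs dfsadmin -report    # should report two live datanodes
On hserver2 and hserver3, jps should list DataNode and NodeManager.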
IV. Upload Local Files to HDFS
- First, create a directory to hold the uploaded files:
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -mkdir /upload
# use the -p option if nested directories need to be created
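For example, creating a nested (hypothetical) path in one step:
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -mkdir -p /upload/logs/2020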
The newly created directory and its details can now be seen on the namenode overview page under Utilities → Browse the file system.
Upload the local file /home/test.log to the HDFS filesystem:
# the first path is the file's location on the server, the second is the destination path in HDFS
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -put /home/test.log /upload
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -ls /upload
Found 1 items
-rw-r--r-- 2 root supergroup 24 2020-05-07 14:47 /upload/test.log
- After the upload completes, the file's details can be viewed in the browser under the corresponding directory.
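To verify that the upload round-trips, the file can be printed or copied back out of HDFS:
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -cat /upload/test.log
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -get /upload/test.log /tmp/test.log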
学员实现价值就是我们的幸福 作为一名资深的IT高级讲师,在传统的行业IT薪资基本都是过万,作为一名IT培训教师,培养出在不同领域的测试,并且接触各种各样的产品,目前市场流行的比如银行业务系统,语音类系 ...