I. Environment

IP Address      Hostname   Role       Operating System
192.168.92.11   hserver1   namenode   Ubuntu 16.04
192.168.92.12   hserver2   datanode   Ubuntu 16.04
192.168.92.13   hserver3   datanode   Ubuntu 16.04

II. Environment Initialization

1. Disable the firewall

On a CentOS-based cluster the firewall would have to be disabled first. This guide uses Ubuntu, so this step can be skipped.

2. Configure hostnames

Set the hostnames of the three machines to hserver1, hserver2, and hserver3. On hserver1:

hostnamectl set-hostname hserver1
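
Run the same command with the matching name on the other two machines:

root@hserver2:~# hostnamectl set-hostname hserver2
root@hserver3:~# hostnamectl set-hostname hserver3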

After setting the hostnames, add the name-to-IP mappings to the hosts file on every node:

root@hserver1:~# cat >> /etc/hosts << EOF
192.168.92.11 hserver1
192.168.92.12 hserver2
192.168.92.13 hserver3
EOF
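
A quick sanity check, once the same entries exist on all three nodes, is to ping each hostname once:

root@hserver1:~# for host in hserver{1..3}; do ping -c 1 $host > /dev/null && echo "$host OK"; done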

3. Generate keys and configure passwordless SSH

First, generate a key pair on each of the three machines:

root@hserver1:~# ssh-keygen -t rsa -f ~/.ssh/id_rsa -N ''

Then, on each of the three machines, configure passwordless login to all three hosts:

root@hserver1:~# apt-get install sshpass -y
root@hserver1:~# for host in 192.168.92.{11..13} hserver{1..3}; do ssh-keyscan $host >>~/.ssh/known_hosts 2>/dev/null; done
root@hserver1:~# for host in 192.168.92.{11..13}; do sshpass -p'123456' ssh-copy-id root@$host &>/dev/null; done
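
To confirm the key distribution worked, a non-interactive login can be attempted against every host; each command should print the remote hostname without asking for a password:

root@hserver1:~# for host in hserver{1..3}; do ssh -o BatchMode=yes root@$host hostname; done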

III. Installing JDK and Hadoop

1. Install JDK

Install openjdk-8-jdk-headless on each of the three machines:

root@hserver1:~# apt-get install openjdk-8-jdk-headless -y

Check the Java version:

root@hserver1:~# java -version
openjdk version "1.8.0_252"
OpenJDK Runtime Environment (build 1.8.0_252-8u252-b09-1~16.04-b09)
OpenJDK 64-Bit Server VM (build 25.252-b09, mixed mode)

Configure the environment variables (in /etc/profile):

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib/
export PATH=$PATH:$JAVA_HOME/bin
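
For the new variables to take effect in the current shell, reload /etc/profile and verify JAVA_HOME:

root@hserver1:~# source /etc/profile
root@hserver1:~# echo $JAVA_HOME
/usr/lib/jvm/java-8-openjdk-amd64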

2. Download Hadoop

[Note] All of the following operations must be performed on all three machines.

Hadoop download link: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz

This guide uses version 2.9.2 (the binary release). Upload the downloaded archive to each server and extract it to /opt/hadoop:

root@hserver1:~# mkdir /opt/hadoop
root@hserver1:~# tar zxf hadoop-2.9.2.tar.gz -C /opt/hadoop/
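
If the archive has not been downloaded yet, it can also be fetched directly on each server; the Apache archive below is one possible source:

root@hserver1:~# wget https://archive.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz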

After extraction, create the following directories on each server:

mkdir /usr/local/hadoop
mkdir /usr/local/hadoop/tmp
mkdir /usr/local/hadoop/var
mkdir /usr/local/hadoop/dfs
mkdir /usr/local/hadoop/dfs/name
mkdir /usr/local/hadoop/dfs/data
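
The same layout can be created in a single command with mkdir -p and brace expansion:

root@hserver1:~# mkdir -p /usr/local/hadoop/{tmp,var,dfs/{name,data}}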

3. Configure Hadoop

First, several files in the /opt/hadoop/hadoop-2.9.2/etc/hadoop directory need to be modified:

  1. Edit core-site.xml and add the following properties inside the <configuration> block:
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hserver1:9000</value>
</property>
  2. Edit hdfs-site.xml and add the following properties inside the <configuration> block:
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
<description>Disable HDFS permission checking.</description>
</property>

[Note] With dfs.permissions.enabled set to false, files can be written to HDFS without permission checks. To guard against accidental deletion, set it to true instead, or simply remove this property node, since the default is already true.

  3. Copy mapred-site.xml.template to mapred-site.xml, then add the following properties inside the <configuration> block:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hserver1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hserver1:19888</value>
</property>
  4. Edit the slaves file, delete the localhost entry, and add the following:
hserver2
hserver3
  5. Edit yarn-site.xml and add the following properties inside the <configuration> block:
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hserver1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
<description>The address of the applications manager interface in the RM.</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
<description>The address of the scheduler interface.</description>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hserver1:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hserver1:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
<description>The http address of the RM web application.</description>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>1024</value>
        <description>Maximum memory allocation per container request, in MB; the default is 8192 MB.</description>
</property>
<property>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>

[Note] Setting yarn.nodemanager.vmem-check-enabled to false disables the virtual-memory check. This is very useful when the nodes are virtual machines, as it avoids problems in later operations. On physical machines with ample memory, this property can be removed.

  6. Edit the hadoop-env.sh script and set JAVA_HOME as follows (a sketch for copying the finished configuration to the other nodes follows this list):
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
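
Since the configuration must be identical on all three machines, an alternative to editing the files three times is to finish them on hserver1 and copy the whole configuration directory to the other nodes, for example:

root@hserver1:~# for host in hserver2 hserver3; do scp /opt/hadoop/hadoop-2.9.2/etc/hadoop/* root@$host:/opt/hadoop/hadoop-2.9.2/etc/hadoop/; done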

4. Initialize Hadoop

Because hserver1 is the namenode and hserver2 and hserver3 are datanodes, only hserver1 needs to be initialized, i.e. HDFS needs to be formatted. On hserver1, change to the /opt/hadoop/hadoop-2.9.2/bin directory and run the following command:

root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
20/05/06 16:19:26 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = hserver1/192.168.92.11
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.9.2
STARTUP_MSG: classpath = /opt/hadoop/hadoop-2.9.2/etc/hadoop:/opt/hadoop/hadoop-
...
...
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 826afbeae31ca687bc2f8471dc841b66ed2c6704; compiled by 'ajisaka' on 2018-11-13T12:42Z
STARTUP_MSG: java = 1.8.0_252
************************************************************/
20/05/06 16:19:26 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
20/05/06 16:19:26 INFO namenode.NameNode: createNameNode [-format]
20/05/06 16:19:27 WARN common.Util: Path /usr/local/hadoop/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
20/05/06 16:19:27 WARN common.Util: Path /usr/local/hadoop/dfs/name should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-18e78322-4eac-4cf8-8b79-737a015623ca
20/05/06 16:19:27 INFO namenode.FSEditLog: Edit logging is async:true
20/05/06 16:19:27 INFO namenode.FSNamesystem: KeyProvider: null
20/05/06 16:19:27 INFO namenode.FSNamesystem: fsLock is fair: true
20/05/06 16:19:27 INFO namenode.FSNamesystem: Detailed lock hold time metrics enabled: false
20/05/06 16:19:27 INFO namenode.FSNamesystem: fsOwner = root (auth:SIMPLE)
20/05/06 16:19:27 INFO namenode.FSNamesystem: supergroup = supergroup
20/05/06 16:19:27 INFO namenode.FSNamesystem: isPermissionEnabled = false
20/05/06 16:19:27 INFO namenode.FSNamesystem: HA Enabled: false
20/05/06 16:19:27 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling
20/05/06 16:19:27 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit: configured=1000, counted=60, effected=1000
20/05/06 16:19:27 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
20/05/06 16:19:27 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
20/05/06 16:19:27 INFO blockmanagement.BlockManager: The block deletion will start around 2020 May 06 16:19:27
20/05/06 16:19:27 INFO util.GSet: Computing capacity for map BlocksMap
20/05/06 16:19:27 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:27 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
20/05/06 16:19:27 INFO util.GSet: capacity = 2^21 = 2097152 entries
20/05/06 16:19:27 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
20/05/06 16:19:27 WARN conf.Configuration: No unit for dfs.heartbeat.interval(3) assuming SECONDS
20/05/06 16:19:27 WARN conf.Configuration: No unit for dfs.namenode.safemode.extension(30000) assuming MILLISECONDS
20/05/06 16:19:27 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
20/05/06 16:19:27 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.min.datanodes = 0
20/05/06 16:19:27 INFO blockmanagement.BlockManagerSafeMode: dfs.namenode.safemode.extension = 30000
20/05/06 16:19:27 INFO blockmanagement.BlockManager: defaultReplication = 2
20/05/06 16:19:27 INFO blockmanagement.BlockManager: maxReplication = 512
20/05/06 16:19:27 INFO blockmanagement.BlockManager: minReplication = 1
20/05/06 16:19:27 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
20/05/06 16:19:27 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
20/05/06 16:19:27 INFO blockmanagement.BlockManager: encryptDataTransfer = false
20/05/06 16:19:27 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
20/05/06 16:19:27 INFO namenode.FSNamesystem: Append Enabled: true
20/05/06 16:19:28 INFO namenode.FSDirectory: GLOBAL serial map: bits=24 maxEntries=16777215
20/05/06 16:19:28 INFO util.GSet: Computing capacity for map INodeMap
20/05/06 16:19:28 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:28 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
20/05/06 16:19:28 INFO util.GSet: capacity = 2^20 = 1048576 entries
20/05/06 16:19:28 INFO namenode.FSDirectory: ACLs enabled? false
20/05/06 16:19:28 INFO namenode.FSDirectory: XAttrs enabled? true
20/05/06 16:19:28 INFO namenode.NameNode: Caching file names occurring more than 10 times
20/05/06 16:19:28 INFO snapshot.SnapshotManager: Loaded config captureOpenFiles: falseskipCaptureAccessTimeOnlyChange: false
20/05/06 16:19:28 INFO util.GSet: Computing capacity for map cachedBlocks
20/05/06 16:19:28 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:28 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
20/05/06 16:19:28 INFO util.GSet: capacity = 2^18 = 262144 entries
20/05/06 16:19:28 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
20/05/06 16:19:28 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
20/05/06 16:19:28 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
20/05/06 16:19:28 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
20/05/06 16:19:28 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
20/05/06 16:19:28 INFO util.GSet: Computing capacity for map NameNodeRetryCache
20/05/06 16:19:28 INFO util.GSet: VM type = 64-bit
20/05/06 16:19:28 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
20/05/06 16:19:28 INFO util.GSet: capacity = 2^15 = 32768 entries
20/05/06 16:19:28 INFO namenode.FSImage: Allocated new BlockPoolId: BP-2038544107-192.168.92.11-1588753168340
20/05/06 16:19:28 INFO common.Storage: Storage directory /usr/local/hadoop/dfs/name has been successfully formatted.
20/05/06 16:19:28 INFO namenode.FSImageFormatProtobuf: Saving image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 using no compression
20/05/06 16:19:28 INFO namenode.FSImageFormatProtobuf: Image file /usr/local/hadoop/dfs/name/current/fsimage.ckpt_0000000000000000000 of size 322 bytes saved in 0 seconds .
20/05/06 16:19:28 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
20/05/06 16:19:28 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hserver1/192.168.92.11
************************************************************/

If no errors appear in the output above, the initialization succeeded. Afterwards, a new current directory with several files can be found under /usr/local/hadoop/dfs/name/:

root@hserver1:/usr/local/hadoop/dfs/name# tree /usr/local/hadoop/dfs/name/
/usr/local/hadoop/dfs/name/
└── current
    ├── fsimage_0000000000000000000
    ├── fsimage_0000000000000000000.md5
    ├── seen_txid
    └── VERSION

1 directory, 4 files

5. Start Hadoop

Because hserver1 is the namenode and hserver2 and hserver3 are datanodes, the start command only needs to be executed on hserver1:

root@hserver1:/usr/local/hadoop/dfs/name# cd /opt/hadoop/hadoop-2.9.2/sbin/
root@hserver1:/opt/hadoop/hadoop-2.9.2/sbin# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hserver1]
hserver1: starting namenode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-namenode-hserver1.out
hserver2: starting datanode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-datanode-hserver2.out
hserver3: starting datanode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-datanode-hserver3.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is SHA256:EluzQS5IRZaQAqRlc2O+h1rOS7jfaBSNlmgKqeknA6c.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /opt/hadoop/hadoop-2.9.2/logs/hadoop-root-secondarynamenode-hserver1.out
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop/hadoop-2.9.2/logs/yarn-root-resourcemanager-hserver1.out
hserver3: starting nodemanager, logging to /opt/hadoop/hadoop-2.9.2/logs/yarn-root-nodemanager-hserver3.out
hserver2: starting nodemanager, logging to /opt/hadoop/hadoop-2.9.2/logs/yarn-root-nodemanager-hserver2.out

The first time this command is run it asks for confirmation; type yes to continue.
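
Once the scripts finish, running jps on each node is a quick way to confirm the expected daemons are up: hserver1 should show NameNode, SecondaryNameNode and ResourceManager, while hserver2 and hserver3 should show DataNode and NodeManager.

root@hserver1:~# jps    # NameNode, SecondaryNameNode, ResourceManager
root@hserver2:~# jps    # DataNode, NodeManager
root@hserver3:~# jps    # DataNode, NodeManager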

6. Test Hadoop

After starting Hadoop, verify that it came up correctly.

Open the namenode's address 192.168.92.11:50070 in a browser to reach the overview page.

Open 192.168.92.11:8088 to reach the cluster page.
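
The datanodes can also be checked from the command line; hdfs dfsadmin -report lists the live nodes and their capacity:

root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfsadmin -report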

IV. Uploading Local Files to HDFS

  1. First, create a directory to hold the uploaded files:
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -mkdir /upload

# use the -p option if nested directories need to be created
  2. The newly created directory and its details can now be seen on the namenode's overview page → Utilities → Browse the file system

  3. Upload the local file /home/test.log to HDFS:

# the first path is the file's location on the server, the second is its destination in HDFS
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -put /home/test.log /upload
root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -ls /upload
Found 1 items
-rw-r--r-- 2 root supergroup 24 2020-05-07 14:47 /upload/test.log
  4. After the upload completes, the file's details can be viewed in the browser under the corresponding directory (a read-back check is sketched below).
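
As a final check, the uploaded file can be read back directly from HDFS:

root@hserver1:/opt/hadoop/hadoop-2.9.2/bin# ./hdfs dfs -cat /upload/test.log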
