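Software environment

  Operating system: OracleLinux-R6-U6

  Hostname: hadoop

  Java: jdk1.7.0_75

  Hadoop: hadoop-2.4.1

Environment setup

  1. Software installation

  Both packages are self-contained archives that need no installer, so it is enough to extract Java and Hadoop into the root directory of the file system.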

[root@hadoop training]# ls -l /
total 110
dr-xr-xr-x. 2 root root 4096 May 17 19:13 bin
dr-xr-xr-x. 5 root root 1024 May 17 17:45 boot
drwxr-xr-x. 2 root root 4096 Oct 15 2014 cgroup
drwxr-xr-x. 19 root root 3780 May 18 01:36 dev
drwxr-xr-x. 131 root root 12288 May 18 17:59 etc
drwxr-xr-x. 11 67974 users 4096 May 18 18:22 hadoop-2.4.1
drwxr-xr-x. 2 root root 4096 Nov 1 2011 home
drwxr-xr-x. 8 uucp 143 4096 Dec 19 2014 jdk1.7.0_75
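
The extraction step described above can be sketched as follows. The post does not name the downloaded archives, so the file names in the usage comments are assumptions, and extract_to is a helper name made up for this example.

```shell
#!/bin/sh
# Extract a .tar.gz archive into a target directory (the post uses /).
# Usage: extract_to ARCHIVE TARGET_DIR
extract_to() {
    archive=$1
    target=$2
    mkdir -p "$target" && tar -xzf "$archive" -C "$target"
}

# Hypothetical archive names; replace them with the files you downloaded.
# extract_to jdk-7u75-linux-i586.tar.gz /
# extract_to hadoop-2.4.1.tar.gz /
```

Extracting with -C keeps the working directory unchanged, so the same call works from wherever the archives were downloaded.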

  

  2. Configure environment variables

  Edit the profile file (~/.bash_profile)

[root@hadoop training]# cat ~/.bash_profile
# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi

# User specific environment and startup programs
PATH=$PATH:$HOME/bin
export PATH

# set python environment
PYTHON_PATH=/python2.7
export PATH=$PYTHON_PATH/bin:$PATH

# set java environment (three parts: JAVA, JDK, CLASSPATH)
export JAVA_HOME=/jdk1.7.0_75
export JRE_HOME=/jdk1.7.0_75/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export HADOOP_HOME=/hadoop-2.4.1
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
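
After editing the profile, it helps to confirm that the new PATH entries actually resolve. The helper below is a minimal sketch; check_tools is a name invented for this example, not part of Hadoop or the JDK.

```shell
#!/bin/sh
# Report where each required command resolves from, or that it is missing.
# Returns non-zero if at least one command cannot be found.
check_tools() {
    status=0
    for tool in "$@"; do
        if path=$(command -v "$tool" 2>/dev/null); then
            echo "$tool -> $path"
        else
            echo "$tool -> not found on PATH"
            status=1
        fi
    done
    return $status
}

# Reload the profile in the current shell, then check the tools it should expose:
# . ~/.bash_profile
# check_tools java hadoop hdfs start-dfs.sh
```

On the layout used in this post, java would be expected to resolve under /jdk1.7.0_75/bin and the Hadoop commands under /hadoop-2.4.1/bin or /hadoop-2.4.1/sbin.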

  Edit the hosts file

[root@hadoop ~]# cat /etc/hosts
127.0.0.1 localhost
172.10.236.21 hadoop
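
The hostname mapping matters because later configuration refers to the machine as hadoop rather than by IP address. As a rough check, the sketch below looks a name up in a hosts-format file; lookup_host is a hypothetical helper written for this example.

```shell
#!/bin/sh
# Print the IP address mapped to a host name in a hosts-format file.
# Usage: lookup_host NAME [FILE]   (FILE defaults to /etc/hosts)
lookup_host() {
    awk -v name="$1" '
        { sub(/#.*/, "") }   # drop comments before splitting into fields
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# lookup_host hadoop
```

On the machine in this post the call should print the address configured above; an empty result means the entry is missing.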

  

  3. Hadoop configuration files

  All Hadoop configuration files live in the /hadoop-2.4.1/etc/hadoop/ directory.

  Configure hadoop-env.sh: set JAVA_HOME

# The java implementation to use.
export JAVA_HOME=/jdk1.7.0_75

  

  Configure hdfs-site.xml. A pseudo-distributed setup has a single DataNode, so the replication factor only needs to be 1.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

  

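  Configure core-site.xml. Set fs.defaultFS to the NameNode address, and point hadoop.tmp.dir, under which the formatted NameNode data is stored, at a directory outside /tmp. The operating system clears /tmp on every reboot, so the NameNode data needs a different location.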

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop-2.4.1/tmp</value>
</property>
</configuration>
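
A typo in a property name fails silently, so it can be worth reading the values back out of the file. The sketch below is a deliberately simple text scan, not an XML parser: it assumes each name and value tag sits on its own line, as in the file above, and get_property is a helper name invented here.

```shell
#!/bin/sh
# Print the <value> of a named property from a Hadoop *-site.xml file.
# Assumes <name>...</name> and <value>...</value> are each on one line.
# Usage: get_property NAME FILE
get_property() {
    awk -v key="$1" '
        /<name>/ { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
        /<value>/ && n == key {
            v = $0
            gsub(/.*<value>|<\/value>.*/, "", v)
            print v
            exit
        }
    ' "$2"
}

# get_property fs.defaultFS   /hadoop-2.4.1/etc/hadoop/core-site.xml
# get_property hadoop.tmp.dir /hadoop-2.4.1/etc/hadoop/core-site.xml
```

Once Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS asks Hadoop itself for the effective value, which also covers defaults that are not written in the site file.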

  Configure mapred-site.xml. Only mapred-site.xml.template ships by default, so first copy mapred-site.xml.template to mapred-site.xml.

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
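
The copy from the template can be made safe to repeat, so that rerunning a setup script does not overwrite an edited file. This is a small sketch; ensure_mapred_site is a hypothetical helper name.

```shell
#!/bin/sh
# Create mapred-site.xml from the shipped template unless it already exists.
# Usage: ensure_mapred_site CONF_DIR
ensure_mapred_site() {
    conf_dir=$1
    if [ ! -f "$conf_dir/mapred-site.xml" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}

# ensure_mapred_site /hadoop-2.4.1/etc/hadoop
```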

  

  Configure yarn-site.xml

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

 

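Final step:

  Format the NameNode

  All Hadoop configuration files are now in place. Next, format the NameNode so that it can record the metadata of the distributed file system.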

  # hdfs namenode -format

……
17/05/18 18:28:19 INFO util.GSet: VM type = 32-bit
17/05/18 18:28:19 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
17/05/18 18:28:19 INFO util.GSet: capacity = 2^19 = 524288 entries
17/05/18 18:28:19 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
17/05/18 18:28:19 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
17/05/18 18:28:19 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
17/05/18 18:28:19 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
17/05/18 18:28:19 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
17/05/18 18:28:19 INFO util.GSet: Computing capacity for map NameNodeRetryCache
17/05/18 18:28:19 INFO util.GSet: VM type = 32-bit
17/05/18 18:28:19 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
17/05/18 18:28:19 INFO util.GSet: capacity = 2^16 = 65536 entries
17/05/18 18:28:19 INFO namenode.AclConfigFlag: ACLs enabled? false
17/05/18 18:28:20 INFO namenode.FSImage: Allocated new BlockPoolId: BP-39137453-172.10.236.21-1495103299866
17/05/18 18:28:20 INFO common.Storage: Storage directory /hadoop-2.4.1/tmp/dfs/name has been successfully formatted.
17/05/18 18:28:20 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
17/05/18 18:28:20 INFO util.ExitUtil: Exiting with status 0
17/05/18 18:28:20 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop/172.10.236.21
************************************************************/

  Check the result of formatting

[root@hadoop hadoop]# ls /hadoop-2.4.1/tmp/
dfs nm-local-dir
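
Formatting should normally happen only once: formatting again assigns a new cluster ID, and a DataNode that still holds data from the old one will refuse to start. The guard below is a sketch that relies on the VERSION file that formatting writes under dfs/name/current; namenode_formatted is a name made up for this example.

```shell
#!/bin/sh
# Succeed if the NameNode storage directory under TMP_DIR has been formatted.
# Usage: namenode_formatted TMP_DIR   (TMP_DIR is the hadoop.tmp.dir value)
namenode_formatted() {
    [ -f "$1/dfs/name/current/VERSION" ]
}

# if namenode_formatted /hadoop-2.4.1/tmp; then
#     echo "NameNode already formatted, skipping"
# else
#     hdfs namenode -format
# fi
```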

 

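At this point the Hadoop environment is ready to use. All Hadoop services can be started with start-all.sh (the script itself reports that it is deprecated in favor of start-dfs.sh and start-yarn.sh).

  Check the processes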

[root@hadoop ~]# jps
16570 Jps
15893 DataNode
16461 NodeManager
16179 ResourceManager
16041 SecondaryNameNode
15774 NameNode
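
Reading the jps list by eye works for one machine; a script can make the same check repeatable. The sketch below reads jps output on standard input and names any of the five daemons shown above that is absent. missing_daemons is a hypothetical helper, and the daemon list is simply the one from this post.

```shell
#!/bin/sh
# Read `jps` output on stdin and print each expected daemon that is absent.
# Returns non-zero if at least one daemon is missing.
missing_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare against the whole second field so that "NameNode"
        # does not match the "SecondaryNameNode" line.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
            missing=1
        fi
    done
    return $missing
}

# jps | missing_daemons && echo "all daemons are running"
```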

  

HDFS availability test

Create a directory in HDFS

# hdfs dfs -mkdir /logs

List the newly created directory

# hdfs dfs -ls /

Found 1 items
drwxr-xr-x - root supergroup 0 2017-05-18 18:32 /logs  

Upload a data file to the new directory

# hdfs dfs -put install.log /logs

Check the result of the upload

[root@hadoop ~]# hdfs dfs -ls /logs

Found 1 items
-rw-r--r-- 1 root supergroup 57162 2017-05-18 18:32 /logs/install.log
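
One quick sanity check on the upload is to compare the size HDFS reports with the size of the local file. The sketch below pulls the size column out of a listing; hdfs_listed_size is a helper name invented here, and it assumes paths without spaces.

```shell
#!/bin/sh
# Read `hdfs dfs -ls` output on stdin and print the size in bytes of FILE_PATH.
# The size is the fifth column; the path is the last column.
# Usage: hdfs dfs -ls DIR | hdfs_listed_size FILE_PATH
hdfs_listed_size() {
    awk -v p="$1" '$NF == p { print $5; exit }'
}

# local_size=$(wc -c < install.log)
# remote_size=$(hdfs dfs -ls /logs | hdfs_listed_size /logs/install.log)
# [ "$local_size" -eq "$remote_size" ] && echo "sizes match"
```

Matching sizes do not prove the contents are identical, but a mismatch is a cheap early warning.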

  

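Hadoop is now successfully configured in pseudo-distributed mode and ready to use.

Passwordless SSH configuration

Starting the Hadoop services prompts for a password every time, which is clearly impractical once a cluster has many nodes, so set up passwordless login for starting the Hadoop services.

Generate a key pair and copy the public key to the Hadoop server (here, the local machine itself)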

# ssh-keygen -t rsa

# Copy the contents of id_rsa.pub into authorized_keys

# ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop

[root@hadoop ~]# ls ~/.ssh/
authorized_keys id_rsa id_rsa.pub known_hosts
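
If key-based login still asks for a password, overly open permissions on the .ssh directory are a common cause, because sshd with its default StrictModes setting then ignores authorized_keys. The sketch below tightens them; secure_ssh_dir is a hypothetical helper name.

```shell
#!/bin/sh
# Restrict an SSH directory and its authorized_keys file to the owner.
# Usage: secure_ssh_dir [DIR]   (DIR defaults to ~/.ssh)
secure_ssh_dir() {
    dir=${1:-$HOME/.ssh}
    chmod 700 "$dir" &&
    chmod 600 "$dir/authorized_keys"
}

# secure_ssh_dir
# BatchMode makes ssh fail instead of prompting, so this only succeeds
# when key-based login really works:
# ssh -o BatchMode=yes hadoop true && echo "passwordless login works"
```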

  

Start Hadoop

[root@hadoop ~]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [hadoop]
The authenticity of host 'hadoop (172.10.236.21)' can't be established.
RSA key fingerprint is b9:d5:64:bb:f9:34:77:22:d7:a7:09:a6:1e:ab:ba:83.
Are you sure you want to continue connecting (yes/no)? yes
hadoop: Warning: Permanently added 'hadoop' (RSA) to the list of known hosts.
hadoop: starting namenode, logging to /hadoop-2.4.1/logs/hadoop-root-namenode-hadoop.out
localhost: starting datanode, logging to /hadoop-2.4.1/logs/hadoop-root-datanode-hadoop.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /hadoop-2.4.1/logs/hadoop-root-secondarynamenode-hadoop.out
starting yarn daemons
starting resourcemanager, logging to /hadoop-2.4.1/logs/yarn-root-resourcemanager-hadoop.out
localhost: starting nodemanager, logging to /hadoop-2.4.1/logs/yarn-root-nodemanager-hadoop.out

  
