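Installing Hadoop 2.7.3 in Pseudo-Distributed Mode

A pseudo-distributed Hadoop cluster needs only one server, which makes it convenient for testing and development. It is worth writing the setup process down, since notes are more reliable than memory.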

Hadoop 2.7.3

JDK 1.8.0_91

Download the Hadoop binary package from the Apache website.

cd /home/fuxin.zhao/soft

tar -xzvf hadoop-2.7.3.tar.gz

cd hadoop-2.7.3

cd etc/hadoop/

pwd

1. Set up passwordless SSH login from the local machine to itself

ssh-keygen -t rsa -P ""

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

ssh localhost
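
If `ssh localhost` still asks for a password, overly loose permissions on `~/.ssh` are a common cause, because sshd with its default `StrictModes` setting ignores keys in such directories. The following is a minimal sketch of a fix; the helper name `fix_ssh_perms` is made up for illustration and is not part of Hadoop or OpenSSH.

```shell
# Hypothetical helper: tighten permissions on an SSH directory so that
# sshd (with its default StrictModes setting) accepts the key.
fix_ssh_perms() {
    dir="${1:-$HOME/.ssh}"
    chmod 700 "$dir" || return 1
    # authorized_keys may not exist yet if the cat step was skipped
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys"
    fi
}

# Usage (not run automatically):
#   fix_ssh_perms
#   ssh -o BatchMode=yes localhost true && echo "passwordless login OK"
```

`BatchMode=yes` makes ssh fail instead of prompting, so the check can be scripted.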

2. Edit the Hadoop configuration files

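Five configuration files under $HADOOP_HOME/etc/hadoop need to be edited: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. JAVA_HOME is also set in yarn-env.sh and hadoop-env.sh in the same directory.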

vi yarn-env.sh

export JAVA_HOME=/usr/local/jdk

vi hadoop-env.sh

export JAVA_HOME=/usr/local/jdk
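
The daemons will not start if JAVA_HOME points at the wrong directory, so it can be worth checking the path before going further. A minimal sketch follows; `check_java_home` is a hypothetical helper name, and the only assumption is that a usable JDK has an executable `bin/java`.

```shell
# Hypothetical helper: succeed only if the given directory looks like a JDK,
# i.e. it contains an executable bin/java.
check_java_home() {
    [ -n "$1" ] && [ -x "$1/bin/java" ]
}

# Usage (not run automatically):
#   check_java_home /usr/local/jdk || echo "JAVA_HOME looks wrong" >&2
```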

vi slaves

## Add the local machine's hostname

ubuntuServer01

vi core-site.xml

<configuration>

 <property>

   <name>fs.defaultFS</name>

   <value>hdfs://ubuntuServer01:9000</value>

 </property>

 <property>

   <name>hadoop.tmp.dir</name>

   <value>file:/home/fuxin.zhao/hadoop/tmp</value>

   <description>A base for other temporary directories.</description>

 </property>

</configuration>

vi hdfs-site.xml

<configuration>

    <property>

         <name>dfs.replication</name>

         <value>1</value>

    </property>

    <property>

         <name>dfs.namenode.name.dir</name>

         <value>file:/home/fuxin.zhao/hadoop/tmp/dfs/name</value>

    </property>

    <property>

         <name>dfs.datanode.data.dir</name>

         <value>file:/home/fuxin.zhao/hadoop/tmp/dfs/data</value>

    </property>

   <property>

    <name>dfs.block.size</name>

    <value>67108864</value>

   </property>

</configuration>
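
The block size above is given in bytes. To change it, the value can be derived from a size in MiB; the sketch below uses a hypothetical helper name, `mib_to_bytes`.

```shell
# Convert a block size in MiB to the byte value that hdfs-site.xml expects.
mib_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

mib_to_bytes 64    # prints 67108864, the value used above
mib_to_bytes 128   # prints 134217728, the Hadoop 2.x default block size
```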

vi yarn-site.xml

<configuration>

<property>

  <name>yarn.nodemanager.aux-services</name>

  <value>mapreduce_shuffle</value>

</property>

<property>

  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>

  <value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

<property>

  <name>yarn.scheduler.minimum-allocation-mb</name>

  <value>512</value>

</property>

<property>

  <name>yarn.scheduler.maximum-allocation-mb</name>

  <value>2048</value>

</property>

<property>

  <name>yarn.scheduler.minimum-allocation-vcores</name>

  <value>1</value>

</property>

<property>

  <name>yarn.scheduler.maximum-allocation-vcores</name>

  <value>2</value>

</property>

</configuration>
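
A fresh Hadoop 2.7.x tarball normally ships only mapred-site.xml.template, so mapred-site.xml may have to be created before it can be edited. A minimal sketch, assuming the template sits in the configuration directory; `ensure_mapred_site` is a hypothetical helper name.

```shell
# Hypothetical helper: create mapred-site.xml from the shipped template,
# but never overwrite a file that already exists.
ensure_mapred_site() {
    conf_dir="${1:-.}"
    if [ ! -f "$conf_dir/mapred-site.xml" ]; then
        cp "$conf_dir/mapred-site.xml.template" "$conf_dir/mapred-site.xml"
    fi
}

# Usage (not run automatically), from inside etc/hadoop:
#   ensure_mapred_site .
```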

vi mapred-site.xml

<configuration>

<property>

	<name>mapreduce.framework.name</name>

	<value>yarn</value>

</property>

<property>

	<name>yarn.app.mapreduce.am.resource.mb</name>

	<value>512</value>

</property>

<property>

	<name>mapreduce.map.memory.mb</name>

	<value>512</value>

</property>

<property>

	<name>mapreduce.reduce.memory.mb</name>

	<value>512</value>

</property>

</configuration>
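
The 512 MB values here match yarn.scheduler.minimum-allocation-mb in yarn-site.xml. As far as I understand it, the default CapacityScheduler rounds each request up to a multiple of the minimum allocation and rejects requests above the maximum. The sketch below is a simplified model of that behavior, not YARN's actual code; `normalize_mb` is a hypothetical name.

```shell
# Simplified model (an assumption, not YARN source code) of how a container
# memory request is normalized against the scheduler limits.
# Arguments: requested MB, minimum-allocation-mb, maximum-allocation-mb.
normalize_mb() {
    req=$1; min=$2; max=$3
    if [ "$req" -gt "$max" ]; then
        echo "request exceeds maximum-allocation-mb" >&2
        return 1
    fi
    if [ "$req" -lt "$min" ]; then req=$min; fi
    # Round up to the next multiple of the minimum allocation
    echo $(( (req + min - 1) / min * min ))
}

normalize_mb 512 512 2048   # prints 512
normalize_mb 700 512 2048   # prints 1024
```

Under this model, a 700 MB request would occupy a 1024 MB container, which is why multiples of 512 make sense with these settings.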

vi ~/.bashrc

export JAVA_HOME=/usr/local/jdk

export HADOOP_HOME=/home/fuxin.zhao/soft/hadoop-2.7.3

export PATH=${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:$PATH
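
The new variables only take effect in the current shell after the file is reloaded. A minimal sketch for checking that follows; `path_has` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given directory is an entry in PATH.
path_has() {
    case ":$PATH:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage (not run automatically):
#   . ~/.bashrc
#   path_has "$HADOOP_HOME/bin" && hadoop version
```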

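After the configuration is complete, format the NameNode, then start HDFS, YARN, and the job history server (run from $HADOOP_HOME):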

./bin/hdfs namenode -format

./sbin/start-dfs.sh

./sbin/start-yarn.sh

mr-jobhistory-daemon.sh start historyserver
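
After these commands, `jps` should list six Hadoop daemons. The sketch below reads `jps` output and prints whichever ones are missing; `missing_daemons` is a hypothetical helper name.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon that is missing. Prints nothing when all six are present.
missing_daemons() {
    out=$(cat)
    for d in NameNode SecondaryNameNode DataNode ResourceManager NodeManager JobHistoryServer; do
        # jps prints "<pid> <ClassName>"; -w matches the class name as a whole
        # word, so "SecondaryNameNode" does not count as "NameNode"
        echo "$out" | grep -qw "$d" || echo "$d"
    done
}

# Usage (not run automatically):
#   jps | missing_daemons
```

If a daemon is reported missing, its log file under $HADOOP_HOME/logs is usually the first place to look.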

Check the HDFS NameNode (port 50070) and YARN ResourceManager (port 8088) web pages:

http://ubuntuserver01:50070/

http://ubuntuserver01:8088/

hadoop fs -ls /

hadoop fs -mkdir /user

hadoop fs -mkdir /user/fuxin.zhao

hadoop fs -touchz textFile

Run the bundled test jobs (teragen and terasort):

# Generate 100 MB of data (1,000,000 rows of 100 bytes each) in /tmp/terasort/1000000-input

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar teragen 1000000 /tmp/terasort/1000000-input
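
The argument to teragen is a row count, not a size; each row is 100 bytes. A small sketch of the arithmetic, using the hypothetical helper name `teragen_bytes`:

```shell
# Each teragen row is 100 bytes, so the data size is rows * 100.
teragen_bytes() {
    echo $(( $1 * 100 ))
}

teragen_bytes 1000000   # prints 100000000, i.e. about 100 MB
```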

# Sort the data and write the output to /tmp/terasort/1000000-output

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar terasort /tmp/terasort/1000000-input /tmp/terasort/1000000-output

# Delete the temporary files

hadoop fs -rm -r /tmp/terasort/1000000-input

hadoop fs -rm -r /tmp/terasort/1000000-output