Hadoop 2.5.0 Cluster Setup
http://download.csdn.net/download/yameing/8011891
I. Planning
1. Prepare the installation packages
JDK: http://download.oracle.com/otn-pub/java/jdk/7u67-b01/jdk-7u67-linux-x64.tar.gz
Hadoop: http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.5.0/hadoop-2.5.0.tar.gz
Hive: http://apache.fayea.com/apache-mirror/hive/hive-0.13.1/apache-hive-0.13.1-bin.tar.gz
ZK: http://mirrors.cnnic.cn/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
HBase: http://apache.fayea.com/apache-mirror/hbase/hbase-0.98.5/hbase-0.98.5-hadoop2-bin.tar.gz
MySql: http://ftp.nchu.edu.tw/Unix/Database/MySQL/Downloads/MySQL-5.6/mysql-5.6.12-linux-glibc2.5-x86_64.tar.gz
MysqlConnector: http://ftp.nchu.edu.tw/Unix/Database/MySQL/Downloads/Connector-J/mysql-connector-java-5.1.25.zip
Sqoop: built from sqoop-1.4.5 against the current Hadoop version (see "Building Sqoop" under Problems Encountered)
http://mirror.bit.edu.cn/apache/sqoop/1.4.5/sqoop-1.4.5.tar.gz
2. Environment plan
| Type | Name | Spec | IP | Installed components |
| --- | --- | --- | --- | --- |
| Hadoop master node | mycluster1 | 16 cores * 32 GB * 2 TB | 192.168.2.92 | Hadoop |
| Hadoop master node | mycluster2 | 16 cores * 32 GB * 6 TB | 192.168.2.88 | |
| Hadoop slave node | mycluster3 | 4 cores * 8 GB * 250 GB | 192.168.1.84 | |
| Hadoop slave node | mycluster4 | 4 cores * 8 GB * 250 GB | 192.168.1.85 | |
| Hadoop slave node | mycluster5 | 4 cores * 8 GB * 250 GB | 192.168.1.86 | |
| Hadoop slave node | mycluster6 | 4 cores * 8 GB * 250 GB | 192.168.1.87 | |
| Hadoop slave node | mycluster7 | 4 cores * 8 GB * 250 GB | 192.168.1.88 | |
| Hadoop slave node | mycluster8 | 4 cores * 8 GB * 250 GB | 192.168.1.89 | |
| Hadoop slave node | mycluster9 | 4 cores * 8 GB * 250 GB | 192.168.1.90 | |
| Hadoop slave node | mycluster10 | 4 cores * 8 GB * 250 GB | 192.168.1.91 | |
| Distributed applications | mycluster11 | 4 cores * 8 GB * 250 GB | 192.168.1.92 | Hive, Sqoop, MySQL |
II. Installation
1. Environment configuration
a) Basic configuration
1. Set the hostname on each machine
```
vi /etc/sysconfig/network
vi /etc/hosts
hostname mycluster*
```
2. Disable the firewall on all nodes
```
service iptables stop
```
3. Add all hostnames to every machine's hosts file
```
vi /etc/hosts
```
```
#127.0.0.1 localhost localhost.localdomain mycluster5
#::1       localhost localhost.localdomain mycluster5
# The original localhost lines are commented out; see "Problems encountered" for details.
# Because ZooKeeper requires localhost to resolve, the loopback entries are written as follows:
127.0.0.1 localhost localhost.localdomain
::1       localhost localhost.localdomain
192.168.2.92 mycluster1
192.168.2.88 mycluster2
192.168.1.84 mycluster3
192.168.1.85 mycluster4
192.168.1.86 mycluster5
192.168.1.87 mycluster6
192.168.1.88 mycluster7
192.168.1.89 mycluster8
192.168.1.90 mycluster9
192.168.1.91 mycluster10
192.168.1.92 mycluster11
```
4. Keep the clock difference between machines under 2 minutes
```
date                              # check the current time
date -s "2014-09-05 23:38:00"     # set the time manually
ntpdate time.windows.com          # if the Internet is reachable, sync against Microsoft's NTP server
clock -w                          # write the time to the hardware clock (BIOS)
```
b) Set up passwordless SSH
1. Create the mycluster user on every machine. All following commands are executed as mycluster.
```
groupadd mycluster
useradd -g mycluster -G root -d /home/mycluster mycluster
passwd mycluster        # password: qcpass@lh
```
2. Create the .ssh directory on each slave.
```
mkdir /home/mycluster/.ssh
chmod 700 /home/mycluster/.ssh
```
The directory permission must be 700, otherwise SSH login will fail.
3. Log in to the master, generate the SSH key pair, and copy the public key to each slave.
```
ssh-keygen -t rsa
cd /home/mycluster/.ssh
cp id_rsa.pub authorized_keys
scp authorized_keys mycluster@mycluster*:/home/mycluster/.ssh
```
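A quick way to confirm the keys work is to run a remote command on every slave; the loop below is a minimal sketch, assuming the slave hostnames mycluster3..mycluster10 from the plan above. If no password prompt appears, passwordless SSH is in place.
```
for i in 3 4 5 6 7 8 9 10; do
  # should print the remote hostname without prompting for a password
  ssh mycluster@mycluster$i hostname
done
```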
c) Install JDK 1.7
1. Log in as root and install the JDK under /usr/java.
```
tar -zxvf jdk-7u67-linux-x64.gz
ln -s jdk1.7.0_67 jdk
```
2. Configure environment variables.
```
vi /etc/profile    # visible to all users
vi .bashrc         # visible to the current user only
```
```
export JAVA_HOME=/home/mycluster/jdk
export CLASSPATH=.
export PATH=$JAVA_HOME/bin:$PATH
```
```
source /etc/profile      # apply the changes
env | grep JAVA_HOME     # verify
```
2. Install Hadoop 2.5.0
a) Installation and configuration
```
tar zxvf hadoop-2.5.0.tar.gz
cd hadoop-2.5.0/etc/hadoop/
vi hadoop-env.sh
```
```
export JAVA_HOME=/home/mycluster/jdk
```
```
vi core-site.xml
```
```
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://192.168.2.92:9100</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>14400</value>
</property>
```
```
vi hdfs-site.xml
```
```
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/mycluster/data/dfs_namenode_name_dir</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/mycluster/data/dfs_datanode_data_dir</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<!-- A spot check of some of the planned slave nodes shows that the largest partition on each
     is 195 GB, only 1% used, and mounted on /home -->
```
```
vi mapred-site.xml       # the value "yarn" must be lowercase
```
```
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```
```
vi yarn-site.xml
```
```
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>mycluster1</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
```
```
vi slaves
```
```
mycluster3
mycluster4
mycluster5
mycluster6
mycluster7
mycluster8
mycluster9
mycluster10
```
3. Copy the Hadoop directory from the master to each slave.
```
scp -r /home/mycluster/hadoop-2.5.0 mycluster@mycluster3:/home/mycluster
```
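The same copy has to be repeated for every slave; a small loop saves typing (a sketch, assuming the slave list mycluster3..mycluster10 from the plan):
```
for i in 3 4 5 6 7 8 9 10; do
  scp -r /home/mycluster/hadoop-2.5.0 mycluster@mycluster$i:/home/mycluster
done
```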
b) Startup and testing
1. Log in to the master and configure the Hadoop environment variables.
```
vi /home/mycluster/.bash_profile
```
```
export HADOOP_HOME=/home/mycluster/hadoop-2.5.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```
```
source /home/mycluster/.bash_profile
env | grep HADOOP_HOME
```
2. Format HDFS, start Hadoop, and run a test job.
```
hadoop namenode -format
start-dfs.sh
start-yarn.sh
jps
hadoop jar hadoop-2.5.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar pi 2 10000
```
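Besides jps, the cluster can be sanity-checked from the command line and the web UIs; the commands below are a quick check (assuming the default web ports, 50070 for the NameNode and 8088 for the ResourceManager):
```
hdfs dfsadmin -report    # all 8 DataNodes should be listed as live
yarn node -list          # all NodeManagers should show as RUNNING
# Web UIs: http://mycluster1:50070 (HDFS) and http://mycluster1:8088 (YARN)
```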
3. Test with a custom MapReduce program.
(not provided yet; a stand-in test is sketched below)
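Until a custom MR program is written, the bundled wordcount example can stand in as an end-to-end test. This is only a sketch with hypothetical input/output paths:
```
hadoop fs -mkdir -p /user/mycluster/wc-in
hadoop fs -put hadoop-2.5.0/etc/hadoop/*.xml /user/mycluster/wc-in
hadoop jar hadoop-2.5.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar \
  wordcount /user/mycluster/wc-in /user/mycluster/wc-out
hadoop fs -cat /user/mycluster/wc-out/part-r-00000 | head
```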
3. Install MySQL
a) Installation and configuration
The binary tarball distribution of MySQL is installed here; the advantage is that every step of the process is under our control. If convenience matters more, the RPM package works too.
1. Unpack the tar.gz
```
tar zxvf mysql-5.6.12-linux-glibc2.5-x86_64.tar.gz
mv mysql-5.6.12-linux-glibc2.5-x86_64 /usr/local/mysql
```
2. Create the group and user, and set ownership
```
groupadd mycluster
useradd -g mycluster -G root -d /home/mycluster mycluster
passwd mycluster        # password: qcpass@lh
cd /usr/local/mysql
chown -R mycluster .
chgrp -R mycluster .
scripts/mysql_install_db --user=mycluster
chown -R root .
chown -R mycluster data
chmod u+x data/ibdata1
mv mycluster11.err mycluster11.err_
```
3. Configuration file
```
mv /etc/my.cnf /etc/my.cnf_                   # move aside any config left by a previous MySQL install
cp support-files/my-default.cnf /etc/my.cnf
vi /etc/my.cnf
```
```
[mysqld]
basedir=/usr/local/mysql
datadir=/usr/local/mysql/data
character-set-server=utf8
lower_case_table_names=1
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES
```
b) Startup and testing
1. Start
```
mv /etc/init.d/mysql /etc/init.d/mysql_       # move aside any init script left by a previous MySQL install
cp support-files/mysql.server /etc/init.d/mysql
service mysql start                           # start now
chkconfig --add mysql                         # start at boot
```
2. Change the password
```
vi /home/mycluster/.bash_profile
```
```
export PATH=/usr/local/mysql/bin:$PATH
```
```
source /home/mycluster/.bash_profile
mysql -u root -p                              # the initial root password is empty
mysql> set password = password('root');       # change the password to root
```
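A quick smoke test after the password change (a sketch; it only checks that the server answers and that the new root password works):
```
mysql -u root -proot -e "select version(); show databases;"
```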
4. Install Hive
a) Installation and configuration
1. Unpack.
```
tar zxvf apache-hive-0.13.1-bin.tar.gz
echo 'export HIVE_HOME=/home/mycluster/apache-hive-0.13.1-bin' >> /home/mycluster/.bashrc
echo 'export PATH=$HIVE_HOME/bin:$PATH' >> /home/mycluster/.bashrc
```
2. Create the Hive directories in HDFS.
```
hadoop fs -mkdir -p /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod g+w /tmp
hadoop fs -chmod g+w /user/hive/warehouse
```
3. Create the MySQL database.
```
create database hive character set latin1;
```
4. Configuration files.
```
cd apache-hive-0.13.1-bin/conf
cp hive-default.xml.template hive-site.xml
vi hive-site.xml
```
```
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
</configuration>
```
```
cp mysql-connector-java-5.1.25-bin.jar /home/mycluster/apache-hive-0.13.1-bin/lib/
```
5. Configure environment variables.
```
vi /home/mycluster/.bash_profile
```
```
export HIVE_HOME=/home/mycluster/apache-hive-0.13.1-bin
export PATH=$HIVE_HOME/bin:$PATH
```
```
source /home/mycluster/.bash_profile
```
b) Startup and testing
(several ways of starting Hive; to be filled in. A quick CLI smoke test is sketched below.)
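As a placeholder test, the Hive CLI can be exercised directly; this is a minimal sketch with a hypothetical table name, and it also confirms that the metastore tables get created in the MySQL hive database:
```
hive -e "create table t_hive_test (id int, name string); show tables;"
hive -e "drop table t_hive_test;"
```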
5. Install Sqoop
a) Installation and configuration
1. Unpack the tar.gz
```
tar -xvf sqoop-1.4.5.bin__hadoop-2.5.0.tar.gz
ln -s sqoop-1.4.5.bin__hadoop-2.5.0 sqoop
export SQOOP_HOME=/home/mycluster/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
```
2. Add JDBC driver jars
Add the MySQL connector and, if needed, the Oracle connector.
```
scp mysql-connector-java-5.1.25-bin.jar mycluster@mycluster11:/home/mycluster/sqoop/lib
scp ojdbc14.jar mycluster@mycluster11:/home/mycluster/sqoop/lib
```
3. Configuration file
```
cd /home/mycluster/sqoop/conf
cp sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh
```
```
export HADOOP_COMMON_HOME=/home/mycluster/hadoop-2.5.0
export HADOOP_MAPRED_HOME=/home/mycluster/hadoop-2.5.0/share/hadoop/mapreduce
export HIVE_HOME=/home/mycluster/apache-hive-0.13.1-bin
```
b) Startup and testing
```
sqoop list-databases --connect jdbc:mysql://localhost:3306/ --username root --password root
```
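A natural follow-up test is a small import from MySQL into HDFS; the command below is a sketch that assumes a hypothetical table t1 in a hypothetical test database on the local MySQL server:
```
sqoop import --connect jdbc:mysql://localhost:3306/test \
  --username root --password root \
  --table t1 --target-dir /user/mycluster/sqoop_t1 -m 1
```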
6. Install ZooKeeper 3.4.6
a) Installation and configuration
1. Install and configure
```
tar -zxvf zookeeper-3.4.6.tar.gz
mkdir -p /home/mycluster/zookeeper-3.4.6/zookeeperdir/logs
mkdir -p /home/mycluster/zookeeper-3.4.6/zookeeperdir/zookeeper-data
cp zookeeper-3.4.6/conf/zoo_sample.cfg zookeeper-3.4.6/conf/zoo.cfg
vi zookeeper-3.4.6/conf/zoo.cfg
```
```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/mycluster/zookeeper-3.4.6/zookeeperdir/zookeeper-data
dataLogDir=/home/mycluster/zookeeper-3.4.6/zookeeperdir/logs
clientPort=2181
server.1=mycluster1:2888:3888
server.2=mycluster3:2888:3888
server.3=mycluster4:2888:3888
```
```
vi .bashrc
```
```
export ZOOKEEPER_HOME=/home/mycluster/zookeeper-3.4.6
export PATH=$ZOOKEEPER_HOME/bin:$PATH
```
2. Copy the ZK directory to the other hosts.
```
scp -r /home/mycluster/zookeeper-3.4.6 mycluster@mycluster3:/home/mycluster
scp -r /home/mycluster/zookeeper-3.4.6 mycluster@mycluster4:/home/mycluster
```
3. Set myid
```
[mycluster@mycluster1 ~]$ echo "1" > /home/mycluster/zookeeper-3.4.6/zookeeperdir/zookeeper-data/myid
[mycluster@mycluster3 ~]$ echo "2" > /home/mycluster/zookeeper-3.4.6/zookeeperdir/zookeeper-data/myid
[mycluster@mycluster4 ~]$ echo "3" > /home/mycluster/zookeeper-3.4.6/zookeeperdir/zookeeper-data/myid
```
b) Startup and testing
1. Log in to each machine and start ZK.
```
[mycluster@mycluster1 ~]$ zkServer.sh start
[mycluster@mycluster3 ~]$ zkServer.sh start
[mycluster@mycluster4 ~]$ zkServer.sh start
```
2. Check the status.
When the ZooKeeper cluster starts, every node tries to connect to the other nodes; the ones started first obviously cannot reach the ones not yet started, so the connection exceptions at the beginning of the log can be ignored. The later part of the log shows the cluster settling down once a leader has been elected.
```
[mycluster@mycluster1 ~]$ zkServer.sh status
JMX enabled by default
Using config: /home/mycluster/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
[mycluster@mycluster3 ~]$ zkServer.sh status
JMX enabled by default
Using config: /home/mycluster/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: leader
[mycluster@mycluster4 ~]$ zkServer.sh status
JMX enabled by default
Using config: /home/mycluster/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower
```
3. Client test.
```
[mycluster@mycluster1 ~]$ zkCli.sh -server mycluster1:2181
[zk: mycluster1:2181(CONNECTED) 0] ls /
[zookeeper]
```
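A slightly fuller client check is to create, read, and delete a znode (a sketch; /zk_test is an arbitrary test path):
```
[zk: mycluster1:2181(CONNECTED) 1] create /zk_test "hello"
[zk: mycluster1:2181(CONNECTED) 2] get /zk_test
[zk: mycluster1:2181(CONNECTED) 3] delete /zk_test
```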
7. Install HBase (not done yet)
III. Tuning (in progress...)
1. Hadoop tuning
a) HA & Federation
· HA: removes the NameNode single point of failure
· Federation: increases cluster capacity and performance
Federation is not used on this cluster for now, because the cluster will not reach a very large scale any time soon.
HA configuration:
```
vi hdfs-site.xml
```
```
<!-- HA config -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
  <description>Logical name of the nameservice; must match core-site.xml</description>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>namenode1,redhat22688</value>
  <description>Logical names of the NameNodes under this nameservice</description>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.namenode1</name>
  <value>mycluster1:9000</value>
  <description>RPC address of this NameNode</description>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.namenode1</name>
  <value>mycluster1:50070</value>
  <description>Web server address of this NameNode</description>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.redhat22688</name>
  <value>redhat22688:9000</value>
  <description>RPC address of this NameNode</description>
</property>
<property>
  <name>dfs.namenode.http-address.mycluster.redhat22688</name>
  <value>redhat22688:50070</value>
  <description>Web server address of this NameNode</description>
</property>
<!-- HA, NameNode side -->
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://mycluster3:8485;mycluster4:8485;mycluster5:8485/mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/mycluster/data/haqjm/dfs_journalnode_edits_dir</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
  <description>Fencing method for HA; sshfence is used here, shell is also possible (details later)</description>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/mycluster/.ssh/id_rsa</value>
</property>
<!-- HA, client side -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```
```
vi core-site.xml
```
```
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>
```
```
# Start the JournalNodes on the machines configured in hdfs-site.xml
[mycluster@mycluster3 ~]$ hadoop-2.5.0/sbin/hadoop-daemon.sh start journalnode
[mycluster@mycluster4 ~]$ hadoop-2.5.0/sbin/hadoop-daemon.sh start journalnode
[mycluster@mycluster5 ~]$ hadoop-2.5.0/sbin/hadoop-daemon.sh start journalnode
```
```
# Format one NameNode and start it
[mycluster@mycluster1 ~]$ hadoop namenode -format
[mycluster@mycluster1 ~]$ hadoop-daemon.sh start namenode
```
```
# Initialize the other NameNode and start it
[mycluster@mycluster1 ~]$ scp -r data mycluster@redhat22688:/home/mycluster/
[mycluster@redhat22688 ~]$ hadoop namenode -bootstrapStandby
[mycluster@redhat22688 ~]$ hadoop-daemon.sh start namenode
```
```
# Now open http://116.228.171.104:50070/ and http://116.228.171.119:50070/ in a browser.
# If both pages come up, the NameNodes started successfully; at this point both are in standby state.
# Alternatively, check from the command line:
[mycluster@mycluster1 ~]$ hdfs haadmin -getServiceState namenode1
```
```
# Switch one NameNode to active
[mycluster@mycluster1 ~]$ hdfs haadmin -transitionToActive namenode1
```
```
# Start all DataNodes
[mycluster@mycluster1 ~]$ hadoop-daemons.sh start datanode
```
Enable automatic failover:
```
vi hdfs-site.xml
```
```
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
  <description>or false</description>
</property>
```
```
vi core-site.xml
```
```
<property>
  <name>ha.zookeeper.quorum</name>
  <value>mycluster1:2181,mycluster3:2181,mycluster4:2181</value>
  <description>ZooKeeper quorum used for HA</description>
</property>
<property>
  <name>ha.zookeeper.session-timeout.ms</name>
  <value>5000</value>
  <description>ZooKeeper session timeout, in milliseconds</description>
</property>
```
```
# Run on one of the NameNodes:
[mycluster@mycluster1 ~]$ hdfs zkfc -formatZK
```
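For automatic failover to take effect, a ZKFC process must run on each NameNode host. The commands below are a sketch of starting them by hand and checking which NameNode becomes active (once the configuration above is in place, start-dfs.sh also starts the ZKFCs):
```
[mycluster@mycluster1 ~]$ hadoop-daemon.sh start zkfc
[mycluster@redhat22688 ~]$ hadoop-daemon.sh start zkfc
[mycluster@mycluster1 ~]$ hdfs haadmin -getServiceState namenode1
[mycluster@mycluster1 ~]$ hdfs haadmin -getServiceState redhat22688
```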
IV. Problems encountered
1. Reference documentation
Hadoop: http://hadoop.apache.org/docs/r2.5.1/
Hive: http://hive.apache.org/
ZK: http://zookeeper.apache.org/
Sqoop: http://sqoop.apache.org/docs/1.4.5/index.html
2. Hadoop and component versions
3. The SSH port is not the default 22
If the SSH port is not the default 22, change it in etc/hadoop/hadoop-env.sh, for example:
```
export HADOOP_SSH_OPTS="-p 18921"
```
4. Different SSH ports on different nodes
SSH is not a core part of Hadoop; it is only used to start and stop the cluster, so Hadoop currently does not support a different SSH port per node.
Option 1: start the daemons on each node by hand, which avoids SSH entirely
Option 2: write your own SSH start script (a sketch follows this list)
Option 3: change the sshd configuration so every node uses the same port
Option 4: port forwarding (hardly better than simply using option 3)
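A minimal sketch of option 2, assuming a hypothetical hosts_ports file that lists one "host port" pair per line:
```
#!/bin/sh
# Start the DataNode on every slave, each reachable on its own SSH port.
while read host port; do
  ssh -p "$port" mycluster@"$host" \
    "/home/mycluster/hadoop-2.5.0/sbin/hadoop-daemon.sh start datanode"
done < hosts_ports
```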
5. Address 192.168.2.92 maps to mycluster1, but this does not map back to the address - POSSIBLE BREAK-IN ATTEMPT!
Fix the hosts file so that 192.168.2.92 and mycluster1 map to each other one-to-one.
6. WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
The glibc on the system does not match the version libhadoop.so was built against:
```
[mycluster@mycluster1 ~]$ ls -l /lib/libc.so.*
lrwxrwxrwx 1 root root 11 Apr 18 2012 /lib/libc.so.6 -> libc-2.5.so
[mycluster@mycluster1 ~]$ file /lib/libc-2.5.so
/lib/libc-2.5.so: ELF 32-bit LSB shared object, Intel 80386, version 1 (SYSV), for GNU/Linux 2.6.9, not stripped
[mycluster@mycluster1 ~]$ file hadoop-2.5.0/lib/native/libhdfs.so.0.0.0
hadoop-2.5.0/lib/native/libhdfs.so.0.0.0: ELF 64-bit LSB shared object, AMD x86-64, version 1 (SYSV), not stripped
```
Workarounds:
1. Rebuild Hadoop from source on this platform
2. Upgrade glibc
What this warning affects:
1. Native compression codecs
7. Communication failure when running an MR job (1): network failure when the MR AM launches tasks
```
[mycluster@mycluster1 ~]$ hadoop-2.5.0/bin/hadoop jar hadoop-2.5.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar pi 2 2
Number of Maps  = 2
Samples per Map = 2
14/09/19 16:47:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Wrote input for Map #0
Wrote input for Map #1
Starting Job
14/09/19 16:47:47 INFO client.RMProxy: Connecting to ResourceManager at mycluster1/192.168.2.92:8032
14/09/19 16:47:47 INFO input.FileInputFormat: Total input paths to process : 2
14/09/19 16:47:47 INFO mapreduce.JobSubmitter: number of splits:2
14/09/19 16:47:47 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1411112681877_0004
14/09/19 16:47:48 INFO impl.YarnClientImpl: Submitted application application_1411112681877_0004
14/09/19 16:47:48 INFO mapreduce.Job: The url to track the job: http://mycluster1:8088/proxy/application_1411112681877_0004/
14/09/19 16:47:48 INFO mapreduce.Job: Running job: job_1411112681877_0004
14/09/19 16:48:09 INFO mapreduce.Job: Job job_1411112681877_0004 running in uber mode : false
14/09/19 16:48:09 INFO mapreduce.Job:  map 0% reduce 0%
# This is where the MR AM launches the task (see the logs for details)
14/09/19 16:48:09 INFO mapreduce.Job: Job job_1411112681877_0004 failed with state FAILED due to: Application application_1411112681877_0004 failed 2 times due to Error launching appattempt_1411112681877_0004_000002. Got exception: java.net.ConnectException: Call From mycluster1/192.168.2.92 to localhost:59163 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
... 9 more
. Failing the application.
14/09/19 16:48:09 INFO mapreduce.Job: Counters: 0
Job Finished in 22.193 seconds
# The job exited abnormally with no output file, which causes the following error (harmless):
java.io.FileNotFoundException: File does not exist: hdfs://192.168.2.92:9100/user/mycluster/QuasiMonteCarlo_1411116465638_1171059364/out/reduce-out
```
Solution:
Comment out the localhost lines in the hosts file (see the hosts file in the basic configuration section).
8. MySQL JDBC driver (Connector/J) version
(see: http://dev.mysql.com/doc/connector-j/en/connector-j-versions.html)
9. Configure NFS
Server side:
```
rpm -qa | grep nfs
yum install nfs-utils rpcbind    # the package names may differ on systems other than CentOS 6
mkdir /home/mycluster_nfs
vi /etc/exports
```
```
# Export /home/mycluster_nfs on the NFS server to 192.168.2.88 and 192.168.2.92, read/write.
/home/mycluster_nfs 192.168.2.88(rw)
/home/mycluster_nfs 192.168.2.92(rw)
```
```
service rpcbind start
service nfs start
exportfs
showmount -e     # lists the local exports; the host must be able to resolve its own name, otherwise this tends to fail
showmount -a     # lists the directories currently mounted by clients
chmod 777 -R /home/mycluster_nfs/
```
Client side:
```
showmount -e mycluster11    # query the exports published by the NFS server
mkdir /home/mycluster_nfs
mount mycluster11:/home/mycluster_nfs /home/mycluster_nfs
```
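To make the mount persist across reboots, an entry can be added to /etc/fstab on the client (a sketch; adjust the mount options as needed):
```
# /etc/fstab
mycluster11:/home/mycluster_nfs  /home/mycluster_nfs  nfs  defaults  0 0
```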
10. zkServer.sh status reports an error
Error message:
```
[mycluster@mycluster4 ~]$ zkServer.sh status
JMX enabled by default
Using config: /home/mycluster/zookeeper-3.4.6/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
```
Three causes are commonly reported online:
1. nc is not installed: yum install nc
2. Modify zkServer.sh
Open zkServer.sh and find the line
STAT=`echo stat | nc localhost $(grep clientPort "$ZOOCFG" | sed -e 's/.*=//') 2> /dev/null| grep Mode`
and add or remove "-q 1" (the digit 1, not the letter l) as appropriate.
3. localhost is not configured in /etc/hosts
11. Building Sqoop
Compile Sqoop 1.4.5 for Hadoop 2.5.0.
```
# Preparation: according to README.txt, building the documentation requires these tools:
#   * asciidoc
#   * make
#   * python 2.5+
#   * xmlto
#   * tar
#   * gzip
yum -y install ant
yum -y install asciidoc
yum -y install make
yum -y install xmlto
yum -y install tar
yum -y install gzip
# install python yourself
# -----------------------------------------------------------------------------
# Step 1: unpack sqoop-1.4.5.tar.gz into /opt/software (this creates the sqoop-1.4.5 directory)
cd /opt/software
tar -xvf sqoop-1.4.5.tar.gz
# -----------------------------------------------------------------------------
# Step 2: cd into sqoop-1.4.5 and change the Hadoop version in build.xml to 2.5.0
cd /opt/software/sqoop-1.4.5
vi build.xml
```
```
<elseif>
  <equals arg1="${hadoopversion}" arg2="200" />
  <then>
    <property name="hadoop.version" value="2.5.0" />
    <property name="hbase.version" value="0.94.2" />
    <property name="zookeeper.version" value="3.4.2" />
    <property name="hadoop.version.full" value="2.5.0" />
    <property name="hcatalog.version" value="0.13.1" />
  </then>
</elseif>
```
```
# Step 3: run ant package
[root@funshion-hadoop194 sqoop-1.4.5]# ant package
...
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
BUILD FAILED
/opt/software/sqoop-1.4.5/build.xml:1282: impossible to resolve dependencies: resolve failed - see output for details
Total time: 27 seconds
[ivy:resolve]  com.google.protobuf#protobuf-java;2.5.0 by [com.google.protobuf#protobuf-java;2.5.0] in [hadoop200]
  ---------------------------------------------------------------------
  |                  |            modules            ||   artifacts   |
  |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
  ---------------------------------------------------------------------
  |    hadoop200     |  154  |   59  |   58  |   37  ||  120  |   48  |
  ---------------------------------------------------------------------
[ivy:resolve]
[ivy:resolve] :: problems summary ::
[ivy:resolve] :::: WARNINGS
[ivy:resolve]  [FAILED ] org.mortbay.jetty#jetty;6.1.26!jetty.zip: (0ms)
[ivy:resolve]  ==== fs: tried
[ivy:resolve]    /root/.m2/repository/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
[ivy:resolve]  ==== apache-snapshot: tried
[ivy:resolve]    https://repository.apache.org/content/repositories/snapshots/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
[ivy:resolve]  ==== datanucleus: tried
[ivy:resolve]    http://www.datanucleus.org/downloads/maven2/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
[ivy:resolve]  ==== cloudera-releases: tried
[ivy:resolve]    https://repository.cloudera.com/content/repositories/releases/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
[ivy:resolve]  ==== cloudera-staging: tried
[ivy:resolve]    https://repository.cloudera.com/content/repositories/staging/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
[ivy:resolve]  ==== maven2: tried
[ivy:resolve]    http://repo1.maven.org/maven2/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.zip
[ivy:resolve]  ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]  ::              FAILED DOWNLOADS            ::
[ivy:resolve]  :: ^ see resolution messages for details  ^ ::
[ivy:resolve]  ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]  :: org.mortbay.jetty#jetty;6.1.26!jetty.zip
[ivy:resolve]  ::::::::::::::::::::::::::::::::::::::::::::::
[ivy:resolve]
[ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
[ivy:resolve]  io.netty#netty;3.4.0.Final by [io.netty#netty;3.6.2.Final] in [hadoop200test]
[ivy:resolve]  asm#asm;[3.0, 4.0) by [asm#asm;3.1] in [hadoop200test]
[ivy:resolve]  asm#asm;3.1 by [asm#asm;3.2] in [hadoop200test]
[ivy:resolve]  com.google.protobuf#protobuf-java;2.5.0 by [com.google.protobuf#protobuf-java;2.5.0] in [hadoop200test]
  ---------------------------------------------------------------------
  |                  |            modules            ||   artifacts   |
  |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
  ---------------------------------------------------------------------
  |  hadoop200test   |  156  |   0   |   0   |   38  ||  121  |   0   |
  ---------------------------------------------------------------------

# Error 1 (above) - fix: download jetty-6.1.26.zip manually into
# /root/.m2/repository/org/mortbay/jetty/jetty/6.1.26/ and re-run the build.
# -----------------------------------------------------------------------------
[ivy:resolve]  com.google.protobuf#protobuf-java;2.5.0 by [com.google.protobuf#protobuf-java;2.5.0] in [hadoop200test]
  ---------------------------------------------------------------------
  |                  |            modules            ||   artifacts   |
  |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
  ---------------------------------------------------------------------
  |  hadoop200test   |  156  |   2   |   2   |   38  ||  121  |   2   |
  ---------------------------------------------------------------------

ivy-retrieve-hadoop-test:
[ivy:retrieve] :: retrieving :: com.cloudera.sqoop#sqoop [sync]
[ivy:retrieve]  confs: [hadoop200test]
[ivy:retrieve]  121 artifacts copied, 0 already retrieved (113206kB/376ms)
compile-test:
    [mkdir] Created dir: /opt/software/sqoop-1.4.5/build/test/classes
    [mkdir] Created dir: /opt/software/sqoop-1.4.5/build/test/extraconf
    [javac] Compiling 169 source files to /opt/software/sqoop-1.4.5/build/test/classes
    [javac] warning: [options] bootstrap class path not set in conjunction with -source 1.6
    [javac] /opt/software/sqoop-1.4.5/src/test/org/apache/sqoop/TestExportUsingProcedure.java:244: error: method repeat in class StringUtils cannot be applied to given types;
    [javac]     sql.append(StringUtils.repeat("?", ", ",
    [javac]                           ^
    [javac]   required: String,int
    [javac]   found: String,String,int
    [javac]   reason: actual and formal argument lists differ in length
    [javac] Note: Some input files use or override a deprecated API.
    [javac] Note: Recompile with -Xlint:deprecation for details.
    [javac] Note: Some input files use unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
    [javac] 1 error
    [javac] 1 warning
BUILD FAILED
/opt/software/sqoop-1.4.5/build.xml:433: Compile failed; see the compiler error output for details.
Total time: 15 minutes 9 seconds

# Error 2 (above) - fix:
vi +244 /opt/software/sqoop-1.4.5/src/test/org/apache/sqoop/TestExportUsingProcedure.java
#   sql.append(StringUtils.repeat("?", ", ",
# Change line 244 to:
#   sql.append(StringUtils.repeat("?,",
# Re-run ant package; a final "BUILD SUCCESSFUL" message means the build worked.
...
# The build then produces /opt/software/sqoop-1.4.5/build/sqoop-1.4.5.bin__hadoop-2.5.0,
# which is the installation directory. Pack it up:
cd /opt/software/sqoop-1.4.5/build
tar -cvf sqoop-1.4.5.bin__hadoop-2.5.0.tar.gz ./sqoop-1.4.5.bin__hadoop-2.5.0
# sqoop-1.4.5.bin__hadoop-2.5.0.tar.gz is the Sqoop installation package we need.
```