Setting up a Hive + Hadoop environment

Machine plan:

Host	IP	Process
hadoop1	10.183.225.158	hive server
hadoop2	10.183.225.166	hive client

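Prerequisites:

Kerberos deployment: http://www.cnblogs.com/kisf/p/7473193.html

Hadoop HA + Kerberos deployment: http://www.cnblogs.com/kisf/p/7477440.html

MySQL installation: omitted.

Create the hive user and database in MySQL, then check that the account can log in: mysql -uhive -h10.112.28.179 -phive123456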
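The post skips the MySQL side, so here is a rough sketch of the statements that would create the metastore database and account that the login check expects. The function name `hive_metastore_sql` and the `'%'` host pattern are assumptions made for illustration, not something the original setup confirms; in a real deployment you would likely narrow the host pattern to the Hive server's address.

```shell
# Print the SQL that creates the metastore database and account.
# Defaults are the example values used in this post.
#   $1 = database name, $2 = user name, $3 = password
hive_metastore_sql() {
  db=${1:-hive}
  user=${2:-hive}
  pass=${3:-hive123456}
  cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
CREATE USER '${user}'@'%' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
EOF
}

# Usage (run as a MySQL admin account, for example root):
#   hive_metastore_sql | mysql -uroot -p -h10.112.28.179
```

Printing the SQL instead of executing it directly lets you review the statements before piping them into mysql.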

This guide uses Hive 2.3.0:

wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.0/apache-hive-2.3.0-bin.tar.gz

Add the environment variables to /etc/profile:

export HIVE_HOME=/letv/soft/apache-hive-2.3.0-bin

export HIVE_CONF_DIR=$HIVE_HOME/conf

export PATH=$PATH:$HIVE_HOME/bin

Sync the file to master2 and run source /etc/profile.
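If you script this step, the variables can be appended to the profile file only once. The sketch below assumes a hypothetical helper name, `add_hive_env`; the single quotes keep `$HIVE_HOME` and `$PATH` literal in the file so they are expanded when the profile is sourced, not when the helper runs.

```shell
# Append the Hive variables to a profile file, once.
#   $1 = profile file (for example /etc/profile), $2 = Hive install directory
add_hive_env() {
  profile=$1
  hive_home=$2
  # Skip if HIVE_HOME is already exported in this file.
  if grep -q '^export HIVE_HOME=' "$profile" 2>/dev/null; then
    return 0
  fi
  {
    echo "export HIVE_HOME=$hive_home"
    echo 'export HIVE_CONF_DIR=$HIVE_HOME/conf'
    echo 'export PATH=$PATH:$HIVE_HOME/bin'
  } >> "$profile"
}

# Usage:
#   add_hive_env /etc/profile /letv/soft/apache-hive-2.3.0-bin && . /etc/profile
```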

Extract the archive:

tar zxvf apache-hive-2.3.0-bin.tar.gz

Generate the Kerberos keytab (run these inside the kadmin shell):

addprinc -randkey hive/hadoop1@JENKIN.COM

addprinc -randkey hive/hadoop2@JENKIN.COM

xst -k /var/kerberos/krb5kdc/keytab/hive.keytab hive/hadoop1@JENKIN.COM

xst -k /var/kerberos/krb5kdc/keytab/hive.keytab hive/hadoop2@JENKIN.COM

Copy the keytab to hadoop1 and hadoop2:

scp /var/kerberos/krb5kdc/keytab/hive.keytab hadoop1:/var/kerberos/krb5kdc/keytab/

scp /var/kerberos/krb5kdc/keytab/hive.keytab hadoop2:/var/kerberos/krb5kdc/keytab/

(Run kinit with the keytab before using it.)
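Before moving on, it can help to confirm that the copied keytab really contains both principals. A minimal sketch, assuming the MIT Kerberos `klist` client is installed on each host; the function name `keytab_has_principal` is made up for this example.

```shell
# Succeed if the keytab lists the given principal.
#   $1 = keytab path, $2 = principal
keytab_has_principal() {
  klist -kt "$1" 2>/dev/null | grep -qF "$2"
}

# Usage on each host after the scp:
#   keytab_has_principal /var/kerberos/krb5kdc/keytab/hive.keytab hive/hadoop1@JENKIN.COM \
#     && echo "principal present"
```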

Hive server configuration:

On the Hive server, add the following to hive-env.sh:

HADOOP_HOME=/xxx/soft/hadoop-2.7.3

export HIVE_CONF_DIR=/xxx/soft/apache-hive-2.3.0-bin/conf

export HIVE_AUX_JARS_PATH=/xxx/soft/apache-hive-2.3.0-bin/lib

On the Hive server, create hive-site.xml:

<configuration>

    <property>

           <name>hive.metastore.schema.verification</name>

           <value>false</value>

           <description>

              Enforce metastore schema version consistency.

                  True: Verify that version information stored in metastore matches with one from Hive jars.  Also disable automatic

                        schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures

                        proper metastore schema migration. (Default)

                  False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.

            </description>

    </property>

    <property>

            <name>hive.metastore.warehouse.dir</name>

            <value>/user/hive/warehouse</value>

            <description>location of default database for the warehouse</description>

    </property>

    <property>

            <name>hive.querylog.location</name>

            <value>/xxx/soft/apache-hive-2.3.0-bin/log</value>

            <description>Location of Hive run time structured log file</description>

    </property>

    <property>

            <name>hive.downloaded.resources.dir</name>

            <value>/xxx/soft/apache-hive-2.3.0-bin/tmp</value>

            <description>Temporary local directory for added resources in the remote file system.</description>

    </property>

    <property>

            <name>javax.jdo.option.ConnectionURL</name>

            <value>jdbc:mysql://10.112.28.179:3306/hive?createDatabaseIfNotExist=true&amp;useUnicode=true&amp;characterEncoding=utf-8&amp;useSSL=false</value>

            <description>JDBC connect string for a JDBC metastore</description>

    </property>

    <property>

            <name>javax.jdo.option.ConnectionDriverName</name>

            <value>com.mysql.jdbc.Driver</value>

            <description>Driver class name for a JDBC metastore</description>

    </property>

    <property>

            <name>javax.jdo.option.ConnectionUserName</name>

            <value>hive</value>

            <description>username to use against metastore database</description>

    </property>

    <property>

            <name>javax.jdo.option.ConnectionPassword</name>

            <value>hive123456</value>

            <description>password to use against metastore database</description>

    </property>

<!-- kerberos config -->

    <property>

        <name>hive.server2.authentication</name>

        <value>KERBEROS</value>

    </property>

    <property>

        <name>hive.server2.authentication.kerberos.principal</name>

        <value>hive/_HOST@JENKIN.COM</value>

    </property>

    <property>

        <name>hive.server2.authentication.kerberos.keytab</name>

        <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>

        <!-- value>/xxx/soft/apache-hive-2.3.0-bin/conf/keytab/hive.keytab</value -->

    </property>

    <property>

        <name>hive.metastore.sasl.enabled</name>

        <value>true</value>

    </property>

    <property>

        <name>hive.metastore.kerberos.keytab.file</name>

        <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>

    </property>

    <property>

        <name>hive.metastore.kerberos.principal</name>

        <value>hive/_HOST@JENKIN.COM</value>

    </property>

</configuration>
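One detail of this file is easy to get wrong: a JDBC URL with several query parameters contains `&`, which is not legal as a bare character inside an XML value and has to be written as `&amp;`. The helper below is a small sketch for escaping a value before pasting it into hive-site.xml; the name `xml_escape` is hypothetical.

```shell
# Escape the characters that are special in XML text content.
#   $1 = raw value
xml_escape() {
  # '&' must be handled first so the entities added afterwards are not re-escaped.
  printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Usage:
#   xml_escape 'jdbc:mysql://10.112.28.179:3306/hive?createDatabaseIfNotExist=true&useSSL=false'
```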

Add the following to core-site.xml on the Hadoop NameNode:

<!-- hive config -->

        <property>

                <name>hadoop.proxyuser.hive.hosts</name>

                <value>*</value>

        </property>

        <property>

                <name>hadoop.proxyuser.hive.groups</name>

                <value>*</value>

        </property>

        <property>

                <name>hadoop.proxyuser.hdfs.hosts</name>

                <value>*</value>

        </property>

        <property>

                <name>hadoop.proxyuser.hdfs.groups</name>

                <value>*</value>

        </property>

        <property>

                <name>hadoop.proxyuser.HTTP.hosts</name>

                <value>*</value>

        </property>

        <property>

                <name>hadoop.proxyuser.HTTP.groups</name>

                <value>*</value>

         </property>

Sync the file to the other machines:

scp etc/hadoop/core-site.xml master2:/xxx/soft/hadoop-2.7.3/etc/hadoop/

scp etc/hadoop/core-site.xml slave2:/xxx/soft/hadoop-2.7.3/etc/hadoop/
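The new hadoop.proxyuser.* entries are only picked up once the daemons reread core-site.xml. Restarting HDFS and YARN does that; as an alternative, Hadoop provides refresh commands. The sketch below assumes that `hdfs` and `yarn` are on the PATH and that the caller already holds a Kerberos ticket for a cluster administrator; the function name is made up, and the behaviour on an HA cluster should be checked against your own setup.

```shell
# Reload hadoop.proxyuser.* settings without restarting the daemons.
# Assumption: run on a node with the updated core-site.xml, as an admin user.
refresh_proxyusers() {
  hdfs dfsadmin -refreshSuperUserGroupsConfiguration &&
    yarn rmadmin -refreshSuperUserGroupsConfiguration
}

# Usage:
#   refresh_proxyusers || echo "refresh failed; restart the daemons instead"
```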

Download the MySQL JDBC driver:

wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.44.tar.gz

tar zxvf mysql-connector-java-5.1.44.tar.gz

Copy the jar into the Hive lib directory:

cp mysql-connector-java-5.1.44/mysql-connector-java-5.1.44-bin.jar apache-hive-2.3.0-bin/lib/

Client configuration:

Copy Hive to hadoop2:

scp -r apache-hive-2.3.0-bin/ hadoop2:/xxx/soft/

On hadoop2 (the client), use this hive-site.xml:

<configuration>

    <property>

        <name>hive.metastore.uris</name>

        <value>thrift://hadoop1:9083</value>

    </property>

    <property>

         <name>hive.metastore.local</name>

         <value>false</value>

    </property>

    <!-- kerberos config -->

    <property>

        <name>hive.server2.authentication</name>

        <value>KERBEROS</value>

    </property>

    <property>

        <name>hive.server2.authentication.kerberos.principal</name>

        <value>hive/_HOST@JENKIN.COM</value>

    </property>

    <property>

        <name>hive.server2.authentication.kerberos.keytab</name>

        <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>

        <!-- value>/xxx/soft/apache-hive-2.3.0-bin/conf/keytab/hive.keytab</value -->

    </property>

    <property>

        <name>hive.metastore.sasl.enabled</name>

        <value>true</value>

    </property>

    <property>

        <name>hive.metastore.kerberos.keytab.file</name>

        <value>/var/kerberos/krb5kdc/keytab/hive.keytab</value>

    </property>

    <property>

        <name>hive.metastore.kerberos.principal</name>

        <value>hive/_HOST@JENKIN.COM</value>

    </property>

</configuration>

Starting Hive:

Initialize the metastore schema:

./bin/schematool -dbType mysql -initSchema

Obtain a Kerberos ticket:

kinit -k -t /var/kerberos/krb5kdc/keytab/hive.keytab hive/hadoop1@JENKIN.COM

Start the metastore service:

hive --service metastore &

Verify:

[root@hadoop1 conf]# netstat -nl | grep 9083

tcp        0      0 0.0.0.0:9083                0.0.0.0:*                   LISTEN

ps -ef | grep metastore

hive

hive>

Start the Thrift service (HiveServer2):

hive --service hiveserver2 &

Verify that the Thrift service (HiveServer2) is up:

[root@hadoop1 conf]# netstat -nl | grep 10000

tcp        0      0 0.0.0.0:10000               0.0.0.0:*                   LISTEN
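The two netstat checks (9083 for the metastore, 10000 for HiveServer2) can be wrapped in a small helper when scripting the startup. This is a sketch that parses `netstat -nl` style output from standard input; the function name `port_is_listening` is an assumption for illustration.

```shell
# Succeed if the netstat output on stdin shows a listener on the given port.
#   $1 = port number
port_is_listening() {
  awk -v port="$1" '
    # Column 4 is the local address; the last column is the socket state.
    $NF == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}

# Usage:
#   netstat -nl | port_is_listening 9083  && echo "metastore is up"
#   netstat -nl | port_is_listening 10000 && echo "hiveserver2 is up"
```

Anchoring the match at the end of the address column avoids treating port 10000 as a match for 1000.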

HQL operations from the Hive client:

DDL reference: https://cwiki.apache.org//confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create/Drop/Alter/UseDatabase

DML reference: https://cwiki.apache.org//confluence/display/Hive/LanguageManual+DML

Databases and tables created through Hive are visible on HDFS, under the directory set by hive.metastore.warehouse.dir in hive-site.xml:

hadoop fs -ls /user/hive/warehouse

Connect to Hive with the beeline client:

beeline -u "jdbc:hive2://hadoop1:10000/;principal=hive/_HOST@JENKIN.COM"

Run SQL:

0: jdbc:hive2://hadoop1:10000/> show databases;

+----------------+

| database_name  |

+----------------+

| default        |

| hivetest       |

+----------------+

2 rows selected (0.318 seconds)
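The same query can be run non-interactively, which is convenient for smoke tests. The sketch below builds the JDBC URL from its parts and passes a statement with beeline's `-e` option; the helper name `hive_jdbc_url` is hypothetical, and a valid ticket from kinit is assumed.

```shell
# Build the HiveServer2 JDBC URL used by beeline.
#   $1 = host, $2 = port, $3 = Kerberos realm
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/;principal=hive/_HOST@%s\n' "$1" "$2" "$3"
}

# Usage (needs a valid ticket from kinit):
#   beeline -u "$(hive_jdbc_url hadoop1 10000 JENKIN.COM)" -e "show databases;"
```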

hive> create database jenkintest;

OK

Time taken: 0.968 seconds

hive> show databases;

OK

default

hivetest

jenkintest

Time taken: 0.033 seconds, Fetched: 3 row(s)

hive> use jenkintest

    > ;

OK

Time taken: 0.108 seconds

hive> create table test1(columna int, columnb string);

OK

Time taken: 0.646 seconds

hive> show tables;

OK

test1

Time taken: 0.084 seconds, Fetched: 1 row(s)

Importing data into Hive (load from a file: create the file locally, with columns separated by the Tab key):

[root@hadoop2 ~]# vim jenkindb.txt

1       jenkin

2       jenkin.k

3       anne

[root@hadoop2 ~]# hive

hive> create table jenkintb (id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;

hive> load data local inpath 'jenkindb.txt' into table jenkintb;

hive> select * from jenkintb;

OK

1       jenkin

2       jenkin.k

3       anne

hive> show create table jenkintb;
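Because the table is declared with FIELDS TERMINATED BY '\t', the load only splits columns correctly if the file contains real tab characters; an editor that expands tabs to spaces would leave both columns unparsed. As a sketch, the sample file can be written with printf instead of vim (the function name `write_sample_rows` is made up for this example):

```shell
# Write the three sample rows with real tab separators.
#   $1 = output file
write_sample_rows() {
  # printf reuses the format for each pair of arguments, one row per pair.
  printf '%s\t%s\n' 1 jenkin 2 jenkin.k 3 anne > "$1"
}

# Usage:
#   write_sample_rows jenkindb.txt
#   hive -e "load data local inpath 'jenkindb.txt' into table jenkintb;"
```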