Big Data Learning (11) — Setting Up Hive in Metastore Service Mode
This post covers installing and using Hive. The version is Hive 3.1.2.
Adjusting the Deployment Nodes
In the Hadoop series I built the cluster on 5 VMs, but my machine only has 8 GB of RAM, and once the VMs were up everything froze — a self-inflicted disaster.
Hive runs on top of a Hadoop cluster, so to keep things responsive I rebuilt the whole Hadoop cluster with just 3 VMs.
| Host | NN | RM | ZKFC | DN | NM | JN | ZK | MySQL | Hive server | Hive client |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| server01 | • | • | • | • | • | • | • | | | |
| server02 | • | • | • | • | • | • | • | | | • |
| server03 | | | | • | • | • | • | • | • | |
Installing MySQL
Hive's metadata is normally stored in a relational database, and MySQL is the most common choice.
First check whether a MySQL yum source is available — note that what we want to install is mysql-server. The listing below shows no such source, so it has to be downloaded.
[root@server03 ~]# yum list|grep mysql
akonadi-mysql.x86_64 1.9.2-4.el7 base
apr-util-mysql.x86_64 1.5.2-6.el7 base
dovecot-mysql.x86_64 1:2.2.36-6.el7_8.1 updates
freeradius-mysql.x86_64 3.0.13-10.el7_6 base
libdbi-dbd-mysql.x86_64 0.8.3-16.el7 base
mysql-connector-java.noarch 1:5.1.25-3.el7 base
mysql-connector-odbc.x86_64 5.2.5-8.el7 base
pcp-pmda-mysql.x86_64 4.3.2-7.el7_8 updates
php-mysql.x86_64 5.4.16-48.el7 base
php-mysqlnd.x86_64 5.4.16-48.el7 base
qt-mysql.i686 1:4.8.7-8.el7 base
qt-mysql.x86_64 1:4.8.7-8.el7 base
qt5-qtbase-mysql.i686 5.9.7-2.el7 base
qt5-qtbase-mysql.x86_64 5.9.7-2.el7 base
redland-mysql.x86_64 1.0.16-6.el7 base
rsyslog-mysql.x86_64 8.24.0-52.el7_8.2 updates
Download the MySQL repo package.
[root@server03 ~]# wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
--2020-10-12 22:36:02--  http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
Resolving repo.mysql.com (repo.mysql.com)... 23.45.57.22
Connecting to repo.mysql.com (repo.mysql.com)|23.45.57.22|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 6140 (6.0K) [application/x-redhat-package-manager]
Saving to: 'mysql-community-release-el7-5.noarch.rpm'

100%[======================================================================================>] 6,140       --.-K/s   in 0.002s

2020-10-12 22:36:02 (3.80 MB/s) - 'mysql-community-release-el7-5.noarch.rpm' saved [6140/6140]
Install the mysql-community-release-el7-5.noarch.rpm package.
[root@server03 home]# rpm -ivh mysql-community-release-el7-5.noarch.rpm
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-community-release-el7-5    ################################# [100%]
Install mysql-server.
[root@server03 home]# yum -y install mysql-server
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirrors.ustc.edu.cn
* extras: mirrors.cn99.com
* updates: mirrors.njupt.edu.cn
Resolving Dependencies
--> Running transaction check
---> Package mysql-community-server.x86_64 0:5.6.49-2.el7 will be installed
--> Processing Dependency: mysql-community-common(x86-64) = 5.6.49-2.el7 for package: mysql-community-server-5.6.49-2.el7.x86_64
...........................................................
perl-Scalar-List-Utils.x86_64 0:1.27-248.el7 perl-Socket.x86_64 0:2.010-5.el7
perl-Storable.x86_64 0:2.45-3.el7 perl-Text-ParseWords.noarch 0:3.29-4.el7
perl-Time-HiRes.x86_64 4:1.9725-3.el7 perl-Time-Local.noarch 0:1.2300-2.el7
perl-constant.noarch 0:1.27-2.el7 perl-libs.x86_64 4:5.16.3-295.el7
perl-macros.x86_64 4:5.16.3-295.el7 perl-parent.noarch 1:0.225-244.el7
perl-podlators.noarch 0:2.5.1-3.el7 perl-threads.x86_64 0:1.87-4.el7
perl-threads-shared.x86_64 0:1.43-6.el7

Replaced:
  mariadb-libs.x86_64 1:5.5.64-1.el7

Complete!
Start MySQL.
[root@server03 home]# service mysqld start
Redirecting to /bin/systemctl start mysqld.service
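Optionally, MySQL can also be set to start on boot. This is a convenience sketch rather than a step from the original setup, assuming the sysv-style mysqld service installed above:

# On CentOS 7, systemctl transparently wraps the mysqld init script.
systemctl enable mysqld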
Remote connections to the MySQL database require extra privileges, which can be granted as follows.
[root@server03 ~]# mysql -uroot
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 5.6.49 MySQL Community Server (GPL)

Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use mysql;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select host,user,password from user;
+-----------+------+-------------------------------------------+
| host | user | password |
+-----------+------+-------------------------------------------+
| localhost | root | |
| server03 | root | |
| 127.0.0.1 | root | |
| ::1 | root | |
| localhost | | |
| server03 | | |
+-----------+------+-------------------------------------------+
6 rows in set (0.00 sec)

mysql> grant all privileges on *.* to 'root'@'%' identified by '111111' with grant option;
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> select host,user,password from user;
+-----------+------+-------------------------------------------+
| host | user | password |
+-----------+------+-------------------------------------------+
| localhost | root | |
| server03 | root | |
| 127.0.0.1 | root | |
| ::1 | root | |
| localhost | | |
| server03 | | |
| % | root | *A46E551C358E21DEFC306B31BB84ADBFC2A75AAB |
+-----------+------+-------------------------------------------+
7 rows in set (0.00 sec)
That completes the MySQL setup.
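As a quick sanity check — a hedged extra, not part of the original walkthrough — remote access can be verified from another node, assuming the mysql client is installed there:

# Run from server01 or server02, using the password granted above.
mysql -h server03 -uroot -p111111 -e "select version();"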
Installing the Hive Server
We will use the most common setup, remote metastore mode: the Hive server goes on server03 and the Hive client on server02.
On server03, extract the Hive tarball into /usr and change its owner and group to hadoop. Then copy hive-default.xml.template in the conf directory to a file named hive-site.xml. Note that the name must be exactly hive-site.xml, analogous to the config naming in HDFS.
[hadoop@server03 conf]$ pwd
/usr/apache-hive-3.1.2/conf
[hadoop@server03 conf]$ cp hive-default.xml.template hive-site.xml
Edit the following properties in hive-site.xml:
<configuration>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/opt/hive_remote/warehouse</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://server03:3306/hive_remote?createDatabaseIfNotExist=true</value>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>root</value>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>111111</value>
</property>
</configuration>
The first property is an HDFS path. There is no need to create it by hand — /opt/hive_remote/warehouse is created in HDFS automatically during metadata initialization. See the check below.
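Once everything is up, the directory can be verified with a plain HDFS listing — a sketch for checking, not a required step:

# Should list the warehouse directory after Hive has created it.
hdfs dfs -ls /opt/hive_remote/warehouse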
Configure the Hive environment variables.
JAVA_HOME=/usr/java/jdk-11.0.8/
ZOOKEEPER_HOME=/usr/apache-zookeeper-3.5.8/
HADOOP_HOME=/usr/hadoop-3.3.0
HIVE_HOME=/usr/apache-hive-3.1.2
PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HIVE_HOME/bin
Download a MySQL JDBC driver jar (mysql-connector-java) from the web and place it in Hive's lib directory, sketched below.
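The exact version below is only an illustration — the original post uses a generically named jar, and any connector compatible with MySQL 5.6 should do:

# Hypothetical driver version, for illustration only.
cp mysql-connector-java-5.1.49.jar /usr/apache-hive-3.1.2/lib/

With the driver in place, run the metadata initialization on server03.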
[hadoop@server03 ~]$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Exception in thread "main" java.lang.NoSuchMethodError: com.google.common.base.Preconditions.checkArgument(ZLjava/lang/String;Ljava/lang/Object;)V
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1380)
at org.apache.hadoop.conf.Configuration.set(Configuration.java:1361)
at org.apache.hadoop.mapred.JobConf.setJar(JobConf.java:536)
at org.apache.hadoop.mapred.JobConf.setJarByClass(JobConf.java:554)
at org.apache.hadoop.mapred.JobConf.<init>(JobConf.java:448)
at org.apache.hadoop.hive.conf.HiveConf.initialize(HiveConf.java:5141)
at org.apache.hadoop.hive.conf.HiveConf.<init>(HiveConf.java:5104)
at org.apache.hive.beeline.HiveSchemaTool.<init>(HiveSchemaTool.java:96)
at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:1473)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
This fails. The cause is a guava version mismatch between Hive and Hadoop: delete the older guava jar shipped with Hive and copy in the newer one from Hadoop.
[hadoop@server03 ~]$ ll /usr/hadoop-3.3.0/share/hadoop/common/lib |grep guava
-rw-r--r--. 1 hadoop hadoop 2747878 Jul  7 02:47 guava-27.0-jre.jar
-rw-r--r--. 1 hadoop hadoop    2199 Jul  7 02:47 listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
[hadoop@server03 ~]$ ll /usr/apache-hive-3.1.2/lib/ |grep guava
-rw-r--r--. 1 hadoop hadoop 2308517 Sep 27  2018 guava-19.0.jar
-rw-r--r--. 1 hadoop hadoop  971309 May 21  2019 jersey-guava-2.25.1.jar
[hadoop@server03 ~]$ rm -f /usr/apache-hive-3.1.2/lib/guava-19.0.jar
[hadoop@server03 ~]$ cp /usr/hadoop-3.3.0/share/hadoop/common/lib/guava-27.0-jre.jar /usr/apache-hive-3.1.2/lib/
[hadoop@server03 ~]$ ll /usr/apache-hive-3.1.2/lib/ |grep guava
-rw-r--r--. 1 hadoop hadoop 2747878 Oct 19 00:15 guava-27.0-jre.jar
-rw-r--r--. 1 hadoop hadoop  971309 May 21  2019 jersey-guava-2.25.1.jar
Run the initialization script again.
[hadoop@server03 ~]$ schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://server03:3306/hive_remote?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: root
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql
Initialization script completed
schemaTool completed
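The metastore schema now lives in the hive_remote database on server03's MySQL. As a hedged sanity check — not in the original steps — the metastore tables can be listed:

# DBS, TBLS, SDS and friends are standard Hive metastore tables.
mysql -uroot -e "use hive_remote; show tables;"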
The Hive server is now set up; next, start the service.
[hadoop@server03 ~]$ hive --service metastore
2020-10-21 08:14:44: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
This runs as a blocking foreground process, so open another terminal to inspect the processes. To keep the output uncluttered, the Hadoop cluster has not been started yet.
[hadoop@server03 ~]$ jps
3910 Jps
3656 RunJar
RunJar is the Hive server process.
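By default the metastore listens on thrift port 9083 — the same port the client config below points at — which can be confirmed with a quick check (a hedged extra, not in the original session):

# Should show a listener on 9083 once the metastore is up.
ss -lnt | grep 9083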
Installing the Hive Client
Install the Hive client on server02. The steps are much the same as for the server, except that no metadata initialization is needed. Edit hive-site.xml as follows:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/opt/hive_remote/warehouse</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://server03:9083</value>
</property>
Run hive to enter the Hive CLI.
[hadoop@server02 ~]$ hive
which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/java/jdk-11.0.8/bin:/usr/apache-zookeeper-3.5.8/bin:/usr/hadoop-3.3.0/bin:/usr/hadoop-3.3.0/sbin:/usr/apache-hive-3.1.2/bin:/home/hadoop/.local/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = 52f8c152-cd38-4293-b2b6-3efc17546299
Exception in thread "main" java.lang.ClassCastException: class jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to class java.net.URLClassLoader (jdk.internal.loader.ClassLoaders$AppClassLoader and java.net.URLClassLoader are in module java.base of loader 'bootstrap')
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:413)
at org.apache.hadoop.hive.ql.session.SessionState.<init>(SessionState.java:389)
at org.apache.hadoop.hive.cli.CliSessionState.<init>(CliSessionState.java:60)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
It fails. Searching for this error reveals that Hive 3.1.2 only supports JDK 8 — and the Hadoop cluster was built with JDK 11. Ouch.
The only way out is to install JDK 8 as well.
Switching the Java Version
[root@server01 home]# rpm -ivh jdk-8u271-linux-x64.rpm
warning: jdk-8u271-linux-x64.rpm: Header V3 RSA/SHA256 Signature, key ID ec551f03: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:jdk1.8-2000:1.8.0_271-fcs        ################################# [100%]
Unpacking JAR files...
tools.jar...
plugin.jar...
javaws.jar...
deploy.jar...
rt.jar...
jsse.jar...
charsets.jar...
localedata.jar...
After installation, /usr/java now holds two Java versions.
[root@server01 java]# cd /usr/java
[root@server01 java]# ll
total 0
lrwxrwxrwx. 1 root root  16 Sep 23 18:21 default -> /usr/java/latest
drwxr-xr-x. 9 root root 128 Sep 23 18:21 jdk-11.0.8
drwxr-xr-x. 9 root root 286 Oct 23 13:04 jdk1.8.0_271-amd64
lrwxrwxrwx. 1 root root  20 Sep 23 18:21 latest -> /usr/java/jdk-11.0.8
First rename the directory — the default name is inconveniently long.
[root@server01 java]# mv jdk1.8.0_271-amd64/ jdk1.8.0
[root@server01 java]# ll
total 0
lrwxrwxrwx. 1 root root  16 Sep 23 18:21 default -> /usr/java/latest
drwxr-xr-x. 9 root root 128 Sep 23 18:21 jdk-11.0.8
drwxr-xr-x. 9 root root 286 Oct 23 13:04 jdk1.8.0
lrwxrwxrwx. 1 root root  20 Sep 23 18:21 latest -> /usr/java/jdk-11.0.8
Next, update JAVA_HOME in /etc/profile and in hadoop-env.sh, then run source /etc/profile to make the change take effect.
#JAVA_HOME=/usr/java/jdk-11.0.8
JAVA_HOME=/usr/java/jdk1.8.0
Then switch the active Java with alternatives.
[root@server01 java]# alternatives --install /usr/bin/java java /usr/java/jdk1.8.0/bin/java 1400
[root@server01 java]# alternatives --config java

There are 3 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*+ 1           /usr/java/jdk-11.0.8/bin/java
   2           /usr/java/jdk1.8.0_271-amd64/bin/java
   3           /usr/java/jdk1.8.0/bin/java

Enter to keep the current selection[+], or type selection number: 3
Entry 2 was generated automatically when the rpm was installed; since the directory has been renamed, that entry should be removed.
[root@server01 java]# alternatives --remove java /usr/java/jdk1.8.0_271-amd64/bin/java
[root@server01 java]# alternatives --config java

There are 2 programs which provide 'java'.

  Selection    Command
-----------------------------------------------
*  1           /usr/java/jdk-11.0.8/bin/java
 + 2           /usr/java/jdk1.8.0/bin/java

Enter to keep the current selection[+], or type selection number: 2
Now only two entries remain. Let's verify that the switch took effect.
[root@server01 java]# java -version
java version "1.8.0_271"
Java(TM) SE Runtime Environment (build 1.8.0_271-b09)
Java HotSpot(TM) 64-Bit Server VM (build 25.271-b09, mixed mode)
[root@server01 java]# which java
/usr/bin/java
[root@server01 java]# ll /usr/bin/java
lrwxrwxrwx. 1 root root 22 Oct 23 13:11 /usr/bin/java -> /etc/alternatives/java
[root@server01 java]# ll /etc/alternatives/java
lrwxrwxrwx. 1 root root 27 Oct 23 13:11 /etc/alternatives/java -> /usr/java/jdk1.8.0/bin/java
The Java version switch is complete, and the Hive setup can continue.
Starting the Services
Starting the ZooKeeper and Hadoop clusters is routine by now; a reminder sketch is below.
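Roughly, and assuming the same scripts used earlier in this series, the startup looks like this:

# On each of server01-03: start the ZooKeeper ensemble.
zkServer.sh start
# On a NameNode host: start HDFS (NameNodes, DataNodes, JournalNodes, ZKFC).
start-dfs.sh
# On a ResourceManager host: start YARN.
start-yarn.sh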
Start the Hive metastore service on server03. As a reminder, this is a blocking process — don't wait for it to return to the shell prompt.
[hadoop@server03 bin]$ hive --service metastore
2020-10-23 15:48:01: Starting Hive Metastore Server
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
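If you'd rather not tie up a terminal, the metastore can also be run in the background — a common pattern, though not what this session does:

# Background the metastore and keep its output in a log file.
nohup hive --service metastore > /tmp/metastore.log 2>&1 &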
Start the Hive client on server02; once the hive> prompt appears, you are free to play.
[hadoop@server02 ~]$ hive
which: no hbase in (/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/usr/java/jdk1.8.0/bin:/usr/apache-zookeeper-3.5.8/bin:/usr/hadoop-3.3.0/bin:/usr/hadoop-3.3.0/sbin:/usr/apache-hive-3.1.2/bin:/home/hadoop/.local/bin:/home/hadoop/bin)
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/apache-hive-3.1.2/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hadoop-3.3.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Hive Session ID = d252045d-6dc8-41ee-9808-c324521a880f

Logging initialized using configuration in jar:file:/usr/apache-hive-3.1.2/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Hive Session ID = c39b1a09-a43a-4ccb-bcf9-9608969cbd12
hive>
Basic Operations in Practice
Database operations work almost exactly as they do in MySQL.
hive> show databases;
OK
default
Time taken: 4.862 seconds, Fetched: 1 row(s)
hive> create database test;
OK
Time taken: 4.09 seconds
hive> show databases;
OK
default
test
Time taken: 0.099 seconds, Fetched: 2 row(s)
hive> use test;
OK
Time taken: 0.138 seconds
hive>
Now let's rerun the example from the Hadoop series. The data is slightly adjusted and stored at /opt/hive_remote/warehouse/data1.
A temperature file, formatted as year-month-day-temperature:
1949-10-01-34
1949-10-01-38
1949-10-02-36
1950-01-01-32
1950-10-01-37
1951-12-01-23
1950-10-02-41
1950-10-03-27
1951-07-01-45
1951-07-02-46
1951-07-03-47
Create a table in Hive to hold the data.
hive> create table hive1(
> year string,
> month string,
> day string,
> temperature smallint
> )
> row format delimited
> fields terminated by '-';
OK
Time taken: 3.818 seconds
Load the data into the table (note the local keyword — the path is on the client's local filesystem).
hive> load data local inpath '/opt/hive_remote/warehouse/data1' into table hive1;
Loading data to table test.hive1
OK
Time taken: 15.852 seconds
hive> select * from hive1;
OK
1949 10 01 34
1949 10 01 38
1949 10 02 36
1950 01 01 32
1950 10 01 37
1951 12 01 23
1950 10 02 41
1950 10 03 27
1951 07 01 45
1951 07 02 46
1951 07 03 47
Time taken: 12.108 seconds, Fetched: 11 row(s)
The data is ready. The requirement is the same as before: for each month, find the temperatures of the two hottest days. First create an intermediate table that collapses days with multiple readings down to the daily maximum.
hive> create table hive1_mid(
> year string,
> month string,
> day string,
> temperature smallint
> );
OK
Time taken: 1.627 seconds
hive> insert into hive1_mid select year,month,day,max(temperature) from hive1 group by year,month,day;
Query ID = hadoop_20201101093442_c624ad0c-4b55-406d-b49e-e8aa0315e7a2
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1604193604905_0002, Tracking URL = http://server01:8088/proxy/application_1604193604905_0002/
Kill Command = /usr/hadoop-3.3.0/bin/mapred job -kill job_1604193604905_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2020-11-01 09:35:35,506 Stage-1 map = 0%, reduce = 0%
2020-11-01 09:36:41,001 Stage-1 map = 0%, reduce = 0%
2020-11-01 09:36:50,906 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2020-11-01 09:37:52,195 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.59 sec
2020-11-01 09:38:00,629 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 4.87 sec
2020-11-01 09:38:05,778 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.72 sec
MapReduce Total cumulative CPU time: 5 seconds 720 msec
Ended Job = job_1604193604905_0002
Loading data to table test.hive1_mid
Launching Job 2 out of 2
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1604193604905_0003, Tracking URL = http://server01:8088/proxy/application_1604193604905_0003/
Kill Command = /usr/hadoop-3.3.0/bin/mapred job -kill job_1604193604905_0003
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2020-11-01 09:39:00,431 Stage-3 map = 0%, reduce = 0%
2020-11-01 09:40:08,674 Stage-3 map = 0%, reduce = 0%
2020-11-01 09:40:17,031 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.67 sec
2020-11-01 09:40:52,565 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 4.37 sec
MapReduce Total cumulative CPU time: 4 seconds 370 msec
Ended Job = job_1604193604905_0003
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.72 sec HDFS Read: 16922 HDFS Write: 502 SUCCESS
Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 4.37 sec HDFS Read: 12824 HDFS Write: 327 SUCCESS
Total MapReduce CPU Time Spent: 10 seconds 90 msec
OK
Time taken: 400.594 seconds
hive> select * from hive1_mid;
OK
1949 10 01 38
1949 10 02 36
1950 01 01 32
1950 10 01 37
1950 10 02 41
1950 10 03 27
1951 07 01 45
1951 07 02 46
1951 07 03 47
1951 12 01 23
Time taken: 7.142 seconds, Fetched: 10 row(s)
The rest is simple: group the intermediate table by year and month, and keep the two hottest records in each group. The query below does this with a self-join — a row survives if fewer than two rows in the same month are hotter than it.
hive> select a.year,a.month,a.day,a.temperature from hive1_mid a
> left join hive1_mid b
> on a.year=b.year and a.month=b.month and a.temperature < b.temperature
> group by a.year,a.month,a.day,a.temperature
> having count(*) < 2;
Query ID = hadoop_20201102233727_a7798ec5-55f2-4f81-a081-1473cf34ebca
Total jobs = 1
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1604330185721_0003, Tracking URL = http://server01:8088/proxy/application_1604330185721_0003/
Kill Command = /usr/hadoop-3.3.0/bin/mapred job -kill job_1604330185721_0003
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2020-11-02 23:39:00,286 Stage-2 map = 0%, reduce = 0%
2020-11-02 23:39:50,116 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.54 sec
2020-11-02 23:40:24,842 Stage-2 map = 100%, reduce = 67%, Cumulative CPU 2.54 sec
2020-11-02 23:40:40,775 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 6.31 sec
MapReduce Total cumulative CPU time: 6 seconds 310 msec
Ended Job = job_1604330185721_0003
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 6.31 sec HDFS Read: 14363 HDFS Write: 295 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 310 msec
OK
1949 10 01 38
1949 10 02 36
1950 01 01 32
1950 10 01 37
1950 10 02 41
1951 07 02 46
1951 07 03 47
1951 12 01 23
Time taken: 204.409 seconds, Fetched: 8 row(s)
Compared with the MapReduce code from the Hadoop HA series, aren't these steps far more concise? Hive translates the SQL into the corresponding MapReduce jobs without us having to worry about it — a much better experience for the user. (A window-function alternative is sketched below.)
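As a footnote — a hedged alternative, not run in this session — Hive also supports window functions, which express the same top-2-per-group logic without the self-join:

-- Sketch only: rank the days within each (year, month) by temperature
-- and keep the top two per group.
select year, month, day, temperature
from (
  select year, month, day, temperature,
         row_number() over (partition by year, month
                            order by temperature desc) as rn
  from hive1_mid
) t
where rn <= 2;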