Hive 基本语法操练（二）：视图和索引操作

1. 视图操作

-------

1) 创建一个测试表。

```

hive> create table test(id int,name string);

OK

Time taken: 0.385 seconds

hive> desc test;

OK

id                      int

name                    string

Time taken: 0.261 seconds, Fetched: 2 row(s)

```

2) 基于表 test 创建一个 test_view 视图。

```

hive> create view test_view(id,name_length) as select id,length(name) from test;

```

3) 查看 test_view 视图属性。

```

hive> desc test_view;

OK

id                  	int

name_length         	int

Time taken: 0.071 seconds, Fetched: 2 row(s)

```

4) 查看视图结果。

```

hive> select * from test_view;

```

2. 索引操作

-------

1) Hive 创建索引。

```

hive> create table user like group_test;

OK

Time taken: 0.232 seconds

hive> create index user_index on table user(uid) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' with deferred rebuild IN TABLE user_index_table;

OK

Time taken: 0.183 seconds

```

2) 更新数据。

```

hive> alter index user_index on user rebuild;

Query ID = hadoop_20180518043232_ebdf97bd-5984-4310-a3c8-6befef328133

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

  set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

  set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

  set mapreduce.job.reduces=<number>

Starting Job = job_1526553207632_0018, Tracking URL = http://master:8088/proxy/application_1526553207632_0018/

Kill Command = /opt/modules/hadoop-2.6.0/bin/hadoop job  -kill job_1526553207632_0018

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2018-05-18 04:32:55,632 Stage-1 map = 0%,  reduce = 0%

2018-05-18 04:33:04,400 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.65 sec

2018-05-18 04:33:12,406 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.93 sec

MapReduce Total cumulative CPU time: 2 seconds 930 msec

Ended Job = job_1526553207632_0018

Loading data to table default.user_index_table

Table default.user_index_table stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.93 sec   HDFS Read: 290 HDFS Write: 50 SUCCESS

Total MapReduce CPU Time Spent: 2 seconds 930 msec

OK

Time taken: 25.609 seconds

```

3) 查看索引

```

hive> show index on user;

OK

user_index          	user                	uid                 	user_index_table    	compact

Time taken: 0.046 seconds, Fetched: 1 row(s)

```

4) 删除索引

```

hive> drop index user_index on user;                                                                                                                         OK

Time taken: 0.094 seconds

hive> show index on user;

OK

Time taken: 0.036 seconds

```

5) 创建表和索引案例

```

hive> create table index_test(id INT,name STRING) PARTITIONED BY (dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

```

创建一个索引测试表 index_test，dt作为分区属性，“ROW FORMAT DELIMITED FIELDS TERMINATED BY ','” 表示用逗号分割字符串。

6) 创建一个临时索引表 index_tmp。

```

hive> create table index_tmp(id INT,name STRING,dt STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

```

7) 加载本地数据到 index_tmp 表中。

```

[hadoop@master test]$ sudo vim test.txt

02,female,192.168.1.3

01,male,192.168.1.26

03,male,192.168.1.5

08,female,192.168.1.62

04,male,192.168.1.9

hive> load data local inpath '/home/hadoop/test/test.txt' into table index_tmp;

Loading data to table default.index_tmp

Table default.index_tmp stats: [numFiles=1, totalSize=106]

OK

Time taken: 0.224 seconds

hive> select * from index_tmp;

OK

2	female	192.168.1.3

1	male	192.168.1.26

3	male	192.168.1.5

8	female	192.168.1.62

4	male	192.168.1.9

```

设置 Hive 的索引属性来优化索引查询，命令如下。

```

hive> set hive.exec.dynamic.partition.mode=nonstrict;----设置所有列为 dynamic partition

hive> set hive.exec.dynamic.partition=true;----使用动态分区

```

8) 查询index_tmp 表中的数据，插入 table_test 表中。

```

hive> insert overwrite table index_test partition(dt) select id,name,dt from index_tmp;

Query ID = hadoop_20180518044343_97e7fe67-a5a1-408b-be8e-e9dadb2f9e48

Total jobs = 3

Launching Job 1 out of 3

Number of reduce tasks is set to 0 since there's no reduce operator

Starting Job = job_1526553207632_0019, Tracking URL = http://master:8088/proxy/application_1526553207632_0019/

Kill Command = /opt/modules/hadoop-2.6.0/bin/hadoop job  -kill job_1526553207632_0019

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0

2018-05-18 04:43:42,621 Stage-1 map = 0%,  reduce = 0%

2018-05-18 04:43:48,835 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.87 sec

MapReduce Total cumulative CPU time: 870 msec

Ended Job = job_1526553207632_0019

Stage-4 is selected by condition resolver.

Stage-3 is filtered out by condition resolver.

Stage-5 is filtered out by condition resolver.

Moving data to: hdfs://ns/tmp/hive/hadoop/9f7dd0d3-a14c-4535-9291-557b9cb6259b/hive_2018-05-18_04-43-36_337_559705388802402645-1/-ext-10000

Loading data to table default.index_test partition (dt=null)

	 Time taken for load dynamic partitions : 278

	Loading partition {dt=192.168.1.62}

	Loading partition {dt=192.168.1.3}

	Loading partition {dt=192.168.1.5}

	Loading partition {dt=192.168.1.26}

	Loading partition {dt=192.168.1.9}

	 Time taken for adding to write entity : 0

Partition default.index_test{dt=192.168.1.26} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]

Partition default.index_test{dt=192.168.1.3} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]

Partition default.index_test{dt=192.168.1.5} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]

Partition default.index_test{dt=192.168.1.62} stats: [numFiles=1, numRows=1, totalSize=9, rawDataSize=8]

Partition default.index_test{dt=192.168.1.9} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1   Cumulative CPU: 0.87 sec   HDFS Read: 308 HDFS Write: 338 SUCCESS

Total MapReduce CPU Time Spent: 870 msec

OK

Time taken: 15.225 seconds

```

9) 使用 index_test 表，在属性 id 上创建一个索引 index1_index_test 。

```

hive> create index index1_index_test on table index_test(id) as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;

OK

Time taken: 0.109 seconds

```

10) 填充索引数据。

```

hive> alter index index1_index_test on index_test rebuild;

Query ID = hadoop_20180518044545_edc98c3b-03eb-42c0-83d1-ba266d8497e0

Total jobs = 5

（省略MapReduce过程）

Ended Job = job_1526553207632_0024

Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.26)

Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.3)

Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.5)

Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.62)

Loading data to table default.default__index_test_index1_index_test__ partition (dt=192.168.1.9)

Partition default.default__index_test_index1_index_test__{dt=192.168.1.26} stats: [numFiles=1, numRows=0, totalSize=70, rawDataSize=0]

Partition default.default__index_test_index1_index_test__{dt=192.168.1.3} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]

Partition default.default__index_test_index1_index_test__{dt=192.168.1.5} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]

Partition default.default__index_test_index1_index_test__{dt=192.168.1.62} stats: [numFiles=1, numRows=0, totalSize=70, rawDataSize=0]

Partition default.default__index_test_index1_index_test__{dt=192.168.1.9} stats: [numFiles=1, numRows=0, totalSize=69, rawDataSize=0]

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.66 sec   HDFS Read: 226 HDFS Write: 191 SUCCESS

Stage-Stage-5: Map: 1  Reduce: 1   Cumulative CPU: 1.94 sec   HDFS Read: 227 HDFS Write: 187 SUCCESS

Stage-Stage-9: Map: 1  Reduce: 1   Cumulative CPU: 2.15 sec   HDFS Read: 225 HDFS Write: 187 SUCCESS

Stage-Stage-13: Map: 1  Reduce: 1   Cumulative CPU: 1.9 sec   HDFS Read: 228 HDFS Write: 191 SUCCESS

Stage-Stage-17: Map: 1  Reduce: 1   Cumulative CPU: 2.73 sec   HDFS Read: 225 HDFS Write: 187 SUCCESS

Total MapReduce CPU Time Spent: 11 seconds 380 msec

OK

Time taken: 196.607 seconds

```

11) 查看创建的索引。

```

hive> show index on index_test

```

12) 查看分区信息。

```

hive> show partitions index_test;

```

13) 查看索引数据。

```

[hadoop@master hadoop-2.6.0]$ hadoop fs -ls /user/hive/warehouse/default__index_test_index1_index_test__

18/05/18 04:52:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

Found 5 items

drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:46 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.26

drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:46 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.3

drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:46 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.5

drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:48 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.62

drwxr-xr-x   - hadoop supergroup          0 2018-05-18 04:48 /user/hive/warehouse/default__index_test_index1_index_test__/dt=192.168.1.9

```

14) 删除索引。

```

hive> drop index index1_index_test on index_test;

OK

Time taken: 4.842 seconds

hive> show index on index_test;

OK

Time taken: 0.031 seconds

```

15) 索引数据也被删除。

```

[hadoop@master hadoop-2.6.0]$ hadoop fs -ls /user/hive/warehouse/default__index_test_index1_index_test__

18/05/18 04:53:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

ls: `/user/hive/warehouse/default__index_test_index1_index_test__': No such file or directory

```

&16) 修改配置文件信息。

```

[hadoop@master hadoop-2.6.0]$ cd /opt/modules/hive1.0.0/conf/

[hadoop@master conf]$ sudo vim hive-site.xml（/+字符串快速查找）

<property>

	<name>hive.optimize.index.filter</name>

	<value>true</value>

</property>

<property>

	<name>hive.optimize.index.groupby</name>

	<value>true< /value>

</property>

<property>

	<name>hive.optimize.index.filter.compact.minsize</name>

	<value>5120</value>

</property>

```

hive.optimize.index.filter 和 hive.optimize.index.groupby 参数默认是 false。使用索引的时候必须把这两个参数开启，才能起到作用。

hive.optimize.index.filter.compact.minsize 参数为输入一个紧凑的索引将被自动采用最小尺寸、默认5368709120（以字节为单位）。

以上就是博主为大家介绍的这一板块的主要内容，这都是博主自己的学习过程，希望能给大家带来一定的指导作用，有用的还望大家点个支持，如果对你没用也望包涵，有错误烦请指出。如有期待可关注博主以第一时间获取更新哦，谢谢！　

 版权声明：本文为博主原创文章，未经博主允许不得转载。

Hive 基本语法操练（二）：视图和索引操作的更多相关文章

Hive基本语法操练
建表规则如下: CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name [(col_name data_type [COMMENT col_comment ...
Hive 基本语法操练（三）：分区操作和桶操作
(一)分区操作 Hive 的分区通过在创建表时启动 PARTITION BY 实现,用来分区的维度并不是实际数据的某一列,具体分区的标志是由插入内容时给定的.当要查询某一分区的内容时可以采用 WHER ...
Hive 基本语法操练（六）：Hive 的权限控制
Hive 的权限控制 Hive从0.10可以通过元数据控制权限.但是Hive的权限控制并不是完全安全的.基本的授权方案的目的是防止用户不小心做了不合适的事情. 为了使用Hive的授权机制,有两个参数必 ...
Hive 基本语法操练（五）：Hive 的 JOIN 用法
Hive 的 JOIN 用法 hive只支持等连接,外连接,左半连接.hive不支持非相等的join条件(通过其他方式实现,如left outer join),因为它很难在map/reduce中实现这 ...
Hive 基本语法操练（四）：Hive 复合类型
hive语法中主要提供了以下复合数据类型: 1)Structs: structs内部的数据可以通过DOT(.)来存取.例如,表中一列c的类型为STRUCT{a INT; b INT},我们可以通过c. ...
Hive 基本语法操练（一）：表操作
Hive 和 Mysql 的表操作语句类似,如果熟悉 Mysql,学习Hive 的表操作就非常容易了,下面对 Hive 的表操作进行深入讲解. **(1)先来创建一个表名为student的内部表** ...
Elasticsearch必知必会的干货知识二：ES索引操作技巧
该系列上一篇文章<Elasticsearch必知必会的干货知识一:ES索引文档的CRUD> 讲了如何进行index的增删改查,本篇则侧重讲解说明如何对index进行创建.更改.迁移.查询配 ...
Hive SQL 语法学习与实践
Hive 介绍 Hive 是基于Hadoop 构建的一套数据仓库分析系统,它提供了丰富的SQL查询方式来分析存储在Hadoop 分布式文件系统中的数据,可以将结构化的数据文件映射为一张数据库表,并提供 ...
008-elasticsearch5.4.3【二】ES使用、ES客户端、索引操作【增加、删除】、文档操作【crud】
一.ES使用,以及客户端 1.pom引用 <dependency> <groupId>org.elasticsearch.client</groupId> < ...

随机推荐

Poj 2503 Babelfish(Map操作)
一.Description You have just moved from Waterloo to a big city. The people here speak an incomprehens ...
二 lambda表达式
1:lambda写的好可以极大的减少代码冗余,同时可读性也好过冗长的内部类,匿名类. 2: lambda表达式配合Java8新特性Stream API可以将业务功能通过函数式编程简洁的实现. 3: l ...
phpstorm断点调试 php.ini 文件中 Xdebug 配置
[XDebug]xdebug.profiler_output_dir="D:\phpStudy\tmp\xdebug"xdebug.trace_output_dir="D ...
关于 sklearn.decomposition.KernelPCA的简单介绍
from sklearn import decomposition import numpy as np A1_mean = [1, 1] A1_cov = [[2, .99], [1, 1]] A1 ...
C笔试题（二）
/* 现在有一个数组我们可以定义数组的子数组如数组 1 3 4 2 5 8 7 它的子数组可以是 1 3 4 3 4 2 5 等等请写一个算法找一个子数组这个子数组递增不减少并且是满足递 ...
VC 绘图，使用双缓冲技术实现
VC 绘图,使用双缓冲技术实现 - Cloud-Datacenter-Renewable Energy-Big Data-Model - 博客频道 - CSDN.NET VC 绘图,使用双缓冲技术实现 ...
python+selenium简单实现拖动元素实例
from selenium import webdriver#引入ActionChains类from selenium.webdriver.common.action_chains impo ...
CSS 布局_如何实现容器中每一行的子容器数量随着浏览器宽度的变化而变化？
实现一个浮动布局,红色容器中每一行的蓝色容器数量随着浏览器宽度的变化而变化,就如下图: 要实现这样一个布局,我们首先需要如下的 HTML: <div id="float-contain ...
Angular08 依赖注入
1 angular应用中依赖注入的工作原理技巧01:在模块级别进行注册时所有在应用级别的组件都可以使用,因为主模块会导入其他模块,所以在模块中注入就相当于在主模块进行注入操作:懒加载的模块除外技巧 ...
[原创]Javascript 利用mousetrap.js进行键盘事件操作
我们日常开发中,会遇到js的键盘操作,例如回车提交表单之类的.或者按下某个键执行某个方法.无意中发现一个大小不到4K的js文件,它非常方便的操作键盘事件. 自己也尝试了一下:具体代码如下: 详情可以去 ...

Hive 基本语法操练（二）：视图和索引操作

Hive 基本语法操练（二）：视图和索引操作的更多相关文章

随机推荐

热门专题