015 - Intermediate HQL 5: Creating Indexes in Hive

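Indexes have been available in Hive only since version 0.7. Before creating one, evaluate whether it is worthwhile: an index takes up disk space, and keeping it up to date has a cost.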

Creating an index

hive> create index index_studentid on table student_3(studentid)

> as 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'

> with deferred rebuild

> IN TABLE index_table_student_3;

OK

Time taken: 12.219 seconds

hive>

org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler: the handler class that implements the index

index_studentid: the index name

student_3: the table being indexed

index_table_student_3: the table that stores the index data

Query the index table (index_table_student_3). Because the index was created with deferred rebuild, it holds no data yet

hive> select * from index_table_student_3;

OK

Time taken: 0.295 seconds

Populating the index

hive> alter index index_studentid on student_3 rebuild;

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. tez, spark) or using Hive 1.X releases.

Query ID = root_20161226235345_5b3fcc2b-7f90-4b10-861f-31cbaed8eb73

Total jobs = 1

Launching Job 1 out of 1

Number of reduce tasks not specified. Estimated from input data size: 1

In order to change the average load for a reducer (in bytes):

set hive.exec.reducers.bytes.per.reducer=<number>

In order to limit the maximum number of reducers:

set hive.exec.reducers.max=<number>

In order to set a constant number of reducers:

set mapreduce.job.reduces=<number>

Starting Job = job_1482824475750_0001, Tracking URL = http://hadoop-node4.com:8088/proxy/application_1482824475750_0001/

Kill Command = /usr/local/development/hadoop-2.6.4/bin/hadoop job -kill job_1482824475750_0001

Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1

2016-12-26 23:55:40,317 Stage-1 map = 0%, reduce = 0%

2016-12-26 23:56:40,757 Stage-1 map = 0%, reduce = 0%

2016-12-26 23:56:48,768 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.08 sec

2016-12-26 23:57:34,981 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 3.66 sec

2016-12-26 23:57:40,716 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.68 sec

MapReduce Total cumulative CPU time: 4 seconds 680 msec

Ended Job = job_1482824475750_0001

Loading data to table default.index_table_student_3

MapReduce Jobs Launched:

Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.68 sec HDFS Read: 10282 HDFS Write: 537 SUCCESS

Total MapReduce CPU Time Spent: 4 seconds 680 msec

OK

Time taken: 280.693 seconds

Query the data in the index table

hive> select * from index_table_student_3;

OK

1 hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt [0]

2 hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt [28]

3 hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt [56]

4 hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt [85]

5 hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt [113]

6 hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt [143]

Time taken: 2.055 seconds, Fetched: 6 row(s)

hive>
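Each row of the index table pairs a studentid value with the file that contains it and the byte offsets at which the matching rows start. The Python sketch below is a simplified model of that mapping, not Hive's own code. It uses the six rows of sutdent.txt (listed in the next step) and assumes tab-separated fields and CRLF line endings; neither is visible in the listing, but with these assumptions the computed offsets match the index table. The function name build_compact_index is hypothetical.

```python
# Simplified model of the offsets column of a compact index.
# Assumption: fields are tab-separated and lines end with CRLF. The listing
# does not show either, but these choices reproduce the offsets Hive stored.
ROWS = [
    "001\t0\tBeiJing\txinlang@.com",
    "002\t1\tShangHaixinlang@.com",  # no separator before the e-mail, as in the file
    "003\t0\tShegZhen\txinlang@.com",
    "004\t1\tNanJing\txinlang@.com",
    "005\t0\tGuangDong\txinlang@.com",
    "006\t1\tHaiNan\txinlang@.com",
]


def build_compact_index(rows, line_ending="\r\n"):
    """Map each studentid to the byte offsets of the rows that contain it."""
    index = {}
    offset = 0
    for row in rows:
        student_id = int(row.split("\t")[0])  # studentid is the first field
        index.setdefault(student_id, []).append(offset)
        # The next row starts right after this row and its line ending.
        offset += len((row + line_ending).encode("utf-8"))
    return index


index = build_compact_index(ROWS)
for student_id, offsets in sorted(index.items()):
    print(student_id, offsets)
# 1 [0]
# 2 [28]
# 3 [56]
# 4 [85]
# 5 [113]
# 6 [143]
```

The offsets list can hold more than one entry when the same key appears in several rows of a file, which is why the index table shows it in brackets.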

View the contents of hdfs://hadoop-node4.com:8020/opt/hive/warehouse/student_3/sutdent.txt

[root@node4 node4]# hdfs dfs -text /opt/hive/warehouse/student_3/sutdent.txt;

001 0 BeiJing xinlang@.com

002 1 ShangHaixinlang@.com

003 0 ShegZhen xinlang@.com

004 1 NanJing xinlang@.com

005 0 GuangDong xinlang@.com

006 1 HaiNan xinlang@.com[root@node4 node4]#
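To see why storing byte offsets is useful, the sketch below looks up a studentid in the offsets recorded by the index table and reads only the row that starts there, instead of scanning the whole file. This is an illustration under assumptions, not Hive's implementation: as far as I understand, Hive uses the offsets to narrow the input it scans rather than seeking row by row. The file content is rebuilt in memory assuming tab-separated fields and CRLF line endings, and the name rows_for is hypothetical.

```python
import io

# In-memory copy of sutdent.txt.
# Assumption: tab-separated fields and CRLF line endings; the last row has no
# trailing newline, which is why the shell prompt follows it directly.
DATA = (
    "001\t0\tBeiJing\txinlang@.com\r\n"
    "002\t1\tShangHaixinlang@.com\r\n"
    "003\t0\tShegZhen\txinlang@.com\r\n"
    "004\t1\tNanJing\txinlang@.com\r\n"
    "005\t0\tGuangDong\txinlang@.com\r\n"
    "006\t1\tHaiNan\txinlang@.com"
).encode("utf-8")

# Offsets as they appear in index_table_student_3 after the rebuild.
INDEX = {1: [0], 2: [28], 3: [56], 4: [85], 5: [113], 6: [143]}


def rows_for(student_id, data=DATA, index=INDEX):
    """Return only the rows whose start offsets the index lists for student_id."""
    stream = io.BytesIO(data)
    rows = []
    for offset in index.get(student_id, []):
        stream.seek(offset)  # jump straight to the start of the row
        rows.append(stream.readline().decode("utf-8").rstrip("\r\n"))
    return rows


print(rows_for(4))
```

A studentid that is missing from the index returns an empty list without touching the file data at all.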

Dropping an index

DROP INDEX index_studentid on student_3;

Listing the indexes on a table (the output below was captured while index_studentid still existed)

hive> SHOW INDEX on student_3;

OK

index_studentid         student_3               studentid               index_table_student_3    compact

Time taken: 0.487 seconds, Fetched: 1 row(s)

hive>
