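Hive_Hive Data Model

Architecture: metadata / HQL execution
Installation: embedded / remote / local
Management: CLI / web UI / remote service
Data types: primitive / complex / date-time
Data model: data storage / internal tables / partitioned tables / external tables / bucket tables / views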


=============================================================================================

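Hive Data Model_Data Storage

Web UI for browsing the HDFS file system: http://<IP>:50070/ (the NameNode web UI port in Hadoop 2.x; Hadoop 3.x uses 9870)

Based on HDFS

No dedicated data storage format of its own; for plain text tables the default field delimiter is the control character \001 (Ctrl-A)

The main storage structures are databases, files, tables, and views

Plain text files can be loaded directly

When creating a table, you can specify the column (field) delimiter and the row delimiter of the Hive data.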
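A minimal Python sketch of how one line of a delimited text file is read against a table schema, using the ',' delimiter from `fields terminated by ','`. The helper `parse_row` and the sample schema are hypothetical. The NULL handling (missing or unparsable fields become NULL, extra fields are ignored) reflects Hive's schema-on-read behavior as we understand it. It is not a reimplementation of Hive's SerDe.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

# (column name, converter) pairs: a hypothetical stand-in for a Hive table schema
Schema = Sequence[Tuple[str, Callable[[str], object]]]


def parse_row(line: str, schema: Schema, field_sep: str = ",") -> Dict[str, Optional[object]]:
    """Map one text line onto the schema; None plays the role of Hive's NULL."""
    fields = line.rstrip("\n").split(field_sep)
    row: Dict[str, Optional[object]] = {}
    for i, (name, convert) in enumerate(schema):
        if i >= len(fields):
            row[name] = None  # the line has fewer fields than the table has columns
            continue
        try:
            row[name] = convert(fields[i])
        except ValueError:
            row[name] = None  # e.g. "abc" in an int column
    return row  # fields beyond the schema are never looked at


SAMPLE_SCHEMA: Schema = [
    ("sid", int), ("sname", str), ("gender", str),
    ("language", int), ("math", int), ("english", int),
]

if __name__ == "__main__":
    print(parse_row("1,Tom,M,60,80,96", SAMPLE_SCHEMA))
```

Because the schema is applied only when a line is read, the same file can be reinterpreted by a table with a different delimiter or column list without rewriting the data.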

Hive data model

Tables:

-Table: internal table

-Partition: partitioned table

-External Table: external table

-Bucket Table: bucket table

Views:

-View

=============================================================================================

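Hive Data Model_Internal Tables

- Conceptually similar to a table in a relational database.

- Each Table has a corresponding directory in Hive where its data is stored.

- All Table data (except External Tables) is stored in that directory.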
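The directory rule above can be sketched in Python. This assumes the stock `hive.metastore.warehouse.dir` value of `/user/hive/warehouse`, which a cluster may override. `default_table_dir` is a hypothetical helper that only computes the path and does not talk to HDFS.

```python
import posixpath

# Assumed default of hive.metastore.warehouse.dir; clusters may override it
DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def default_table_dir(table: str, database: str = "default",
                      warehouse: str = DEFAULT_WAREHOUSE) -> str:
    """Directory a managed table gets when CREATE TABLE has no LOCATION clause."""
    table, database = table.lower(), database.lower()  # Hive names are case-insensitive
    if database == "default":
        # Tables in the default database sit directly under the warehouse directory
        return posixpath.join(warehouse, table)
    # Tables in any other database sit under <warehouse>/<database>.db/
    return posixpath.join(warehouse, database + ".db", table)


if __name__ == "__main__":
    print(default_table_dir("t1"))
```

A table created with an explicit `location` clause, such as t2 below, bypasses this rule and keeps its data in the given directory instead.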

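-- t1: stored in the default warehouse directory
create table t1
(tid int, tname string, age int);

-- t2: stored in an explicitly specified HDFS directory
create table t2
(tid int, tname string, age int)
location '/mytable/hive/t2';

-- t3: fields in the data files are separated by commas
create table t3
(tid int, tname string, age int)
row format delimited fields terminated by ',';

-- t4: created from a query result (CREATE TABLE ... AS SELECT), copying t1's columns and rows
create table t4
as
select * from t1;

View a table's data file on HDFS (replace tablename with the actual table name):

hdfs dfs -cat /user/hive/warehouse/tablename/000000_0

-- Add a column
alter table t1 add columns(english int);

-- Show the table structure
desc t1;

-- Drop the table
drop table t1;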

If the HDFS trash (recycle bin) is enabled, dropping the table does not delete its files right away. They are moved from the table directory into the trash directory and can be restored from there.
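Assuming trash is enabled (a positive `fs.trash.interval` in core-site.xml) and the table lived at `/user/hive/warehouse/t1`, the dropped directory lands under the dropping user's `.Trash/Current` directory with its full original path preserved. The sketch below only builds the paths and the restore command. The user name `root` and both helper names are assumptions. Also note that the trash checkpoint may have been rolled from `Current` into a timestamped directory by the time you restore.

```python
import posixpath


def trash_location(deleted_path: str, user: str) -> str:
    """Where HDFS trash keeps a deleted path: /user/<user>/.Trash/Current/<full path>."""
    return posixpath.join("/user", user, ".Trash", "Current", deleted_path.lstrip("/"))


def restore_command(deleted_path: str, user: str) -> str:
    """Build the hdfs command that moves the files back to their original place."""
    return f"hdfs dfs -mv {trash_location(deleted_path, user)} {deleted_path}"


if __name__ == "__main__":
    print(restore_command("/user/hive/warehouse/t1", "root"))
```

Moving the directory back restores only the data files. The table definition was removed from the metastore by `drop table`, so the table must be created again before Hive can query the restored files.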

=============================================================================================

Hive Data Model_Partitioned Tables

Prepare the data table:

create table sampledata
(sid int, sname string, gender string, language int, math int, english int)
row format delimited fields terminated by ',' stored as textfile;

Prepare the text data, sampledata.txt:

1,Tom,M,60,80,96
2,Mary,F,11,22,33
3,Jerry,M,90,11,23
4,Rose,M,78,77,76
5,Mike,F,99,98,98

Load the text data into the table:

hive> load data local inpath '/root/pl62716/hive/sampledata.txt' into table sampledata;
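`load data local inpath` copies a file from the local file system into the table's directory. `load data inpath` (without `LOCAL`) instead moves a file that is already in HDFS. The Python sketch below shows the difference on an ordinary directory tree. `load_data` and the directory names are hypothetical.

```python
import os
import shutil
import tempfile


def load_data(src_file: str, table_dir: str, local: bool) -> str:
    """Mimic LOAD DATA [LOCAL] INPATH ... INTO TABLE on a plain directory tree."""
    os.makedirs(table_dir, exist_ok=True)
    dest = os.path.join(table_dir, os.path.basename(src_file))
    if local:
        shutil.copy2(src_file, dest)  # LOCAL: the source file is left in place
    else:
        shutil.move(src_file, dest)   # no LOCAL: the source file is moved away
    return dest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        src = os.path.join(root, "sampledata.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.write("1,Tom,M,60,80,96\n")
        dest = load_data(src, os.path.join(root, "warehouse", "sampledata"), local=True)
        print(os.path.exists(src), os.path.exists(dest))  # True True
```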

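-A partition corresponds to a dense index on the partition column in a relational database.

-In Hive, each Partition of a table corresponds to a subdirectory under the table's directory, and all of that Partition's data is stored in that subdirectory.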

Create the partitioned table:

create table partition_table
(sid int, sname string)
partitioned by (gender string)
row format delimited fields terminated by ',';

Insert data into the partitioned table:

hive> insert into table partition_table partition(gender='M') select sid, sname from sampledata where gender='M';

hive> insert into table partition_table partition(gender='F') select sid, sname from sampledata where gender='F';
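The two INSERT statements above split sampledata into one subdirectory per gender value. The sketch below reproduces that layout in Python. `insert_by_partition` is a hypothetical helper, and the table directory in the demo assumes the default warehouse path. Only sid and sname go into the data, because Hive recovers the gender value from the directory name (`gender=M`, `gender=F`).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# The five rows of sampledata.txt: (sid, sname, gender, language, math, english)
SAMPLEDATA = [
    (1, "Tom", "M", 60, 80, 96),
    (2, "Mary", "F", 11, 22, 33),
    (3, "Jerry", "M", 90, 11, 23),
    (4, "Rose", "M", 78, 77, 76),
    (5, "Mike", "F", 99, 98, 98),
]


def insert_by_partition(rows, table_dir: str) -> Dict[str, List[Tuple[int, str]]]:
    """Group (sid, sname) under one <table_dir>/gender=<value> directory per gender."""
    partitions: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for sid, sname, gender, *_scores in rows:
        # gender is encoded in the directory name, not written into the data file
        partitions[f"{table_dir}/gender={gender}"].append((sid, sname))
    return dict(partitions)


if __name__ == "__main__":
    layout = insert_by_partition(SAMPLEDATA, "/user/hive/warehouse/partition_table")
    for path, data in layout.items():
        print(path, data)
```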

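Querying the non-partitioned internal table is less efficient than querying the partitioned table. Compare the two execution plans. For sampledata, a Filter Operator must evaluate gender = 'M' on every row that is scanned. For partition_table, Hive maps the predicate to the gender=M partition directory and reads only that directory, so the plan has no Filter Operator.

Internal table: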

hive> explain select * from sampledata where gender='M';
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: sampledata
          Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
          Filter Operator
            predicate: (gender = 'M') (type: boolean)
            Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              expressions: sid (type: int), sname (type: string), 'M' (type: string), language (type: int), math (type: int), english (type: int)
              outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
              Statistics: Num rows: 1 Data size: 90 Basic stats: COMPLETE Column stats: NONE
              ListSink

Time taken: 0.046 seconds, Fetched: 20 row(s)

Partitioned table:

hive> explain select * from partition_table where gender='M';
OK
STAGE DEPENDENCIES:
  Stage-0 is a root stage

STAGE PLANS:
  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        TableScan
          alias: partition_table
          Statistics: Num rows: 2 Data size: 13 Basic stats: COMPLETE Column stats: NONE
          Select Operator
            expressions: sid (type: int), sname (type: string), 'M' (type: string)
            outputColumnNames: _col0, _col1, _col2
            Statistics: Num rows: 2 Data size: 13 Basic stats: COMPLETE Column stats: NONE
            ListSink

Time taken: 0.187 seconds, Fetched: 17 row(s)
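The difference between the two plans can be sketched as "rows read" in Python. This is a toy model under stated assumptions, not Hive's executor. `full_scan` stands for the TableScan + Filter Operator path on sampledata. `pruned_scan` stands for opening only the matching partition directory of partition_table. Both function names are hypothetical.

```python
from typing import Dict, List, Tuple

# sampledata rows: (sid, sname, gender, language, math, english)
ROWS = [
    (1, "Tom", "M", 60, 80, 96),
    (2, "Mary", "F", 11, 22, 33),
    (3, "Jerry", "M", 90, 11, 23),
    (4, "Rose", "M", 78, 77, 76),
    (5, "Mike", "F", 99, 98, 98),
]

# partition_table contents: only (sid, sname) stored, keyed by the gender partition value
PARTITIONS: Dict[str, List[Tuple[int, str]]] = {
    "M": [(1, "Tom"), (3, "Jerry"), (4, "Rose")],
    "F": [(2, "Mary"), (5, "Mike")],
}


def full_scan(rows, gender: str):
    """Non-partitioned table: read every row, then apply the filter predicate."""
    result = [r for r in rows if r[2] == gender]
    return result, len(rows)  # (matching rows, rows read)


def pruned_scan(partitions, gender: str):
    """Partitioned table: open only the gender=<value> directory; no filter needed."""
    stored = partitions.get(gender, [])
    # The partition value is filled in as a constant, like 'M' in the Select Operator
    return [(sid, sname, gender) for sid, sname in stored], len(stored)


if __name__ == "__main__":
    print("rows read, full scan:", full_scan(ROWS, "M")[1])
    print("rows read, pruned scan:", pruned_scan(PARTITIONS, "M")[1])
```

On five rows the gap is trivial. With many partitions, the pruned scan skips every directory whose value does not match, which is where the efficiency gain comes from.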

=============================================================================================

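Hive Data Model_External Tables

External Table

-Points to data that already exists in HDFS; partitions can be created on it.

-Its metadata is organized the same way as an internal table's, but the actual data storage differs considerably.

-An external table involves only one step: loading the data and creating the table happen together. The data is not moved into the warehouse directory; the table only establishes a link to the external data. Dropping an external table deletes only that link (the table metadata), and the data files remain.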
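A toy Python model of the DROP TABLE difference described above. `ToyMetastore` is a hypothetical stand-in for the Hive metastore, and local temporary directories stand in for HDFS. Dropping an external table removes only the metadata entry. Dropping a managed table also removes its data directory, even when that directory was given with `location`. For that reason, a table over shared data such as /input should be declared `external`.

```python
import os
import shutil
import tempfile
from typing import Dict, Tuple


class ToyMetastore:
    """Hypothetical stand-in for the Hive metastore: name -> (location, is_external)."""

    def __init__(self) -> None:
        self.tables: Dict[str, Tuple[str, bool]] = {}

    def create_table(self, name: str, location: str, external: bool = False) -> None:
        os.makedirs(location, exist_ok=True)  # files already in location are kept
        self.tables[name] = (location, external)

    def drop_table(self, name: str) -> None:
        location, external = self.tables.pop(name)  # metadata is always removed
        if not external:
            shutil.rmtree(location)  # managed table: its data directory goes too


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        input_dir = os.path.join(root, "input")  # stands in for HDFS /input
        os.makedirs(input_dir)
        open(os.path.join(input_dir, "student1.txt"), "w").close()
        ms = ToyMetastore()
        ms.create_table("external_student", input_dir, external=True)
        ms.drop_table("external_student")
        print(os.listdir(input_dir))  # ['student1.txt']
```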

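1. Prepare several .txt data files with the same structure and put them in the HDFS /input directory.

2. In Hive, create an external table external_student with the same structure and set its location to the HDFS /input directory. external_student then automatically picks up the files under /input.

3. Query the external table.

4. Delete some of the files under /input.

5. Query the external table again. The data from the deleted files is gone.

6. Put the deleted files back into /input.

7. Query the external table again. The data from the restored files reappears.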
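The seven steps can be simulated in Python to show why the results change: an external table keeps no copy of the data, and each query reads whatever files are in its location at that moment. A local temporary directory stands in for HDFS /input. `query_external_table` and `write_file` are hypothetical helpers. Skipping names that start with "." or "_" mirrors how Hive ignores hidden files.

```python
import os
import tempfile
from typing import List


def query_external_table(location: str, sep: str = ",") -> List[List[str]]:
    """Read every data file under location at query time, like SELECT * on an external table."""
    rows: List[List[str]] = []
    for name in sorted(os.listdir(location)):
        path = os.path.join(location, name)
        # Hidden files ("." or "_" prefix) and subdirectories are not treated as data here
        if name.startswith((".", "_")) or not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as f:
            rows.extend(line.rstrip("\n").split(sep) for line in f if line.strip())
    return rows


STUDENT_FILES = {
    "student1.txt": "1,Tom,M,60,80,96\n2,Mary,F,11,22,33\n",
    "student2.txt": "3,Jerry,M,90,11,23\n",
    "student3.txt": "4,Rose,M,78,77,76\n5,Mike,F,99,98,98\n",
}


def write_file(location: str, name: str) -> None:
    with open(os.path.join(location, name), "w", encoding="utf-8") as f:
        f.write(STUDENT_FILES[name])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as input_dir:  # stands in for HDFS /input
        for name in STUDENT_FILES:
            write_file(input_dir, name)
        print(len(query_external_table(input_dir)))  # 5  (steps 1-3)
        os.remove(os.path.join(input_dir, "student1.txt"))
        print(len(query_external_table(input_dir)))  # 3  (steps 4-5)
        write_file(input_dir, "student1.txt")
        print(len(query_external_table(input_dir)))  # 5  (steps 6-7)
```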

(1) Prepare the data:

student1.txt
1,Tom,M,60,80,96
2,Mary,F,11,22,33

student2.txt
3,Jerry,M,90,11,23

student3.txt
4,Rose,M,78,77,76
5,Mike,F,99,98,98

# hdfs dfs -ls /
# hdfs dfs -mkdir /input

Put the files into HDFS (syntax: hdfs dfs -put localFileName hdfsFileDir):

# hdfs dfs -put student1.txt /input
# hdfs dfs -put student2.txt /input
# hdfs dfs -put student3.txt /input

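(2) Create the external table. The EXTERNAL keyword is what makes it external; without it, Hive creates a managed table over /input, and DROP TABLE would delete the files.

create external table external_student
(sid int, sname string, gender string, language int, math int, english int)
row format delimited fields terminated by ','
location '/input';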

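(3) Query the external table

select * from external_student;

(4) Delete student1.txt from HDFS

# hdfs dfs -rm /input/student1.txt

(5) Query the external table (the rows from student1.txt, Tom and Mary, are no longer returned)

select * from external_student;

(6) Put student1.txt back into the HDFS /input directory

# hdfs dfs -put student1.txt /input

(7) Query the external table (the rows from student1.txt are returned again)

select * from external_student;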

=============================================================================================

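Hive Data Model_Bucket Tables

Hive applies a hash function to the bucketing column and distributes the rows into different files, one per bucket. This reduces hot spots and speeds up queries.

For example, hash on sname and store the rows in 5 buckets:

create table bucket_table
(sid int, sname string, age int)
clustered by (sname) into 5 buckets;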
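Bucketing comes down to `bucket = hash(column value) mod number_of_buckets`, with each bucket written to its own file. The Python sketch below uses Java's `String.hashCode` as the hash purely for illustration. The exact hash function Hive applies depends on the column type and the Hive version, so the bucket numbers printed here are not guaranteed to match a real bucket_table. `java_string_hash` and `bucket_for` are hypothetical names.

```python
def java_string_hash(s: str) -> int:
    """Java's String.hashCode (31-based, wrapped to a signed 32-bit int), for BMP text."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h & 0x80000000 else h


def bucket_for(value: str, num_buckets: int) -> int:
    """Clear the sign bit, then take the remainder: a bucket id in [0, num_buckets)."""
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for name in ["Tom", "Mary", "Jerry", "Rose", "Mike"]:
        print(name, bucket_for(name, 5))
```

Equal values always land in the same bucket. That property is what lets joins and sampling on the bucketing column work one bucket file at a time.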

=============================================================================================

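Hive Data Model_Views

-A view is a virtual table, a logical concept; it can span multiple tables.

-A view is built on top of existing tables; the tables a view depends on are called base tables.

-Views can simplify complex queries.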
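What "virtual table" means can be seen without a Hive cluster. The sketch below uses Python's built-in sqlite3 module as a stand-in (this is SQLite, not Hive, so DDL details differ). The view joins two hypothetical base tables, `student` and `score`, and stores only its query. As a result, it shows new base-table rows without being recreated.

```python
import sqlite3
from typing import List, Tuple


def view_demo() -> Tuple[List[tuple], List[tuple]]:
    """Create a view over two base tables and query it before and after a base-table change."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE student (sid INTEGER, sname TEXT, gender TEXT);
        CREATE TABLE score (sid INTEGER, math INTEGER);
        INSERT INTO student VALUES (1, 'Tom', 'M'), (2, 'Mary', 'F');
        INSERT INTO score VALUES (1, 80), (2, 22);
        -- Only the query text is stored; no rows are copied into the view
        CREATE VIEW student_math AS
            SELECT s.sname, c.math FROM student s JOIN score c ON s.sid = c.sid;
        """
    )
    query = "SELECT sname, math FROM student_math ORDER BY sname"
    before = conn.execute(query).fetchall()
    # Change the base tables; the view reflects it on the next query
    conn.execute("INSERT INTO student VALUES (3, 'Jerry', 'M')")
    conn.execute("INSERT INTO score VALUES (3, 11)")
    after = conn.execute(query).fetchall()
    conn.close()
    return before, after


if __name__ == "__main__":
    print(view_demo())
```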

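Create a view (viewName, columns, tableName and condition are placeholders)

create view viewName
as
select columns from tableName where condition;

Show the view's structure

desc viewName;

Query the view

select * from viewName;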
