hive表分区相关操作
Hive表的分区就是一个目录,分区字段不和表的字段重复
创建分区表:
create table tb_partition(id string, name string)
PARTITIONED BY (month string)
row format delimited fields terminated by '\t';
加载数据到hive分区表中
方法一:通过load方式加载
load data local inpath '/home/hadoop/files/nameinfo.txt' overwrite into table tb_partition partition(month='201709');
方法二:insert select 方式
insert overwrite table tb_partition partition(month='201707') select id, name from name;
hive> insert into table tb_partition partition(month='201707') select id, name from name;
Query ID = hadoop_20170918222525_7d074ba1-bff9-44fc-a664-508275175849
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
方法三:可通过手动上传文件到分区目录,进行加载
hdfs dfs -mkdir /user/hive/warehouse/tb_partition/month=201710
hdfs dfs -put nameinfo.txt /user/hive/warehouse/tb_partition/month=201710
虽然方法三手动上传文件到分区目录,但是查询表的时候是查询不到数据的,需要更新元数据信息。
更新源数据的两种方法:
方法一:msck repair table 表名
hive> msck repair table tb_partition;
OK
Partitions not in metastore: tb_partition:month=201710
Repair: Added partition to metastore tb_partition:month=201710
Time taken: 0.265 seconds, Fetched: 2 row(s)
方法二:alter table tb_partition add partition(month='201708');
hive> alter table tb_partition add partition(month='201708');
OK
Time taken: 0.126 seconds
查询表数据:

hive> select *from tb_partition ;
OK
1 Lily 201708
2 Andy 201708
3 Tom 201708
1 Lily 201709
2 Andy 201709
3 Tom 201709
1 Lily 201710
2 Andy 201710
3 Tom 201710
Time taken: 0.161 seconds, Fetched: 9 row(s)

查询分区信息: show partitions 表名
hive> show partitions tb_partition;
OK
month=201708
month=201709
month=201710
Time taken: 0.154 seconds, Fetched: 3 row(s)
查看hdfs中的文件结构

[hadoop@node11 files]$ hdfs dfs -ls /user/hive/warehouse/tb_partition/
17/09/18 22:33:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxr-xr-x - hadoop supergroup 0 2017-09-18 22:25 /user/hive/warehouse/tb_partition/month=201707
drwxr-xr-x - hadoop supergroup 0 2017-09-18 22:15 /user/hive/warehouse/tb_partition/month=201708
drwxr-xr-x - hadoop supergroup 0 2017-09-18 05:55 /user/hive/warehouse/tb_partition/month=201709
drwxr-xr-x - hadoop supergroup 0 2017-09-18 22:03 /user/hive/warehouse/tb_partition/month=201710

创建多级分区
create table tb_mul_partition(id string, name string)
PARTITIONED BY (month string, code string)
row format delimited fields terminated by '\t';
加载数据:
load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201709',code='10000');
load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201710',code='10000');
查询数据:

hive> select *From tb_mul_partition where code='10000';
OK
1 Lily 201709 10000
2 Andy 201709 10000
3 Tom 201709 10000
1 Lily 201710 10000
2 Andy 201710 10000
3 Tom 201710 10000
Time taken: 0.208 seconds, Fetched: 6 row(s)

测试以下指定一个分区:
hive> load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(month='201708');
FAILED: SemanticException [Error 10006]: Line 1:95 Partition not found ''201708''
hive> load data local inpath '/home/hadoop/files/nameinfo.txt' into table tb_mul_partition partition(code='20000');
FAILED: SemanticException [Error 10006]: Line 1:95 Partition not found ''20000''
创建是多级分区,指定一个分区是不可以的。
查看一下在hdfs中存储的结构:
[hadoop@node11 files]$ hdfs dfs -ls /user/hive/warehouse/tb_mul_partition/month=201710
drwxr-xr-x - hadoop supergroup 0 2017-09-18 22:36 /user/hive/warehouse/tb_mul_partition/month=201710/code=10000
动态分区
回顾一下之前的向分区插入数据:
|
1
|
insert overwrite table tb_partition partition(month='201707') select id, name from name; |
这里需要指定具体的分区信息‘201707’,这里通过动态操作,向表里插入数据。
新建表:
hive> create table tb_copy_partition like tb_partition;
OK
Time taken: 0.118 seconds
查看一下表结构:

hive> desc tb_copy_partition;
OK
id string
name string
month string # Partition Information
# col_name data_type comment month string
Time taken: 0.127 seconds, Fetched: 8 row(s)

接下来通过动态操作,向tb_copy_partitioon里面插入数据,
insert into table tb_copy_partition partition(month) select id, name, month from tb_partition; 这里注意需要将分区字段month放到最后。
hive> insert into table tb_copy_partition partition(month) select id, name, month from tb_partition;
FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
这里报错,使用动态加载,需要 To turn this off set hive.exec.dynamic.partition.mode=nonstrict
那根据错误信息设置一下
hive> set hive.exec.dynamic.partition.mode=nonstrict;
查询设置信息,设置成功
hive> set hive.exec.dynamic.partition.mode;
hive.exec.dynamic.partition.mode=nonstrict
重新执行:

hive> insert into table tb_copy_partition partition(month) select id, name, month from tb_partition;
Query ID = hadoop_20170918230808_0bf202da-279f-4df3-a153-ece0e457c905
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1505785612206_0002, Tracking URL = http://node11:8088/proxy/application_1505785612206_0002/
Kill Command = /home/hadoop/app/hadoop-2.6.0-cdh5.10.0/bin/hadoop job -kill job_1505785612206_0002
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 0
2017-09-18 23:08:13,698 Stage-1 map = 0%, reduce = 0%
2017-09-18 23:08:23,896 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.94 sec
2017-09-18 23:08:27,172 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.63 sec
MapReduce Total cumulative CPU time: 3 seconds 630 msec
Ended Job = job_1505785612206_0002
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://cluster1/user/hive/warehouse/tb_copy_partition/.hive-staging_hive_2017-09-18_23-08-01_475_7542657053989652968-1/-ext-10000
Loading data to table default.tb_copy_partition partition (month=null)
Time taken for load dynamic partitions : 381
Loading partition {month=201709}
Loading partition {month=201708}
Loading partition {month=201710}
Loading partition {month=201707}
Time taken for adding to write entity : 0
Partition default.tb_copy_partition{month=201707} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
Partition default.tb_copy_partition{month=201708} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
Partition default.tb_copy_partition{month=201709} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
Partition default.tb_copy_partition{month=201710} stats: [numFiles=1, numRows=3, totalSize=20, rawDataSize=17]
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Cumulative CPU: 3.63 sec HDFS Read: 8926 HDFS Write: 382 SUCCESS
Total MapReduce CPU Time Spent: 3 seconds 630 msec
OK
Time taken: 28.932 seconds

查询一下数据:

hive> select *From tb_copy_partition;
OK
1 Lily 201707
2 Andy 201707
3 Tom 201707
1 Lily 201708
2 Andy 201708
3 Tom 201708
1 Lily 201709
2 Andy 201709
3 Tom 201709
1 Lily 201710
2 Andy 201710
3 Tom 201710
Time taken: 0.121 seconds, Fetched: 12 row(s)

hive表分区相关操作的更多相关文章
- day40数据库之表的相关操作
数据库之表的相关操作1.表的操作: 1.创建表的语法: create table 表名( id int(10) primary key auto_inc ...
- MYSQL--表与表之间的关系、修改表的相关操作
表与表之间的操作: 如果所有信息都在一张表中: 1.表的结构不清晰 2.浪费硬盘空间 3.表的扩展性变得极差(致命的缺点) 确立表与表之间的关系.一定要换位思考(必须在两者考虑清楚之后才能得出结论) ...
- Hive表分区
必须在表定义时创建partition a.单分区建表语句:create table day_table (id int, content string) partitioned by (dt stri ...
- Hive 表分区
Hive表的分区就是一个目录,分区字段不和表的字段重复 创建分区表: create table tb_partition(id string, name string) PARTITIONED BY ...
- HDFS文件和HIVE表的一些操作
1. hadoop fs -ls 可以查看HDFS文件 后面不加目录参数的话,默认当前用户的目录./user/当前用户 $ hadoop fs -ls 16/05/19 10:40:10 WARN ...
- [Hive]使用HDFS文件夹数据创建Hive表分区
描写叙述: Hive表pms.cross_sale_path建立以日期作为分区,将hdfs文件夹/user/pms/workspace/ouyangyewei/testUsertrack/job1Ou ...
- hive表分区的修复
hive从低版本升级到高版本或者做hadoop的集群数据迁移时,需要重新创建表和表分区,由于使用的是动态分区,所以需要重新刷新分区表字段,否则无法查看数据. 在hive中执行中以下命令即可自动更新元数 ...
- 使用MSCK命令修复Hive表分区
set hive.strict.checks.large.query=false; set hive.mapred.mode=nostrict; MSCK REPAIR TABLE 表名; 通常是通过 ...
- Oracle language types(语言种类) 表的相关操作 DDL数据定义语言
数据定义语言 Data Definition Language Statements(DDL)数据操纵语言 Data Manipulation Language(DML) Statements事务控制 ...
随机推荐
- html5点击input没有出现光标完美解决方案
html5点击input没有出现光标完美解决方案 <pre> <input type="text" placeholder="输入姓名" cl ...
- 第九次作业 DFA最小化,语法分析初步
1.将DFA最小化:教材P65 第9题 Ⅰ {1,2,3,4,5} {6,7} {1,2}b={1,2,3,4,5} 3,4}b={5} {6,7} Ⅱ {1,2}{3,4}{5} {6,7} 2.构 ...
- Zookeeper connection loss leads to Flink job restart
Flink可以使用zookeeper来进行ha,而一般我们都会使用zookeeper的高级api架构curator来对zk进行通讯.在curator中引入了状态的概念,包括connected,reco ...
- C编程遇到的一些小细节
1 typedef int ElemType --> typedef int ElemType; 此处要加分行(:),要区别 #defin a 20 此处不需要加分号:#define是预 ...
- 可落地的DDD(4)-如何利用DDD进行微服务的划分(2)
摘要 在前面一篇介绍了如何通过DDD的思想,来调整单体服务内的工程结构,为微服务的拆分做准备.同时介绍了我们在进行微服务拆分的时候踩过的一些坑. 这篇介绍下我们最终的方案,不一定对,欢迎留言讨论. 微 ...
- 基准测试工具:Wrk初识
最近和同事聊起常用的一些压测工具,谈到了Apache ab.阿里云的PTS.Jmeter.Locust以及wrk各自的一些优缺点和适用的场景类型. 这篇博客,简单介绍下HTTP基准测试工具wrk的基本 ...
- C#读写设置修改调整UVC摄像头画面-增益
有时,我们需要在C#代码中对摄像头的增益进行读和写,并立即生效.如何实现呢? 建立基于SharpCamera的项目 首先,请根据之前的一篇博文 点击这里 中的说明,建立基于SharpCamera的摄像 ...
- "类"的讲稿
-----------------------面向对象基础------------------------------------方法(函数) { (c#p10为主,p27;javap96)+资料,讲 ...
- springmvc集成shiro后,session、request是否发生变化
1. 疑问 我们在项目中使用了spring mvc作为MVC框架,shiro作为权限控制框架,在使用过程中慢慢地产生了下面几个疑惑,本篇文章将会带着疑问慢慢地解析shiro源码,从而解开心里面的那点小 ...
- 解决老大难疑惑:指针 vs 引用
▶疑问描述 1. 引用reference的本质: 常指针 ——> 什么时候用指针?= 就按Java中的引用变量那样用? ——> 什么时候用引用? ①函数的入参/返回值时 ②T&am ...