A Note on the Data-Loss Risk When Rewriting a Hive Table

Running INSERT OVERWRITE in Hive to rewrite a table from its own data can cause data loss.

For example: INSERT OVERWRITE TABLE table_name SELECT * FROM table_name
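One way to lower this risk is to avoid reading from and writing to the same table in a single statement. The following is only a minimal sketch, assuming table_name is not partitioned (as in the generic example above); the staging table name table_name_staging is hypothetical and does not appear in the original note.

```sql
-- Hypothetical staging table: table_name_staging (an assumption, not from the original note).
-- 1. Materialize the query result somewhere other than the table being read.
CREATE TABLE table_name_staging AS
SELECT * FROM table_name;

-- 2. Inspect the staged copy (row count, NULLs in important columns)
--    before touching the source table.
SELECT COUNT(*) FROM table_name_staging;

-- 3. Only after the staged data looks right, overwrite the original table.
INSERT OVERWRITE TABLE table_name
SELECT * FROM table_name_staging;
```

The extra copy costs storage and time, but the original data stays intact until the staged result has been checked.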

1. Create a partitioned table

create table test_chj_cols (id string, name string, age string) partitioned by (ds string) stored as textfile;

2. Insert one row

insert into test_chj_cols partition (ds='20181224') values ('1','chj','18');

3. Check the table data and structure

> select * from test_chj_cols;

OK

test_chj_cols.id        test_chj_cols.name      test_chj_cols.age       test_chj_cols.ds

1       chj     18      20181224

> desc formatted test_chj_cols partition (ds='20181224');

OK

col_name        data_type       comment

# col_name              data_type               comment             

id                      string

name                    string

age                     string                                      

# Partition Information

# col_name              data_type               comment             

ds                      string                                      

# Detailed Partition Information

Partition Value:        [20181224]

Database:               hduser05db

Table:                  test_chj_cols

CreateTime:             Mon Dec 24 19:35:28 CST 2018

LastAccessTime:         UNKNOWN

Protect Mode:           None

Location:               hdfs://bdphdp02/user/hive/warehouse/hduser05/hduser05db.db/test_chj_cols/ds=20181224

Partition Parameters:

        COLUMN_STATS_ACCURATE   true

        numFiles                1

        numRows                 1

        rawDataSize             8

        totalSize               17

        transient_lastDdlTime   1545651329          

# Storage Information

SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

InputFormat:            org.apache.hadoop.mapred.TextInputFormat

OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

Compressed:             No

Num Buckets:            -1

Bucket Columns:         []

Sort Columns:           []

Storage Desc Params:

        serialization.format    1

Time taken: 0.099 seconds, Fetched: 37 row(s)

4. Add a column in the middle of the table

alter table test_chj_cols replace columns (id string, name string, money string, age string);

> desc formatted test_chj_cols;

OK

col_name        data_type       comment

# col_name              data_type               comment             

id                      string

name                    string

money                   string

age                     string                                      

# Partition Information

# col_name              data_type               comment             

ds                      string                                      

# Detailed Table Information

Database:               hduser05db

Owner:                  hadoop

CreateTime:             Mon Dec 24 19:34:46 CST 2018

LastAccessTime:         UNKNOWN

Protect Mode:           None

Retention:              0

Location:               hdfs://bdphdp02/user/hive/warehouse/hduser05/hduser05db.db/test_chj_cols

Table Type:             MANAGED_TABLE

Table Parameters:

        last_modified_by        hadoop

        last_modified_time      1545651722

        transient_lastDdlTime   1545651722          

# Storage Information

SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

InputFormat:            org.apache.hadoop.mapred.TextInputFormat

OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat

Compressed:             No

Num Buckets:            -1

Bucket Columns:         []

Sort Columns:           []

Storage Desc Params:

        serialization.format    1

Time taken: 0.051 seconds, Fetched: 36 row(s)
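The output above is table-level metadata. By default (RESTRICT), ALTER TABLE ... REPLACE COLUMNS changes only the table metadata, so the partition created earlier may still carry its old column list. In addition, a textfile table maps fields to columns by position, so inserting a column in the middle shifts how existing rows are read. The sketch below shows how one might check the partition-level schema, and a commonly used alternative that appends the new column at the end instead. It assumes Hive 1.1.0 or later for the CASCADE keyword, and it is not the method used in this note.

```sql
-- Compare the partition-level column list with the table-level one shown above.
DESC FORMATTED test_chj_cols PARTITION (ds='20181224');

-- Possible alternative to the REPLACE COLUMNS statement in this step
-- (run instead of it, not after it):
-- append the new column at the end so existing text rows keep their positions,
-- and use CASCADE (assumption: Hive 1.1.0+) to update existing partitions too.
ALTER TABLE test_chj_cols ADD COLUMNS (money string) CASCADE;
```

With this alternative the column order would be id, name, age, money, so any later INSERT would need its SELECT list in that order.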

5. Rewrite the data

insert overwrite table test_chj_cols partition (ds='20181224') select id,name,age,name from test_chj_cols;

6. The data in the age column is lost

> select * from test_chj_cols;

OK

test_chj_cols.id        test_chj_cols.name      test_chj_cols.age       test_chj_cols.money     test_chj_cols.ds

1       chj     NULL    NULL    20181224
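A cheap precaution is to run the SELECT part on its own before the INSERT OVERWRITE and look at what it returns. The queries below are only a sketch of such a check for this test table. The aliases name_copy and null_age_rows are made up for illustration, and whether the preview already shows the NULLs may depend on the Hive version and file format.

```sql
-- Preview what the rewrite would read from the partition.
-- name_copy is a hypothetical alias to keep the output columns distinct.
SELECT id, name, age, name AS name_copy
FROM test_chj_cols
WHERE ds = '20181224';

-- Count rows whose age reads as NULL; a non-zero count for data known to be
-- populated is a sign that the rewrite should not go ahead.
SELECT COUNT(*) AS null_age_rows
FROM test_chj_cols
WHERE ds = '20181224' AND age IS NULL;
```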
