Hive: Tables

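Hive has two types of tables: managed tables (also called internal tables) and external tables.

A managed table has the table type MANAGED_TABLE. Its data is stored under /user/hive/warehouse by default, and a different directory can be set with location. Dropping the table deletes both its data and its metadata.

An external table has the table type EXTERNAL_TABLE. The directory that holds its data can be specified with LOCATION when the table is created. Dropping the table deletes only the metadata and leaves the data in place.

Example: creating an external table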

create external table if not exists default.emp_ext(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
row format delimited fields terminated by '\t'
location '/opt/input/emp';
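The difference in drop behaviour between the two table types can be illustrated with a toy in-memory model. This is only a sketch: `metastore`, `filesystem`, `create_table` and `drop_table` are hypothetical names invented for the illustration, not part of any Hive API.

```python
# Toy in-memory model of Hive's DROP TABLE semantics; not a real Hive API.
metastore = {}   # table name -> {"external": bool, "location": str}
filesystem = {}  # directory path -> list of data file names


def create_table(name, location, external=False):
    """Register table metadata and make sure its data directory exists."""
    metastore[name] = {"external": external, "location": location}
    filesystem.setdefault(location, [])


def drop_table(name):
    """Always remove the metadata; remove the data only for managed tables."""
    meta = metastore.pop(name)
    if not meta["external"]:
        filesystem.pop(meta["location"], None)


create_table("emp", "/user/hive/warehouse/emp")
create_table("emp_ext", "/opt/input/emp", external=True)
filesystem["/user/hive/warehouse/emp"].append("emp.txt")
filesystem["/opt/input/emp"].append("emp.txt")

drop_table("emp")
drop_table("emp_ext")

print(sorted(filesystem))  # only the external table's directory is left
```

After both drops the metadata is gone for both tables, but the files under /opt/input/emp are still there, so the external table can be created again over the same data.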

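A partition of a partitioned table corresponds to a separate directory on HDFS that holds all of the data files for that partition. In Hive, partitioning simply means splitting by directory: a large data set is divided into smaller ones according to business needs.

When a query selects only the partitions it needs through an expression in the WHERE clause, Hive reads only those directories, which makes the query much more efficient.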

create external table if not exists default.emp_partition(
  empno int,
  ename string,
  job string,
  mgr int,
  hiredate string,
  sal double,
  comm double,
  deptno int
)
partitioned by(month string)
row format delimited fields terminated by '\t';
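To make the "one partition, one directory" idea concrete, here is a rough Python sketch of how a partition value maps to a directory and how a WHERE filter on the partition column narrows the directories to read. The table directory `TABLE_DIR`, the month values, and the helper names `partition_dir` and `prune` are assumptions made for the illustration; Hive does this internally.

```python
import posixpath

# Hypothetical warehouse location for the emp_partition table.
TABLE_DIR = "/user/hive/warehouse/emp_partition"


def partition_dir(table_dir, **spec):
    """Return the directory for a partition, e.g. month='201710' -> .../month=201710."""
    parts = [f"{col}={val}" for col, val in spec.items()]
    return posixpath.join(table_dir, *parts)


def prune(partition_values, wanted):
    """Keep only the partition directories whose month value is in `wanted`."""
    return [partition_dir(TABLE_DIR, month=m) for m in partition_values if m in wanted]


all_months = ["201709", "201710", "201711"]
# Corresponds to: select * from default.emp_partition where month = '201710';
print(prune(all_months, {"201710"}))
```

Only one of the three directories is selected, which is why filtering on the partition column avoids scanning the whole table.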

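Notes on partitioned tables:

Repair the table so that partition directories already present on HDFS are registered in the metastore: msck repair table table_name;

Alternatively, create the directory, upload the data, and register the partition yourself; these steps can be put in a shell script: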

dfs -mkdir -p /user/hive/warehouse/dept_part/day=;

dfs -put /opt/weblog/log.log /user/hive/warehouse/dept_part/day=;

alter table dept_part add partition(day='20171025');

List a table's partitions: show partitions dept_part;
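When a new partition arrives every day, the three commands can be generated rather than typed by hand. The following is a minimal sketch; `partition_commands` is a hypothetical helper, and the day value is taken from the alter table statement above.

```python
def partition_commands(table_dir, table, day, local_file):
    """Build the Hive CLI commands that create, fill and register one day partition."""
    part_dir = f"{table_dir}/day={day}"
    return [
        f"dfs -mkdir -p {part_dir};",
        f"dfs -put {local_file} {part_dir};",
        f"alter table {table} add partition(day='{day}');",
    ]


cmds = partition_commands(
    "/user/hive/warehouse/dept_part", "dept_part", "20171025", "/opt/weblog/log.log"
)
print("\n".join(cmds))
```

The printed lines could then be passed to the Hive CLI, for example through a file or `hive -e`; how they are submitted is left open here.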

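Loading data into Hive tables

load data [local] inpath 'filepath' [overwrite] into table tablename [partition (partcol1=val1, ...)];

With local, the path refers to a file on the local file system; without it, the path refers to a file on HDFS.

With overwrite, the existing contents of the table are replaced; without it, the new data is appended.

Loading into a partitioned table additionally requires the partition (partcol1=val1, ...) clause.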
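How the optional clauses combine can be sketched with a small statement builder. `load_data_sql` is a hypothetical helper written for this illustration, and the partition value '201710' is an assumed example.

```python
def load_data_sql(filepath, table, local=False, overwrite=False, partition=None):
    """Assemble a LOAD DATA statement from the optional clauses."""
    words = ["load data"]
    if local:
        words.append("local")
    words.append(f"inpath '{filepath}'")
    if overwrite:
        words.append("overwrite")
    words.append(f"into table {table}")
    if partition:
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        words.append(f"partition ({spec})")
    return " ".join(words) + ";"


print(load_data_sql("/root/emp.txt", "default.emp", local=True))
print(load_data_sql("/root/emp.txt", "default.emp_partition",
                    overwrite=True, partition={"month": "201710"}))
```

The builder does no quoting or validation, so it is only suitable for trusted, hand-written arguments.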

1. Load a local file into a Hive table

load data local inpath '/root/emp.txt' into table default.emp;

2. Load an HDFS file into a Hive table

load data inpath '/root/emp.txt' into table default.emp;

3. Load data and overwrite what the table already contains

load data inpath '/root/emp.txt' overwrite into table default.emp;

4. Create a table and fill it with insert

create table default.emp_ci like default.emp;

insert into table default.emp_ci select * from default.emp;

5. Point the table at existing data with location when creating it

Exporting data from Hive tables

insert overwrite local directory '/opt/datas/hive/hive_exp_emp'
row format delimited fields terminated by '\t'
select * from default.emp;

The query result can also be redirected to a local file from the shell:

bin/hive -e "select * from default.emp;" > /opt/datas/hive/exp_res.txt

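Multi-insert into Hive tables
Suppose there is a requirement to filter different subsets of the rows in t_4 and insert them into two other tables: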

insert overwrite table t_4_st_lt_200 partition(day='')

select ip,url,staylong from t_4 where staylong<;

insert overwrite table t_4_st_gt_200 partition(day='')

select ip,url,staylong from t_4 where staylong>;

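This approach has a drawback: the two filters run as two separate jobs, so two MapReduce passes are started and the same source table is read twice.
The multi-insert syntax avoids this and is more efficient, because the source table only has to be read once: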

from t_4

insert overwrite table t_4_st_lt_200 partition(day='')

select ip,url,staylong where staylong<

insert overwrite table t_4_st_gt_200 partition(day='')

select ip,url,staylong where staylong>;
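The single-scan idea behind multi-insert can be mimicked in plain Python: one loop over the source rows, with each row routed to whichever target matches. This is only an analogy, not how Hive executes the statement. The sample rows are made up, and the cut-off of 200 is an assumption suggested only by the target table names.

```python
# Hypothetical rows of t_4: (ip, url, staylong).
t_4 = [
    ("10.0.0.1", "/index", 50),
    ("10.0.0.2", "/cart", 350),
    ("10.0.0.3", "/help", 120),
]
THRESHOLD = 200  # assumed cut-off, suggested only by the table names


def multi_insert(rows, threshold):
    """Scan the source once and route each row to the matching target."""
    lt, gt = [], []
    scanned = 0
    for row in rows:
        scanned += 1
        staylong = row[2]
        if staylong < threshold:
            lt.append(row)
        elif staylong > threshold:
            gt.append(row)
    return lt, gt, scanned


lt_200, gt_200, rows_read = multi_insert(t_4, THRESHOLD)
print(len(lt_200), len(gt_200), rows_read)
```

Running the two filters as separate statements would read every source row twice; here each row is read once, which is the saving the multi-insert form provides.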
