I. Several Ways to Import Data into Hive

First, here are the import and export methods covered below, along with the Hive tables and data files used to demonstrate them.

Import:

  1. Loading a local file into a Hive table;
  2. Inserting from one Hive table into another;
  3. Loading an HDFS file into a Hive table;
  4. Importing from another table while creating it (CTAS);
  5. Importing a MySQL database into a Hive table via Sqoop; for examples, see "Importing and Exporting between MySQL and Hive via Sqoop" and "Periodically Syncing HIVE Data from the Big Data Platform to Oracle".

Export:

  1. Exporting a Hive table to the local filesystem;
  2. Exporting a Hive table to HDFS;
  3. Exporting a Hive table to MySQL via Sqoop;

Hive tables:

Create testA:

CREATE TABLE testA (
  id INT,
  name STRING,
  area STRING
)
PARTITIONED BY (create_time STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Create testB:

CREATE TABLE testB (
  id INT,
  name STRING,
  area STRING,
  code STRING
)
PARTITIONED BY (create_time STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

Data file (sourceA.txt):

1,fish1,SZ
2,fish2,SH
3,fish3,HZ
4,fish4,QD
5,fish5,SR

Data file (sourceB.txt):

1,zy1,SZ,1001
2,zy2,SH,1002
3,zy3,HZ,1003
4,zy4,QD,1004
5,zy5,SR,1005

(1) Loading a local file into a Hive table
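
The original post does not include a command for this subsection; below is a minimal sketch using Hive's standard LOAD DATA LOCAL INPATH syntax, assuming sourceA.txt sits at /home/hadoop on the machine running the hive client (the path and partition value mirror the later examples):

-- LOCAL reads the file from the client filesystem and copies it into
-- the table's warehouse directory; the local original stays in place.
LOAD DATA LOCAL INPATH '/home/hadoop/sourceA.txt'
INTO TABLE testA PARTITION(create_time='2015-07-08');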

(2) Inserting from one Hive table into another

Insert the data of testB into testA:

hive> INSERT INTO TABLE testA PARTITION(create_time='2015-07-11') select id, name, area from testB where id = 1;
...(omitted)
OK
Time taken: 14.744 seconds
hive> INSERT INTO TABLE testA PARTITION(create_time) select id, name, area, code from testB where id = 2;
...(omitted)
OK
Time taken: 19.852 seconds
hive> select * from testA;
OK
2 zy2 SH 1002
1 fish1 SZ 2015-07-08
2 fish2 SH 2015-07-08
3 fish3 HZ 2015-07-08
4 fish4 QD 2015-07-08
5 fish5 SR 2015-07-08
1 zy1 SZ 2015-07-11
Time taken: 0.032 seconds, Fetched: 7 row(s)

Notes:

1. The row in testB with id = 1 is inserted into testA under the static partition create_time='2015-07-11'.

2. The row in testB with id = 2 is inserted into testA via dynamic partitioning: create_time takes that row's code value (1002), as the query result above shows.
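
A dynamic-partition insert like the second statement is rejected under Hive's default configuration; here is a minimal sketch of the session settings it depends on (standard Hive parameters, not shown in the original):

-- Allow dynamic partition inserts; nonstrict mode lets every partition
-- column be resolved from the SELECT instead of requiring at least one
-- static partition value.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;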

(3) Loading an HDFS file into a Hive table

Upload sourceA.txt and sourceB.txt to HDFS, at /home/hadoop/sourceA.txt and /home/hadoop/sourceB.txt respectively.
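
One way to stage the files, assuming they start in the hive client's current local directory (the hive CLI's built-in dfs command mirrors hadoop fs):

-- Run from within the hive CLI; copies the local files into HDFS.
dfs -put sourceA.txt /home/hadoop/sourceA.txt;
dfs -put sourceB.txt /home/hadoop/sourceB.txt;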

hive> LOAD DATA INPATH '/home/hadoop/sourceA.txt' INTO TABLE testA PARTITION(create_time='2015-07-08');
...(omitted)
OK
Time taken: 0.237 seconds
hive> LOAD DATA INPATH '/home/hadoop/sourceB.txt' INTO TABLE testB PARTITION(create_time='2015-07-09');
...(omitted)
OK
Time taken: 0.212 seconds
hive> select * from testA;
OK
1 fish1 SZ 2015-07-08
2 fish2 SH 2015-07-08
3 fish3 HZ 2015-07-08
4 fish4 QD 2015-07-08
5 fish5 SR 2015-07-08
Time taken: 0.029 seconds, Fetched: 5 row(s)
hive> select * from testB;
OK
1 zy1 SZ 1001 2015-07-09
2 zy2 SH 1002 2015-07-09
3 zy3 HZ 1003 2015-07-09
4 zy4 QD 1004 2015-07-09
5 zy5 SR 1005 2015-07-09
Time taken: 0.047 seconds, Fetched: 5 row(s)

'/home/hadoop/sourceA.txt' is loaded into table testA, and '/home/hadoop/sourceB.txt' into table testB. Note that LOAD DATA INPATH (without LOCAL) moves the files within HDFS rather than copying them, so the originals under /home/hadoop are gone after the load.

(4) Importing from another table while creating it (CTAS)

hive> create table testC as select name, code from testB;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1449746265797_0106, Tracking URL = http://hadoopcluster79:8088/proxy/application_1449746265797_0106/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1449746265797_0106
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-24 16:40:17,981 Stage-1 map = 0%, reduce = 0%
2015-12-24 16:40:23,115 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.11 sec
MapReduce Total cumulative CPU time: 1 seconds 110 msec
Ended Job = job_1449746265797_0106
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop2cluster/tmp/hive-root/hive_2015-12-24_16-40-09_983_6048680148773453194-1/-ext-10001
Moving data to: hdfs://hadoop2cluster/home/hadoop/hivedata/warehouse/testc
Table default.testc stats: [numFiles=1, numRows=0, totalSize=45, rawDataSize=0]
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.11 sec HDFS Read: 297 HDFS Write: 45 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 110 msec
OK
Time taken: 14.292 seconds
hive> desc testC;
OK
name string
code string
Time taken: 0.032 seconds, Fetched: 2 row(s)
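
As the desc output shows, CTAS derives the new table's column names and types from the SELECT. Storage options can still be given explicitly; a small sketch with a hypothetical table name testD:

-- CTAS with an explicit row format and file format instead of the
-- defaults; the schema still comes from the SELECT.
CREATE TABLE testD
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
AS SELECT id, name, code FROM testB;

Note that a CTAS target table cannot be partitioned.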

II. Several Ways to Export Data from Hive

(1) Exporting to the local filesystem

hive> INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/output' ROW FORMAT DELIMITED FIELDS TERMINATED by ',' select * from testA;
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1451024007879_0001, Tracking URL = http://hadoopcluster79:8088/proxy/application_1451024007879_0001/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1451024007879_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-25 17:04:30,447 Stage-1 map = 0%, reduce = 0%
2015-12-25 17:04:35,616 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.16 sec
MapReduce Total cumulative CPU time: 1 seconds 160 msec
Ended Job = job_1451024007879_0001
Copying data to local directory /home/hadoop/output
Copying data to local directory /home/hadoop/output
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.16 sec HDFS Read: 305 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 160 msec
OK
Time taken: 16.701 seconds

View the result:

[hadoop@hadoopcluster78 output]$ cat /home/hadoop/output/000000_0
1,fish1,SZ,2015-07-08
2,fish2,SH,2015-07-08
3,fish3,HZ,2015-07-08
4,fish4,QD,2015-07-08
5,fish5,SR,2015-07-08

INSERT OVERWRITE LOCAL DIRECTORY exports the data of Hive table testA to the /home/hadoop/output directory. As with any HQL statement, this runs as a MapReduce job, and /home/hadoop/output is effectively the job's output path, so the result is written to a file named 000000_0 (one such file per output task).
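
When several exports are needed from the same table, Hive's multi-insert form shares a single table scan; a minimal sketch, with hypothetical output directories:

-- Scan testA once and write two local directories in one statement.
FROM testA
INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/output_sz'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  SELECT * WHERE area = 'SZ'
INSERT OVERWRITE LOCAL DIRECTORY '/home/hadoop/output_sh'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  SELECT * WHERE area = 'SH';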

(2) Exporting to HDFS

Exporting to HDFS is similar to exporting to the local filesystem: just drop the LOCAL keyword from the HQL statement. Note that without a ROW FORMAT clause, the exported fields are separated by Hive's default \001 (Ctrl-A) delimiter.

hive> INSERT OVERWRITE DIRECTORY '/home/hadoop/output' select * from testA;
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1451024007879_0002, Tracking URL = http://hadoopcluster79:8088/proxy/application_1451024007879_0002/
Kill Command = /home/hadoop/apache/hadoop-2.4.1/bin/hadoop job -kill job_1451024007879_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2015-12-25 17:08:51,034 Stage-1 map = 0%, reduce = 0%
2015-12-25 17:08:59,313 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.4 sec
MapReduce Total cumulative CPU time: 1 seconds 400 msec
Ended Job = job_1451024007879_0002
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to: hdfs://hadoop2cluster/home/hadoop/hivedata/hive-hadoop/hive_2015-12-25_17-08-43_733_1768532778392261937-1/-ext-10000
Moving data to: /home/hadoop/output
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.4 sec HDFS Read: 305 HDFS Write: 110 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 400 msec
OK
Time taken: 16.667 seconds

View the HDFS output file:
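
The original listing is not reproduced here; one way to inspect the file from the hive CLI, assuming the same 000000_0 file name as in the local export above (fields will be \001-separated since no ROW FORMAT was given):

-- Equivalent to `hadoop fs -cat` from a shell.
dfs -cat /home/hadoop/output/000000_0;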

Other approaches

Data can also be exported with the hive CLI's -e and -f options.

With -e, the SQL statement follows the option directly; the path after >> is the output file:

[hadoop@hadoopcluster78 bin]$ ./hive -e "select * from testA" >> /home/hadoop/output/testA.txt
15/12/25 17:15:07 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/hadoop/apache/hive-0.13.1/conf/hive-log4j.properties
OK
Time taken: 1.128 seconds, Fetched: 5 row(s)
[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/testA.txt
1 fish1 SZ 2015-07-08
2 fish2 SH 2015-07-08
3 fish3 HZ 2015-07-08
4 fish4 QD 2015-07-08
5 fish5 SR 2015-07-08

With -f, a file containing the SQL statement follows the option; the path after >> is the output file.

The SQL statement file:

[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/sql.sql
select * from testA

Run it with -f:

[hadoop@hadoopcluster78 bin]$ ./hive -f /home/hadoop/output/sql.sql >> /home/hadoop/output/testB.txt
15/12/25 17:20:52 WARN conf.HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
Logging initialized using configuration in file:/home/hadoop/apache/hive-0.13.1/conf/hive-log4j.properties
OK
Time taken: 1.1 seconds, Fetched: 5 row(s)

View the result:

[hadoop@hadoopcluster78 bin]$ cat /home/hadoop/output/testB.txt
1 fish1 SZ 2015-07-08
2 fish2 SH 2015-07-08
3 fish3 HZ 2015-07-08
4 fish4 QD 2015-07-08
5 fish5 SR 2015-07-08
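
The warning and "Time taken" lines above go to the console rather than into the redirected file; hive's -S (silent) flag suppresses that chatter as well. A sketch in the same style:

[hadoop@hadoopcluster78 bin]$ ./hive -S -e "select * from testA" >> /home/hadoop/output/testA.txt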
