hdfs数据到hive中，以及hdfs数据隐身理解

hdfs数据到hive中：

假设hdfs中已存在好了数据,路径是hdfs:/localhost:9000/user/user_w/hive_g2park/user_center_enterprise_info/*

1.提前(在hive中)准备好表， user_center_enterprise_info2 ，用于接收hdfs数据。

CREATE TABLE user_center_enterprise_info2 (

`id`string ,

`name` string

);

2.使用load data方式，加载数据，（已执行数据库选择命令 hive>use testdb;）

以下相对/绝对两种路径加载都行

hive>load data inpath              'hive_g2park/user_center_enterprise_info/*' into table user_center_enterprise_info2;

hive>load data inpath '/user/user_w/hive_g2park/user_center_enterprise_info/*' into table user_center_enterprise_info2;

此时：

hdfs dfs -ls /user/user_w/hive_g2park/user_center_enterprise_info 发现里面内容没了

3.把数据从hive写回hdfs，让数据出现在hdfs

数据是从hdfs的 hive_g2park/user_center_enterprise_info 写到hive的，

现在写道 hive_g2park/user_center_enterprise_info3 路径下

-- 设置task数 set mapred.reduce.tasks = ; 结果数据平均分区（分区数等于task数）;

set mapred.reduce.tasks = ;

insert overwrite directory 'hive_g2park/user_center_enterprise_info3'

ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'

WITH SERDEPROPERTIES (

'field.delim'='\t',

'serialization.format'= '',

'serialization.null.format'=''

) STORED AS TEXTFILE

select * from user;

此时在hdfs下生成了新路径，

hive_g2park/user_center_enterprise_info3
并且有数据在其文件夹下

我的理解数据消失：
hive本质是map-reduce,即操作hdfs数据的方式有：spark,mr,pig,hive。

而hive只是在mr上包了一层，hive操作的时候，本质上，是直接操作的hdfs数据，也就是说hdfs数据load后，和hive数据是同一份数据
而且load data inpath '/user/hive/os.txt' into table os;这种方式loca数据到hive辣么快，应该是修改了指针而已，而不是复制了一份数据到hive。

hdfs数据到hive就隐藏不见，这么设计，就是为了避免数据在被hive改动的同时，又被mr直接操作hdfs数据，删除移动什么的，造成数据的不一致，所以数据丢hive里hdfs里就看不见了

sqoop export --connect jdbc:mysql://127.0.0.1:3306/parkdb --username xiaoming --password '' --table t_vip_user --export-dir 'hive_g2park/vip/*' --fields-terminated-by "\t"

附录：

sqoop的导出参数中，hive-import作用：本次导入到hive中

导入看得到hdfs文件夹范例
sqoop import --connect jdbc:mysql://localhost:3306/user --username h --password '123' \
--fields-terminated-by "\t" --table enterprise_info --delete-target-dir --target-dir 'hive_g2park/user_center_enterprise_info' \
--create-hive-table --hive-table g2park.user_center_enterprise_info

导入看不到hdfs文件夹范例
sqoop import --connect jdbc:mysql://localhost:3306/user --username h --password '123'
--fields-terminated-by "\t" --table enterprise_info --delete-target-dir
--hive-import --target-dir 'hive_g2park/user_center_enterprise_info'
--create-hive-table --hive-table g2park.user_center_enterprise_info

hdfs数据到hive中，以及hdfs数据隐身理解的更多相关文章

Hive中导入Oracle数据错误：Listener refused the connection with the following error: ORA-12505
问题: 今天往Hive中导入Oracle数据的时候碰到了如下错误:Listener refused the connection with the following error: ORA-12505 ...
（转）原始图像数据和PDF中的图像数据
比较原始图像数据和PDF中的图像数据,结果见表1.1.表1.1中各种“解码器”的解释见本文后续的“PDF支持的图像格式”部分,“PDF中的图像数据”各栏中的数据来自开源的PdfView.如果您有兴趣查 ...
一个I/O线程可以并发处理N个客户端连接和读写操作 I/O复用模型基于Buf操作NIO可以读取任意位置的数据 Channel中读取数据到Buffer中或将数据 Buffer 中写入到 Channel 事件驱动消息通知观察者模式
Tomcat那些事儿 https://mp.weixin.qq.com/s?__biz=MzI3MTEwODc5Ng==&mid=2650860016&idx=2&sn=549 ...
使用sqoop1.4.4从oracle导入数据到hive中错误记录及解决方案
在使用命令导数据过程中,出现如下错误 sqoop import --hive-import --connect jdbc:oracle:thin:@192.168.29.16:1521/testdb ...
sqoop导oracle数据到hive中并动态分区
静态分区: 在hive中创建表可以使用hql脚本: test.hql USE TEST; CREATE TABLE page_view(viewTime INT, userid BIGINT, pag ...
hive 之将excel数据导入hive中 : excel 转 txt
一.需求: 1.客户每月上传固定格式的excel文件到指定目录.每月上传的文件名只有结尾月份不同,如: 10月文件名: zhongdiangedan202010.xlsx , 11月文件名: zh ...
sqoop 从oracle导数据到hive中，date型数据时分秒截断问题
oracle数据库中Date类型倒入到hive中出现时分秒截断问题解决方案 1.问题描述: 用sqoop将oracle数据表倒入到hive中,oracle中Date型数据会出现时分秒截断问题,只保留了 ...
sqoop1.4.4从oracle导数据到hive中
sqoop从oracle定时增量导入数据到hive 感谢: http://blog.sina.com.cn/s/blog_3fe961ae01019a4l.htmlhttp://f.dataguru. ...
在hive中查询导入数据表时FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
当我们出现这种情况时 FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least ...

随机推荐

Delaunay triangulation
1,先花个圆: detail模式执行. #define XY 0x00 #define XZ 0x01 #define YZ 0x02 #define pi 3.1415926 #define clo ...
编译和运行dubbo-admin管理平台
下载 Github上下载最新的dubbo源码包并解压修改ZooKeeper相关的配置打开dubbo-admin/src/main/webapp/WEB-INF下的dubbo.p ...
[转]PHP中file_put_contents追加和换行
在PHP的一些应用中需要写日志或者记录一些信息,这样的话. 可以使用fopen(),fwrite()以及 fclose()这些进行操作. 也可以简单的使用file_get_contents()和fil ...
安装mysql5.7与创建用户和远程登录授权
环境:ubuntu18.04 参考文章:安装并远程登录授权:https://www.cnblogs.com/chancy/p/9444187.html 用户管理:https://www.cnblogs ...
在内网使用Gradle构建Android Studio项目
在Android Studio项目中,默认的远程仓库为jcenter,如果在项目引用了一些类库,Gradle构建程序的时候会将这些依赖类库从jcenter网站下载到本地,如我们在 build.grad ...
洛谷P4336 [SHOI2016]黑暗前的幻想乡 [Matrix-Tree定理，容斥]
传送门思路首先看到生成树计数,想到Matrix-Tree定理. 然而,这题显然是不能Matrix-Tree定理硬上的,因为还有每个公司只能建一条路的限制.这个限制比较恶心,尝试去除它. 怎么除掉它 ...
vi快速查找
用vim时,想高亮显示一个单词并查找的方发,将光标移动到所找单词. 1: shift + "*" 向下查找并高亮显示 2: shift + "#" 向上查找 ...
LuoGu P1541 乌龟棋
题目传送门乌龟棋我并不知道他为啥是个绿题0.0 总之感觉思维含量确实不太高(虽然我弱DP)(毛多弱火,体大弱门,肥胖弱菊,骑士弱梯,入侵弱智,沙华弱Dp) 总之,设计出来状态这题就很简单了设 f[ ...
python 面向对象编程（初级篇）
飞机票概述面向过程:根据业务逻辑从上到下写垒代码函数式:将某功能代码封装到函数中,日后便无需重复编写,仅调用函数即可面向对象:对函数进行分类和封装,让开发“更快更好更强...” 面向过程编程最 ...
python中的zip、map、reduce 、lambda、filter函数的使用
飞机票 lambda函数 lambda只是一个表达式,函数体比def简单很多. lambda的主体是一个表达式,而不是一个代码块.仅仅能在lambda表达式中封装有限的逻辑进去. lambda表达式是 ...

hdfs数据到hive中，以及hdfs数据隐身理解

hdfs数据到hive中，以及hdfs数据隐身理解的更多相关文章

随机推荐

热门专题