SQOOP Load Data from Oracle to Hive Table
sqoop import -D oraoop.disabled=true \ --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \ --username USERNAME --table TABLE_NAME --null-string '\\N' --null-non-string '\\N' \ --hive-import --hive-table HIVEDB.HIVETALBENAME \ --num-mappers --verbose --password PWD --hive-drop-import-delims --hive-overwrite --fetch-size
-D is not the parameter for sqoop, it is used for hadoop.
oraoop.disabled=true
If not set this parameter, the command report a issue: table or view does not exists.
Oraoop is a special plugin for sqoop that provides faster access to Oracle's RDBMS by using custom protocols that are not available publicly. Quest software partnered with Oracle to get those protocols, implemented them and created Oraoop.
In our test environment, without this parameter setting, it works fine. For another environment, encounter this issue, before this, I see one log message is : it can't be recognized a valid thin url. Maybe the driver issue .
Another thing need to take care is , you 'd better write TABLE_NAME(VIEW) AND username in UPPER CASE. Or else you may encounter same issue: table or view not exists.
--hive-drop-import-delims
This parameter used to address the known issue, when your fields in the RDBMS table has new line (\r \n or special char such as \001) in the content.
It will break the hive rule. Hive use \001 as default field separator and \n as the row terminator in default.
But if you specify the fields separator or row terminator by yourself, hive will report a error. Hive now just support \n as the row terminator. So you can replace or drop the special char or \r\n in the fields.
--hive-overwrite
This will overwrite the data in the hive table
--fetch-size
This parameter 's default value is 1000.
One time, when we load a width view, has about 80 columns. The sqoop command report a error: out of memory .
The java file not generated now. I don't know why, but this error occurs before the fetch size setting, so I change this.
The root cause may need get more information from source code .
--null-string '\\N' --null-non-string '\\N'
For this parameter, the hive will parse NULL in RDBMS to string 'null', with this parameter, it will keep null in hive table.
If the sqoop command will generate the hadoop jar file in temp path, and then execute the mapreduce job.
First , it will load data to HDFS, then create table for hive, then use load command load data from HDFS to datawarehouse folder.
If the command execute successfully, it will clean the staging file.
If it fails when load data to hive or create hive table. The hdfs folder and file will keep in the HDFS.
If you rerun the same command again, it will fail, report the output directory has exists. So just drop it or load the data by self.
If you use --query (-e) , use free query to load data.
Demo : --query "select *from table where \$conditions", in double quote , you should add \, in single quote, not needed for this.
And you should add parameter --target-dir /hdfspath , if you use --query.
when load data from rdbms to hive, if you let the sqoop create the table for you. you will find the integer type will convert to double.
so you need do something for this. please take care.
SQOOP Load Data from Oracle to Hive Table的更多相关文章
- 使用sqoop从mysql导入数据到hive
目录 前言 一.使用的导入命令 二.遇到的问题及解决 1. 用文本字段进行分区的问题 2. Hadoop历史服务器Hadoop JobHistory没开启的问题 3. 连接元数据存储数据库报错 4 ...
- mysql load data infile的使用 和 SELECT into outfile备份数据库数据
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt' [REPLACE | IGNORE] INTO TABLE t ...
- LOAD DATA INFILE – performance case study
转: http://venublog.com/2007/11/07/load-data-infile-performance/ I often noticed that people complain ...
- LOAD DATA INFILE Syntax--官方
LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name' [REPLACE | IGNORE] INTO TABLE tbl_n ...
- mysql 的load data infile要使用
LOAD DATA INFILE从文本文件中读出的声明以极高的速度到表. 1.基本语法 LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'fi ...
- mysql导入数据load data infile用法整理
有时候我们需要将大量数据批量写入数据库,直接使用程序语言和Sql写入往往很耗时间,其中有一种方案就是使用MySql Load data infile导入文件的形式导入数据,这样可大大缩短数据导入时间. ...
- mysql 开发进阶篇系列 50 表的数据导入(load data infile,mysqlimport )
一.概述 上篇讲到的表的数据导出(select .. into outfile 或者mysqldump),这篇继续讲表的数据导入,导入也同样有二个方法,分别是load data infile... 和 ...
- load data妙用
load变量和用户变量的巧妙结合,实现灵活导入字段列(NO.1) LOAD DATA INFILE 'file.csv' INTO TABLE dados_meteo (@var1, @var2) S ...
- MySQL基础之---mysqlimport工具和LOAD DATA命令导入文本文件
1.mysqlimport工具的使用 看一下命令的使用方法: shell > mysqlimport -u root -p [--LOCAL] DBname File [option] --f ...
随机推荐
- Access restriction : The constructor BASE64Decoder() is not accessible due to restriction on required library
1.问题描述 找不到包 sun.misc.BASE64Encoder 2. 解决方案 只需要在project build path中先移除JRE System Library,再添加库JRE Sys ...
- Android图像处理之Bitmap类
Bitmap是Android系统中的图像处理的最重要类之一.用它可以获取图像文件信息,进行图像剪切.旋转.缩放等操作,并可以指定格式保存图像文件.本文从应用的角度,着重介绍怎么用Bitmap来实现 ...
- [.NET] 使用C#开发SQL Function来提供服务 - 简讯发送
[.NET] 使用C#开发SQL Function来提供服务 - 简讯发送 范例下载 范例程序代码:点此下载 问题情景 在「使用C#开发SQL Function来提供数据 - 天气预报」这篇文章中,介 ...
- [js开源组件开发]js文本框计数组件
js文本框计数组件 先上效果图: 样式可以自行调整 ,它的功能提供文本框的实时计数,并作出对应的操作,比如现在超出了,点击下面的按钮后,文本框会闪动两下,阻止提交.具体例子可以点击demo:http: ...
- React对话框组件实现
当下前端届最火的技术之一莫过于React + Redux + webpack的技术结合.最近公司内部也正在转react,这周主要做了个React的modal组件,接下来谈下具体实现过程. 基本的HTM ...
- CSS 选择器汇总
CSS 选择器 CSS 元素选择器 CSS 选择器分组 CSS 类选择器详解 CSS ID 选择器详解 CSS 属性选择器详解 CSS 后代选择器 CSS 子元素选择器 CSS 相邻兄弟选择器 CSS ...
- MySQL数据库中字符集的问题
今天在做Hibernate案例,往mysql中写记录的时候,出现ERROR: Incorrect string value: '\xE5\x8A\xA0\xE5\x86\x85...' for col ...
- JTS Geometry关系判断和分析
关系判断 Geometry之间的关系有如下几种: 相等(Equals): 几何形状拓扑上相等. 脱节(Disjoint): 几何形状没有共有的点. 相交(Intersects): 几何形状至少有一个共 ...
- Arcengine实现创建网络数据集札记(一)
一 引子 网络数据集,GIS空间分析基础的理论和知识,是最短路径分析.连通性分析等其他空间分析技术的数据基础. 以往,网络数据集的研究很少,此次项目开发过程中,对网络数据集以及arcengine创建网 ...
- android res文件夹下面的 values-v11 、 values-v14
values-v11代表在API 11+的设备上,用该目录下的styles.xml代替res/values/styles.xml values-v14代表在API 14+的设备上,用该目录下的styl ...