SQOOP Load Data from Oracle to Hive Table
sqoop import -D oraoop.disabled=true \
  --connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \
  --username USERNAME --password PWD \
  --table TABLE_NAME \
  --null-string '\\N' --null-non-string '\\N' \
  --hive-import --hive-table HIVEDB.HIVETABLENAME \
  --hive-drop-import-delims --hive-overwrite \
  --num-mappers NUM_MAPPERS --fetch-size FETCH_SIZE \
  --verbose
-D is not a Sqoop parameter; it is a generic Hadoop option, which is why it must come right after import, before the Sqoop-specific arguments.
oraoop.disabled=true
If this parameter is not set, the command may fail with the error: table or view does not exist.
Oraoop is a special plugin for Sqoop that provides faster access to Oracle's RDBMS by using custom protocols that are not publicly available. Quest Software partnered with Oracle to get those protocols, implemented them and created Oraoop.
In our test environment the command works fine without this parameter. In another environment we hit this error; just before it, the log reported that the URL could not be recognized as a valid thin URL. This may be a driver issue.
Another thing to watch: write the table (or view) name and the username in UPPER CASE. Otherwise you may hit the same error: table or view does not exist.
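Oracle stores unquoted identifiers in upper case, so a small shell sketch can normalize the names before they are passed to Sqoop. The helper to_upper and the variables DB_USER and SRC_TABLE are hypothetical, used only for illustration.

```shell
# Hypothetical helper: convert an identifier to upper case so that it
# matches how Oracle stores unquoted table, view and user names.
to_upper() {
  printf '%s' "$1" | tr '[:lower:]' '[:upper:]'
}

# Hypothetical example values; replace them with your own.
DB_USER=$(to_upper "scott")
SRC_TABLE=$(to_upper "my_view")

# The normalized names would then be passed as:
#   --username "$DB_USER" --table "$SRC_TABLE"
printf '%s %s\n' "$DB_USER" "$SRC_TABLE"
```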
--hive-drop-import-delims
This parameter addresses a known issue: string fields in the RDBMS table whose content contains line breaks (\r, \n) or special characters such as \001.
Such content breaks Hive's text format, because by default Hive uses \001 as the field separator and \n as the row terminator.
Changing the delimiters is not a way out: Hive currently supports only \n as the row terminator and reports an error if you specify another one. So the practical fix is to drop these characters from the fields (this option) or replace them (--hive-delims-replacement).
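The effect of the two options on a single field value can be imitated with tr. This is only a sketch of the idea, not Sqoop's implementation: --hive-drop-import-delims removes \n, \r and \001 from string fields, while --hive-delims-replacement replaces them with a string of your choice (a single space is assumed here). The variable names are hypothetical.

```shell
# A field value containing a newline, a carriage return and \001 (octal 001).
raw=$(printf 'line1\nline2\rcol\001end')

# Like --hive-drop-import-delims: delete the characters entirely.
dropped=$(printf '%s' "$raw" | tr -d '\n\r\001')

# Like --hive-delims-replacement ' ': replace each one with a space.
replaced=$(printf '%s' "$raw" | tr '\n\r\001' '   ')

printf '%s\n' "$dropped"    # line1line2colend
printf '%s\n' "$replaced"   # line1 line2 col end
```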
--hive-overwrite
This overwrites any existing data in the Hive table.
--fetch-size
The default value of this parameter is 1000.
Once, when we loaded a wide view with about 80 columns, the Sqoop command failed with an out-of-memory error.
At that point the Java file had not been generated yet. I don't know exactly why, but the last log message before the error was about the fetch size setting, so I changed the fetch size.
Finding the root cause would need a closer look at the source code.
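A rough back-of-the-envelope sketch of why a wide view is more sensitive to the fetch size: each round trip to Oracle returns up to fetch-size rows, so the number of column values handled per fetch grows with rows × columns (the Oracle JDBC driver reportedly sizes its row buffers from the fetch size and the declared column widths). The function values_per_fetch is hypothetical; 80 columns matches the view above, and 100 is only an example of a smaller fetch size.

```shell
# Hypothetical helper: number of column values returned per fetch.
values_per_fetch() {
  cols=$1
  fetch_size=$2
  echo $((cols * fetch_size))
}

values_per_fetch 80 1000   # default fetch size: 80000 values per round trip
values_per_fetch 80 100    # smaller fetch size: 8000 values per round trip
```

A smaller value such as --fetch-size 100 trades more round trips for less memory per fetch.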
--null-string '\\N' --null-non-string '\\N'
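Without these parameters, NULL values in the RDBMS are written to the import files as the string 'null', which Hive reads as text rather than NULL. With them, Sqoop writes \N, Hive's default NULL marker, so the values stay NULL in the Hive table.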
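A note on the quoting: the doubled backslash is the form the Sqoop documentation uses, and inside single quotes the shell passes it through unchanged, so Sqoop receives the three characters \\N and ends up writing \N. The check below only shows what the shell hands to Sqoop; the variable name null_arg is hypothetical.

```shell
# The value exactly as written on the sqoop command line above.
null_arg='\\N'

# Single quotes stop the shell from touching the backslashes, so Sqoop
# receives all three characters: backslash, backslash, N.
printf '%s\n' "$null_arg"     # \\N
printf '%s\n' "${#null_arg}"  # 3
```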
The sqoop command first generates a Hadoop jar file in a temporary path and then runs the MapReduce job.
The job first loads the data into HDFS, then creates the Hive table, and finally uses a LOAD command to move the data from HDFS into the Hive warehouse folder.
If the command succeeds, the staging files are cleaned up.
If it fails while creating the Hive table or loading the data into Hive, the staging folder and files are left in HDFS.
Rerunning the same command then fails with an error saying the output directory already exists. Either delete that directory or load the data into Hive yourself.
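A minimal cleanup sketch before a rerun. It assumes the import used no --target-dir, in which case Sqoop stages the data in a directory named after the table under the HDFS home directory of the user running the command (/user/<user>/TABLE_NAME). The function names are hypothetical, and the hdfs CLI is only invoked when drop_staging_dir is actually called.

```shell
# Hypothetical helper: default staging directory of a Sqoop import that was
# run without --target-dir (table name under the HDFS home of the OS user).
staging_dir() {
  hdfs_user=$1
  table=$2
  printf '/user/%s/%s\n' "$hdfs_user" "$table"
}

# Hypothetical helper: remove the leftover directory so the same command
# can be rerun. Requires the hdfs CLI on the PATH.
drop_staging_dir() {
  hdfs dfs -rm -r "$(staging_dir "$1" "$2")"
}

# Example (not run here):
#   drop_staging_dir "$(whoami)" TABLE_NAME
```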
You can also use --query (-e) to load data with a free-form query.
Example: --query "select * from table where \$CONDITIONS". Inside double quotes you must escape the $ with a backslash; inside single quotes this is not needed.
When you use --query, you must also add --target-dir /hdfspath.
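The quoting rule can be checked directly in the shell: both forms below hand Sqoop the same literal string containing $CONDITIONS, the token Sqoop replaces with each mapper's split condition. The query is the example above; the variable names are hypothetical.

```shell
# Double quotes: the $ must be escaped, otherwise the shell expands
# $CONDITIONS (normally to an empty string) before Sqoop ever sees it.
q_double="select * from table where \$CONDITIONS"

# Single quotes: no expansion happens, so no backslash is needed.
q_single='select * from table where $CONDITIONS'

printf '%s\n' "$q_double"
printf '%s\n' "$q_single"
```

Either variable can then be passed as --query "$q_single" together with --target-dir. With more than one mapper, Sqoop also needs --split-by (or -m 1) for a free-form query.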
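When loading data from the RDBMS into Hive and letting Sqoop create the Hive table for you, you may find that integer columns end up as DOUBLE in Hive (Oracle stores integers as NUMBER, which Sqoop maps to DOUBLE).
So take care here: check the generated column types and override them where needed, for example with --map-column-hive.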
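Sqoop's --map-column-hive option takes a comma-separated list of COLUMN=HIVE_TYPE pairs. The helper below is a hypothetical sketch that builds such a list; the column names ID and QTY are placeholders, and INT is only an example (BIGINT may be safer if values can exceed the INT range).

```shell
# Hypothetical helper: build a --map-column-hive value mapping each
# given column to Hive INT, e.g. ID=INT,QTY=INT.
int_column_map() {
  map=""
  for col in "$@"; do
    if [ -z "$map" ]; then
      map="$col=INT"
    else
      map="$map,$col=INT"
    fi
  done
  printf '%s\n' "$map"
}

# Placeholder column names; use the upper-case names from Oracle.
int_column_map ID QTY    # ID=INT,QTY=INT

# The result would be added to the import command as:
#   --map-column-hive "$(int_column_map ID QTY)"
```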