sqoop import -D oraoop.disabled=true \

--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \

--username USERNAME --table TABLE_NAME --null-string '\\N' --null-non-string '\\N' \

--hive-import --hive-table HIVEDB.HIVETALBENAME \

--num-mappers  --verbose --password PWD --hive-drop-import-delims --hive-overwrite

--fetch-size 

-D is not the parameter for sqoop, it is used for hadoop.

oraoop.disabled=true

If not set this parameter, the command report a issue: table or view does not exists.

Oraoop is a special plugin for sqoop that provides faster access to Oracle's RDBMS by using custom protocols that are not available publicly. Quest software partnered with Oracle to get those protocols, implemented them and created Oraoop.

In our test environment, without this parameter setting, it works fine. For another environment, encounter this issue, before this, I see one log message is : it can't be recognized a valid thin url. Maybe the driver issue .

Another thing need to take care is , you 'd better write TABLE_NAME(VIEW) AND username in UPPER CASE. Or else you may encounter same issue: table or view not exists.

--hive-drop-import-delims

This parameter used to address the known issue, when your fields in the RDBMS table has new line (\r \n or special char such as \001) in the content.

It will break the hive rule. Hive use \001 as default field separator and \n as the row terminator in default.

But if you specify the fields separator or row terminator by yourself, hive will report a error. Hive now just support \n as the row terminator. So you can replace or drop the special char or \r\n in the fields.

--hive-overwrite

This will overwrite the data in the hive table

--fetch-size

This parameter 's default value is 1000.

One time, when we load a width view, has about 80 columns. The sqoop command report a error: out of memory .

The java file not generated now. I don't know why, but this error occurs before the fetch size setting, so I change this.

The root cause may need get more information from source code .

--null-string '\\N' --null-non-string '\\N'

For this parameter, the hive will parse NULL in RDBMS to string 'null', with this parameter, it will keep null in hive table.

If the sqoop command will generate the hadoop jar file in temp path, and then execute the mapreduce job.

First , it will load data to HDFS, then create table for hive, then use load command load data from HDFS to datawarehouse folder.

If the command execute successfully, it will clean the staging file.

If it fails when load data to hive or create hive table. The hdfs folder and file will keep in the HDFS.

If you rerun the same command again, it will fail, report the output directory has exists. So just drop it or load the data by self.

If you use --query (-e) , use free query to load data.

Demo : --query "select *from table where \$conditions", in double quote , you should add \, in single quote, not needed for this.

And you should add parameter --target-dir /hdfspath , if you use --query.

when load data from rdbms to hive, if you let the sqoop create the table for you. you will find the integer type will convert to double.

so you need do something for this. please take care.

SQOOP Load Data from Oracle to Hive Table的更多相关文章

  1. 使用sqoop从mysql导入数据到hive

      目录 前言 一.使用的导入命令 二.遇到的问题及解决 1. 用文本字段进行分区的问题 2. Hadoop历史服务器Hadoop JobHistory没开启的问题 3. 连接元数据存储数据库报错 4 ...

  2. mysql load data infile的使用 和 SELECT into outfile备份数据库数据

    LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name.txt' [REPLACE | IGNORE] INTO TABLE t ...

  3. LOAD DATA INFILE – performance case study

    转: http://venublog.com/2007/11/07/load-data-infile-performance/ I often noticed that people complain ...

  4. LOAD DATA INFILE Syntax--官方

    LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'file_name' [REPLACE | IGNORE] INTO TABLE tbl_n ...

  5. mysql 的load data infile要使用

    LOAD DATA INFILE从文本文件中读出的声明以极高的速度到表. 1.基本语法 LOAD DATA [LOW_PRIORITY | CONCURRENT] [LOCAL] INFILE 'fi ...

  6. mysql导入数据load data infile用法整理

    有时候我们需要将大量数据批量写入数据库,直接使用程序语言和Sql写入往往很耗时间,其中有一种方案就是使用MySql Load data infile导入文件的形式导入数据,这样可大大缩短数据导入时间. ...

  7. mysql 开发进阶篇系列 50 表的数据导入(load data infile,mysqlimport )

    一.概述 上篇讲到的表的数据导出(select .. into outfile 或者mysqldump),这篇继续讲表的数据导入,导入也同样有二个方法,分别是load data infile... 和 ...

  8. load data妙用

    load变量和用户变量的巧妙结合,实现灵活导入字段列(NO.1) LOAD DATA INFILE 'file.csv' INTO TABLE dados_meteo (@var1, @var2) S ...

  9. MySQL基础之---mysqlimport工具和LOAD DATA命令导入文本文件

     1.mysqlimport工具的使用 看一下命令的使用方法: shell > mysqlimport -u root -p [--LOCAL] DBname File [option] --f ...

随机推荐

  1. C#实现的18位身份证格式验证算法

    18位身份证标准在国家质量技术监督局于1999年7月1日实施的GB11643-1999<公民身份号码>中做了明确的规定. GB11643-1999<公民身份号码>为GB1164 ...

  2. 静默安装、授权及卸载Microsoft SQL Server、NET Framework、Windows Installer 、ArcGIS License Manager、ArcGIS Engine(Silent install、uninstall and Authorization.. .through Setup Factory)基于Setup Factory

    通过Setup Factory写的代码大概有1700行,所以就不整理了.思路如下: 静默安装都是通过去Microsoft 和Esri的官网找到静默安装的命令,然后File.Run(...)或者Shel ...

  3. LGLCalender (价格日历)

    一直未能找到自己想要的日历价格,就算右也不是我想要的,今天自己封装了一个,欢迎各位来查阅,不足的地方请指教 最新代码下载地址https://github.com/liguoliangiOS/LGLCa ...

  4. 打印机问题win7 和xp

    服务器端问题,重启如下服务 net stop "print spooler" net start "print spooler" gpedit.msc 本地计算 ...

  5. 误报的java.sql.SQLException: Parameter number 21 is not an OUT parameter

    今天为了模拟一个mysql内存不释放问题,要测试一个存储过程,同时具有出参和入参,启动时报了上述错误. <select id="funcl_trd_secu_execution_que ...

  6. django性能优化

    1. 内存.内存,还是加内存 2. 使用单独的静态文件服务器 3. 关闭KeepAlive(如果服务器不提供静态文件服务,如:大文件下载) 4. 使用memcached 5. 使用select_rel ...

  7. ASP.NET中UEditor使用

    ASP.NET中UEditor使用 0.ueditor简介 UEditor是由百度WEB前端研发部开发的所见即所得的开源富文本编辑器,具有轻量.可定制.用户体验优秀等特点.开源基于BSD协议,所有源代 ...

  8. SharePoint 2010 文档管理系列之星级评论功能

    前言:正如我们前面介绍的是,文档管理就是让大家更加直观.方便的对手里的文档,进行统筹掌控,哪些文档是有价值的,哪些文档更受大家欢迎,所有就带来了这个星级评论. 当然,这个是SharePoint 201 ...

  9. 【读书笔记】iOS-读取文本文件

    一,文本文件的内容. 二,工程目录 三,ViewController.m - (void)viewDidLoad { [super viewDidLoad]; // Do any additional ...

  10. Python学习二---字符串

    一.字符串 1.1.字符串和转义字符 转义字符需要使用\来表示 1.2.字符串连接 print 字符串1 字符串2,打印出来的字符串直接连接在一起没有空格 print 字符串1,字符串2,打印出来的字 ...