sqoop import -D oraoop.disabled=true \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \
--username USERNAME --table TABLE_NAME --null-string '\\N' --null-non-string '\\N' \
--hive-import --hive-table HIVEDB.HIVETABLENAME \
--num-mappers NUM_MAPPERS --verbose --password PWD --hive-drop-import-delims --hive-overwrite \
--fetch-size FETCH_SIZE

-D is not a Sqoop-specific option; it sets a generic Hadoop configuration property.
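Any Hadoop property can be passed this way, and the generic -D options must appear immediately after the tool name, before the Sqoop-specific arguments. A minimal sketch (the queue-name property is only an illustration, not part of the original command):

# -D key=value pairs go right after "sqoop import"; Sqoop options follow
sqoop import \
-D oraoop.disabled=true \
-D mapreduce.job.queuename=default \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \
--username USERNAME --password PWD --table TABLE_NAME --hive-import --hive-table HIVEDB.HIVETABLENAME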

oraoop.disabled=true

If this parameter is not set, the command may fail with the error: table or view does not exist.

Oraoop is a special plugin for Sqoop that provides faster access to Oracle's RDBMS by using custom protocols that are not publicly available. Quest Software partnered with Oracle to obtain those protocols, implemented them, and created Oraoop.

In our test environment the import works fine without this setting, but in another environment we hit this issue. Before the failure the log contained a message saying the URL could not be recognized as a valid thin URL, so it may be a driver issue.

Another thing to take care of: you had better write the TABLE_NAME (or view name) and the username in UPPER CASE, otherwise you may hit the same error: table or view does not exist.

--hive-drop-import-delims

This parameter addresses a known issue that occurs when fields in the RDBMS table contain newlines (\r, \n) or special characters such as \001 in their content.

Such characters break Hive's parsing rules: Hive uses \001 as the default field separator and \n as the default row terminator.

Even if you specify your own field separator, Hive currently supports only \n as the row terminator, so it will report an error. The workaround is to replace or drop the special characters and \r\n inside the field values, which is what this option does.
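A minimal sketch of the two ways Sqoop can handle embedded delimiters; --hive-delims-replacement is the alternative if you prefer to substitute a character instead of dropping them (all other options as in the full command above):

# Option 1: drop \n, \r and \001 from string fields during import
sqoop import ... --hive-import --hive-drop-import-delims
# Option 2: replace them with a space instead of dropping them
sqoop import ... --hive-import --hive-delims-replacement ' '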

--hive-overwrite

This overwrites any existing data in the target Hive table.

--fetch-size

This parameter's default value is 1000 (the number of rows fetched from the database at a time).

Once, when we loaded a wide view with about 80 columns, the Sqoop command reported an out-of-memory error.

The generated Java file had not been produced at that point. I do not know the exact cause, but the error occurred before we had set the fetch size, so we changed this value.

Finding the root cause would require digging further into the source code.
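As a workaround we simply lowered the value; a sketch (the value 500 and the view name are only examples, tune them to your row width and mapper heap size):

# Fetch fewer rows per round trip when importing a very wide view
sqoop import ... --table WIDE_VIEW_NAME --fetch-size 500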

--null-string '\\N' --null-non-string '\\N'

Without these parameters, NULL values from the RDBMS are imported as the literal string 'null'; with them, NULLs are preserved as real NULLs in the Hive table.
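A quick way to verify the behaviour after the import (the column name is hypothetical):

# With the two null options, these rows show up under IS NULL;
# without them you would have to compare against the literal string 'null'
hive -e "SELECT COUNT(*) FROM HIVEDB.HIVETABLENAME WHERE SOME_COLUMN IS NULL;"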

The Sqoop command generates a Hadoop jar file in a temporary path and then executes the MapReduce job.

First it loads the data into HDFS, then creates the Hive table, then uses a LOAD DATA statement to move the data from HDFS into the Hive warehouse folder.

If the command executes successfully, it cleans up the staging files.

If it fails while loading data into Hive or creating the Hive table, the staging folder and files remain in HDFS.

If you rerun the same command, it will fail and report that the output directory already exists, so delete the directory or load the data into Hive yourself.
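By default the staging directory for a table import sits under your HDFS home directory and is named after the table; a sketch of cleaning it up before rerunning (check the error message for the exact path, which may differ):

# Remove the leftover output directory reported in the "already exists" error
hdfs dfs -rm -r /user/USERNAME/TABLE_NAME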

You can also use --query (-e) to load data with a free-form query.

Demo: --query "select * from table where \$CONDITIONS". Inside double quotes you must escape the dollar sign with \; inside single quotes this is not needed.

You must also add the parameter --target-dir /hdfspath when you use --query.
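Putting it together, a free-form query import might look like the sketch below (the split column and target directory are illustrative; --split-by is required when more than one mapper is used):

sqoop import -D oraoop.disabled=true \
--connect "jdbc:oracle:thin:@(description=(address=(protocol=tcp)(host=HOSTNAME)(port=PORT))(connect_data=(service_name=SERVICE_NAME)))" \
--username USERNAME --password PWD \
--query "SELECT * FROM TABLE_NAME WHERE \$CONDITIONS" \
--split-by ID_COLUMN \
--target-dir /user/USERNAME/TABLE_NAME_query \
--hive-import --hive-table HIVEDB.HIVETABLENAME --hive-overwrite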

When loading data from the RDBMS into Hive, if you let Sqoop create the table for you, you may find that integer columns are converted to DOUBLE.

So you need to handle this yourself, for example by creating the Hive table in advance or overriding the column types. Please take care.
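One way to keep the types you want is to override them explicitly with --map-column-hive; a sketch with hypothetical column names:

# Force specific Hive column types instead of the default NUMBER -> DOUBLE mapping
sqoop import ... --hive-import --hive-table HIVEDB.HIVETABLENAME \
--map-column-hive ID_COLUMN=INT,COUNT_COLUMN=BIGINT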
