Sqoop Java API 导入应用案例
环境信息:
Linux+JDK1.7
Sqoop 1.4.6-cdh5.5.2
hadoop-core 2.6.0-mr1-cdh5.5.2
hadoop-common 2.6.0-cdh5.5.2
hadoop-mapreduce-client-core 2.6.0-cdh5.5.2
需求:
将oracle中的某表导入到hdfs
实现:
首先组织Sqoop命令:
String[] args = new String[] { // Oracle数据库信息
"--connect","jdbc:oracle:thin:@***:1522/**",
"-username","***",
"-password","***",
// 查询sql
"--query","select * from TABLE_NAME where $CONDITIONS and create_date>=date'2017-05-01' and create_date<date'2017-06-01' ",
"-split-by","id",
"--hive-overwrite",
"--fields-terminated-by","'\\001'",
"--hive-drop-import-delims",
"--null-string","'\\\\N'",
"--null-non-string","'\\\\N'",
"--verbose",
"--target-dir","/user/hive/warehouse/test.db/H_TABLE_NAME"
};
执行Sqoop任务:
String[] expandArguments = OptionsFileUtil.expandArguments(args);
SqoopTool tool = SqoopTool.getTool("import");
Configuration conf = new Configuration();
conf.set("fs.default.name", "hdfs://nameservice1");//设置HDFS服务地址
Configuration loadPlugins = SqoopTool.loadPlugins(conf); Sqoop sqoop = new Sqoop((com.cloudera.sqoop.tool.SqoopTool) tool, loadPlugins);
int res = Sqoop.runSqoop(sqoop, expandArguments);
if (res == 0)
log.info ("成功");
完成编码后,发到测试环境进行测试,发现Sqoop在进行动态编译时报编译错误:
2017-07-26 15:10:15 [ERROR] [http-0.0.0.0-8080-6] [org.apache.sqoop.tool.ImportTool.run(ImportTool.java:613)] Encountered IOException running import job: java.io.IOException: Error returned by javac
at org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:217)
at org.apache.sqoop.tool.CodeGenTool.generateORM(CodeGenTool.java:108)
at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:478)
at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
动态编译的日志如果没有特殊配置的话,是无法通过log4j进行输出的,因此,编译错误需要到系统日志里查找:
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:7: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.BytesWritable;
^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:8: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:9: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Writable;
^
/tmp/sqoop-deploy/compile/b78440d7bc7097805be8b088c525566b/QueryResult.java:37: error: cannot access Writable
public class QueryResult extends SqoopRecord implements DBWritable, Writable {
如上,推测是动态编译环境的classpath没有包含hadoop-common包导致的,在CompilationManager里查到了如下内容:
private String findHadoopJars() {
String hadoopMapRedHome = options.getHadoopMapRedHome();
if (null == hadoopMapRedHome) {
LOG.info("$HADOOP_MAPRED_HOME is not set");
return Jars.getJarPathForClass(JobConf.class);
}
if (!hadoopMapRedHome.endsWith(File.separator)) {
hadoopMapRedHome = hadoopMapRedHome + File.separator;
}
File hadoopMapRedHomeFile = new File(hadoopMapRedHome);
LOG.info("HADOOP_MAPRED_HOME is " + hadoopMapRedHomeFile.getAbsolutePath());
Iterator<File> filesIterator = FileUtils.iterateFiles(hadoopMapRedHomeFile,
new String[] { "jar" }, true);
StringBuilder sb = new StringBuilder();
while (filesIterator.hasNext()) {
File file = filesIterator.next();
String name = file.getName();
if (name.startsWith("hadoop-common")
|| name.startsWith("hadoop-mapreduce-client-core")
|| name.startsWith("hadoop-core")) {
sb.append(file.getAbsolutePath());
sb.append(File.pathSeparator);
}
}
if (sb.length() < 1) {
LOG.warn("HADOOP_MAPRED_HOME appears empty or missing");
return Jars.getJarPathForClass(JobConf.class);
}
String s = sb.substring(0, sb.length() - 1);
LOG.debug("Returning jar file path " + s);
return s;
}
推测是由于配置里没有hadoopMapRedHome这个参数,导致这个方法只能取到JobConf.class所在的jar包,即hadoop-core包。打开DEBUG进行验证,找到如下日志:
2017-07-26 15:10:14 [INFO] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.findHadoopJars(CompilationManager.java:85)] $HADOOP_MAPRED_HOME is not set
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:171)] Current sqoop classpath = :/usr/local/tomcat6/bin/bootstrap.jar
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:195)] Adding source file: /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/QueryResult.java
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:199)] Invoking javac with args:
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] -sourcepath
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] -d
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] /tmp/sqoop-deploy/compile/1baf2f947722b9531d4a27b1e5ef5aca/
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] -classpath
2017-07-26 15:10:14 [DEBUG] [http-0.0.0.0-8080-6] [org.apache.sqoop.orm.CompilationManager.compile(CompilationManager.java:201)] :/usr/local/tomcat6/bin/bootstrap.jar:/var/www/webapps/***/WEB-INF/lib/hadoop-core-2.6.0-mr1-cdh5.5.2.jar:/var/www/webapps/***/WEB-INF/lib/sqoop-1.4.6-cdh5.5.2.jar
果然是缺少了jar包。在CompilationManager中查到classpath的组装方式如下:
String curClasspath = System.getProperty("java.class.path");
LOG.debug("Current sqoop classpath = " + curClasspath);
args.add("-sourcepath");
args.add(jarOutDir);
args.add("-d");
args.add(jarOutDir);
args.add("-classpath");
args.add(curClasspath + File.pathSeparator + coreJar + sqoopJar);
可以通过两种方式将缺失的jar添加进去:
1.直接修改java.class.path:
String curClasspath = System.getProperty ("java.class.path");
curClasspath = curClasspath
+ File.pathSeparator
+ "/var/www/webapps/***/WEB-INF/lib/hadoop-common-2.6.0-cdh5.5.2.jar"
+ File.pathSeparator
+ "/var/www/webapps/***/WEB-INF/lib/hadoop-mapreduce-client-core-2.6.0-cdh5.5.2.jar";
System.setProperty ("java.class.path", curClasspath);
2.增加配置项(未尝试):
--hadoop-mapred-home <dir> 指定$HADOOP_MAPRED_HOME路径
使用第一种方式后,已经能够正常进行导入操作:
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1547)] Job complete: job_local703153215_0001
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:566)] Counters: 18
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)] File System Counters
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] FILE: Number of bytes read=15015144
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] FILE: Number of bytes written=15688984
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] FILE: Number of read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] FILE: Number of large read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] FILE: Number of write operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] HDFS: Number of bytes read=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] HDFS: Number of bytes written=1536330810
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] HDFS: Number of read operations=40
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] HDFS: Number of large read operations=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] HDFS: Number of write operations=36
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:568)] Map-Reduce Framework
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Map input records=3272909
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Map output records=3272909
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Input split bytes=455
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Spilled Records=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] CPU time spent (ms)=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Physical memory (bytes) snapshot=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Virtual memory (bytes) snapshot=0
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.hadoop.mapred.Counters.log(Counters.java:570)] Total committed heap usage (bytes)=4080271360
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:184)] Transferred 1.4308 GB in 71.5332 seconds (20.4822 MB/sec)
2017-07-26 15:52:00 [INFO] [http-0.0.0.0-8080-1] [org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:186)] Retrieved 3272909 records.
至此,Sqoop Java API 导入demo完成。
参考文章:
http://shiyanjun.cn/archives/624.html Sqoop-1.4.4工具import和export使用详解
http://blog.csdn.net/sl1992/article/details/53521819 Java操作Sqoop对象
Sqoop Java API 导入应用案例的更多相关文章
- ElasticSearch7.3学习(二十九)----聚合实战之使用Java api实现电视案例
一.数据准备 创建索引及映射 建立价格.颜色.品牌.售卖日期字段 PUT /tvs PUT /tvs/_mapping { "properties": { "price& ...
- _00017 Kafka的体系结构介绍以及Kafka入门案例(0基础案例+Java API的使用)
博文作者:妳那伊抹微笑 itdog8 地址链接 : http://www.itdog8.com(个人链接) 博客地址:http://blog.csdn.net/u012185296 博文标题:_000 ...
- Phoenix简介概述,Phoenix的Java API 相关操作优秀案例
Phoenix简介概述,Phoenix的Java API 相关操作优秀案例 一.Phoenix概述简介 二.Phoenix实例一:Java API操作 2.1 phoenix.properties 2 ...
- Json for Java API学习
首先声明:本文来个非常多网友的博客,我通过參考了他们的博客,大致的了解了一些项目中经常使用的Json in java 类和方法,以及关于json的个人理解 个人对json的一些简单理解 在近期的学习中 ...
- [转]Java中导入、导出Excel
原文地址:http://blog.csdn.net/jerehedu/article/details/45195359 一.介绍 当前B/S模式已成为应用开发的主流,而在企业办公系统中,常常有客户这样 ...
- Java API 快速速查宝典
Java API 快速速查宝典 作者:明日科技,陈丹丹,李银龙,王国辉 著 出版社:人民邮电出版社 出版时间:2012年5月 Java编程的最基本要素是方法.属性和事件,掌握这些要素,就掌握了解决实际 ...
- sqoop1.99.4 JAVA API操作
貌似天国还没有介绍1.99.4的java操作代码的,自己吃一次螃蟹吧 如果你是MAVEN项目 <dependency> <groupId>org.apache.sqoop< ...
- Java的导入与导出Excel
使用Jakarta POI导入.导出Excel Jakarta POI 是一套用于访问微软格式文档的Java API.Jakarta POI有很多组件组成,其中有用于操作Excel格式文件的HSSF和 ...
- MyEclipse下查看Java API帮助文档
每次重装JDK或者升级JDK时,都会忘了如何使MyEclipse关联帮助文档.然后,再花十几分钟重新google搜索,麻烦! 首先下载Javadoc api帮助文档,google搜一下就行了. MyE ...
随机推荐
- java基础(八章)
一. 什么是数组及其作用? 定义:具有相同数据类型的一个集合 作用:存储连续的具有相同类型的数据 二. java中如何声明和定义数组 l 声明和定义的语法: 数据类型[ ...
- [UWP]用Shape做动画
相对于WPF/Silverlight,UWP的动画系统可以说有大幅提高,不过本文无意深入讨论这些动画API,本文将介绍使用Shape做一些进度.等待方面的动画,除此之外也会介绍一些相关技巧. 1. 使 ...
- MySql数据库基础操作——数据库、用户的创建,表的制作、修改等
MySql 是一款使用便捷.轻量级的数据库.因为他体积小.速度快.安装使用简单.开源等优点,目前是使用最广泛的数据库.目前位于Oracle甲骨文公司旗下.那今天我们就来介绍一下数据库的基本操作.具体介 ...
- KVM之Live Migration
1.安装KVM必要的软件包 #sudo apt-get install qemu-kvm bridge-utilus 2.制作虚拟机映像ubuntu-12.04.qcow2 $qemu-img cre ...
- spring4 之 helloworld
1.从官网下载相关JAR包 spring-framework-4.2.1.RELEASE-dist(下载地址:http://maven.springframework.org/release/org/ ...
- Phpcms V9缩略图裁剪存在黑边的解决方法
最近用Phpcms v9又碰到一个老问题:在内容页缩略图裁剪的时候出现黑边,这种情况很久没碰到,估计是长宽不同或者会在首页.列表页.内容页不同地方偶然出现的情况,在这里分享下Phpcms V9缩略图裁 ...
- Random随机数种子生成,减少生成重复随机数的可能
我们都知道使用Random可以生成随机数,默认的无参的构造函数New Random().使用与时间相关的默认种子值,初始化 System.Random 类的新实例. 这种方式生成随机数时重复的概率很大 ...
- Java 基础 程序流程控制 (上)
Java程序流程控制 (上) Java程序大体分为三种流程控制结构:顺序结构.分支结构.循环结构 顺序结构 程序由上到下的逐行执行,中间没有任何跳转和判断语句. 示例代码如下: public clas ...
- linux开机启动smb服务
修改/etc/rc.local文件(增加红色部分) [root@localhost ~]# cat /etc/rc.local #!/bin/sh## This script will be exec ...
- Hibernate缓存和懒加载的坑你知道多少?这5个简单问题回答不上来就不敢说会用hibernate
问题1:session.flush()调用之后,懒加载还生效吗? 如果不生效,那是抛异常还是没有任何反应,或者直接返回null? 答案:生效.可以理解为在同一个session当中,懒加载只会执行一次. ...