sqoop import mysql to hive table: GC overhead limit exceeded
1. Scenario description
When I used sqoop to import a mysql table into hive, I got the following error:
// :: WARN hcat.SqoopHCatUtilities: The Sqoop job can fail if types are not assignment compatible
// :: WARN hcat.SqoopHCatUtilities: The HCatalog field submername has type string. Expected = varchar based on database column type : VARCHAR
// :: WARN hcat.SqoopHCatUtilities: The Sqoop job can fail if types are not assignment compatible
// :: INFO mapreduce.DataDrivenImportJob: Configuring mapper for HCatalog import job
// :: INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
// :: INFO client.RMProxy: Connecting to ResourceManager at hadoop-namenode01/192.168.1.101:
// :: WARN conf.HiveConf: HiveConf of name hive.server2.webui.host.port does not exist
// :: INFO Configuration.deprecation: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
// :: INFO db.DBInputFormat: Using read commited transaction isolation
// :: INFO mapreduce.JobSubmitter: number of splits:
// :: INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1562229385371_50086
// :: INFO impl.YarnClientImpl: Submitted application application_1562229385371_50086
// :: INFO mapreduce.Job: The url to track the job: http://hadoop-namenode01:8088/proxy/application_1562229385371_50086/
// :: INFO mapreduce.Job: Running job: job_1562229385371_50086
// :: INFO hive.metastore: Closed a connection to metastore, current connections:
// :: INFO mapreduce.Job: Job job_1562229385371_50086 running in uber mode : false
// :: INFO mapreduce.Job: map % reduce %
// :: INFO mapreduce.Job: Task Id : attempt_1562229385371_50086_m_000000_0, Status : FAILED
Error: GC overhead limit exceeded
Why does the Sqoop import throw this exception?
By default the MySQL JDBC driver (not Sqoop itself) fetches the entire result set in one shot and tries to hold it in the map task's memory. This exhausts the heap and triggers the error. To overcome it, you need to tell the driver to return the data in batches. Appending the parameters "?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true" to the JDBC connection string makes the driver stream the results, fetching 10000 rows per batch.
The script I used for the import is as follows:
file sqoop_order_detail.sh
#!/bin/bash
/home/lenmom/sqoop-1.4./bin/sqoop import \
--connect jdbc:mysql://lenmom-mysql:3306/inventory \
--username root --password root \
--driver com.mysql.jdbc.Driver \
--table order_detail \
--hcatalog-database orc \
--hcatalog-table order_detail \
--hcatalog-partition-keys pt_log_d \
--hcatalog-partition-values \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m
The target mysql table has 10 billion records.
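As a quick sanity check before importing a table of this size, you can estimate the source row count from the edge node. A minimal sketch, assuming the mysql command-line client is available there and using the same credentials as above:
# fast row-count estimate from information_schema; an exact COUNT(*) on a
# 10-billion-row table can itself run for a very long time
mysql -h lenmom-mysql -P 3306 -u root -proot -e \
"SELECT table_rows FROM information_schema.tables WHERE table_schema='inventory' AND table_name='order_detail';"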
2. Solution
2.1 Solution 1
Modify the mysql JDBC URL so the driver streams the data, by appending the following parameters:
?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true
The defaultFetchSize can be adjusted to suit your situation. In my case, the whole script is:
#!/bin/bash
/home/lenmom/sqoop-1.4./bin/sqoop import \
--connect jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true\&defaultFetchSize=10000\&useCursorFetch=true\&useUnicode=yes\&characterEncoding=utf8 \
--username root --password root \
--driver com.mysql.jdbc.Driver \
--table order_detail \
--hcatalog-database orc \
--hcatalog-table order_detail \
--hcatalog-partition-keys pt_log_d \
--hcatalog-partition-values \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m
Don't forget to escape the & characters in the shell script; alternatively, wrap the whole JDBC URL in double quotes instead of escaping:
#!/bin/bash
/home/lenmom/sqoop-1.4./bin/sqoop import \
--connect "jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true&useUnicode=yes&characterEncoding=utf8" \
--username root --password root \
--driver com.mysql.jdbc.Driver \
--table order_detail \
--hcatalog-database orc \
--hcatalog-table order_detail \
--hcatalog-partition-keys pt_log_d \
--hcatalog-partition-values \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' \
-m
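To run the import, make the script executable and keep a copy of the output for troubleshooting. A simple usage sketch (the log file name is arbitrary):
chmod +x sqoop_order_detail.sh
./sqoop_order_detail.sh 2>&1 | tee sqoop_order_detail.log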
2.2 Solution 2
Increase the memory available to each map task by passing Hadoop properties on the Sqoop command line:
sqoop import -Dmapreduce.map.memory.mb= -Dmapreduce.map.java.opts=-Xmx1600m -Dmapreduce.task.io.sort.mb=
The above parameters need to be tuned according to the data volume for a successful Sqoop pull.
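For illustration only, a tuned invocation might look like the sketch below. The container size (2048), sort buffer (256) and partition value (20190701) are assumptions chosen to show the shape of the command, not recommendations; the map-task heap (-Xmx) is usually kept at roughly 80% of mapreduce.map.memory.mb:
# illustrative values -- tune mapreduce.map.memory.mb, -Xmx and io.sort.mb to your cluster and data volume
/home/lenmom/sqoop-1.4./bin/sqoop import \
-Dmapreduce.map.memory.mb=2048 \
-Dmapreduce.map.java.opts=-Xmx1600m \
-Dmapreduce.task.io.sort.mb=256 \
--connect "jdbc:mysql://lenmom-mysql:3306/inventory?dontTrackOpenResources=true&defaultFetchSize=10000&useCursorFetch=true" \
--username root --password root \
--driver com.mysql.jdbc.Driver \
--table order_detail \
--hcatalog-database orc \
--hcatalog-table order_detail \
--hcatalog-partition-keys pt_log_d \
--hcatalog-partition-values 20190701 \
--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'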
2.3 Solution 3
Increase the number of mappers (the default is 4; it should not be greater than the number of datanodes), for example by overriding a saved Sqoop job:
sqoop job --exec lenmom-job -- --num-mappers ;
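With example values filled in (8 mappers is just an example, and order_detail_id is assumed to be the table's primary key so Sqoop can split the range evenly):
sqoop job --exec lenmom-job -- --num-mappers 8 --split-by order_detail_id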