An Error Analysis of a Failed Spark Job
Error analysis
The stack trace contains this error message: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 264, idc-xx-xx-3-30.d.xx.com, executor 2): java.lang.OutOfMemoryError: Java heap space
From this message we can draw several conclusions:
- A stage is made up of a set of tasks (a TaskSet); task 1 in stage 2.0 failed 4 times.
- An executor is the process that actually runs tasks; executor 2 is the one that ran out of Java heap space.
- Executor memory was configured as 512M and spark.executor.memoryOverhead was not set. Spark decides how much memory to request per executor as follows (a worked sketch follows this list):
  1) When spark.executor.memoryOverhead is not set to control off-heap directly (off-heap memory: objects are serialized into a large region the GC does not manage and deserialized again when needed, much like spilling to disk; here off-heap also covers the method area, direct memory, the VM stacks, and native method stacks):
     realMem = executorMemory [heap] + max(executorMemory * 0.10, 384M) [off-heap]
  2) When spark.executor.memoryOverhead is set:
     realMem = executorMemory [heap] + memoryOverhead [off-heap]
realMem is the total memory the Java process requests; if it exceeds the container's memory limit, the container is killed outright.
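Plugging this job's 512M setting into the first branch gives the actual container request. A minimal sketch of the calculation (the class and method names are illustrative, not Spark's source):

```java
public class ExecutorMemoryEstimate {
    private static final int MIN_OVERHEAD_MB = 384;       // floor on the off-heap overhead
    private static final double OVERHEAD_FACTOR = 0.10;   // fraction of executor memory

    static int containerRequestMb(int executorMemoryMb) {
        // realMem = executorMemory [heap] + max(executorMemory * 0.10, 384M) [off-heap]
        int overheadMb = (int) Math.max(OVERHEAD_FACTOR * executorMemoryMb, MIN_OVERHEAD_MB);
        return executorMemoryMb + overheadMb;
    }

    public static void main(String[] args) {
        // For this job's 512M executors: 512 + max(51, 384) = 896MB requested per container.
        System.out.println(containerRequestMb(512));   // prints 896
    }
}
```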
Exception types
- OutOfMemoryError: Java heap space: the on-heap memory is insufficient and overflows; increase --executor-memory.
- OutOfMemoryError: Java permgen space: the off-heap memory is insufficient and overflows; increase spark.executor.memoryOverhead.
The exception below is a Java heap space case, so --executor-memory is the setting to raise.
Where an RDD is stored can be on-heap or off-heap, depending on its MemoryMode.
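How these two knobs might be raised, as a hedged sketch (the 2g and 512m values are placeholders to be sized against the real data, not recommendations; the equivalent spark-submit flags are shown in the comment):

```java
import org.apache.spark.sql.SparkSession;

// Equivalent spark-submit flags:
//   --executor-memory 2g --conf spark.executor.memoryOverhead=512m
public class TunedSessionExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("heap-space-tuning-example")
                .config("spark.executor.memory", "2g")            // on-heap; the "Java heap space" side
                .config("spark.executor.memoryOverhead", "512m")  // off-heap overhead
                .getOrCreate();
        // ... run the failing query here ...
        spark.stop();
    }
}
```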
The full exception seen in the log:
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:147)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:101)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:185)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 264, idc-xx-xx-3-30.d.xx.com, executor 2): java.lang.OutOfMemoryError: Java heap space
at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:778)
at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:511)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225)
at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)
at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:184)
at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
at org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:132)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1005)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:996)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:936)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:996)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:700)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
The code that triggered the exception (org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll, around line 778):
/**
 * @param f file to read the chunks from
 * @return the chunks
 * @throws IOException
 */
public List<Chunk> readAll(FSDataInputStream f) throws IOException {
  List<Chunk> result = new ArrayList<Chunk>(chunks.size());
  f.seek(offset);
  byte[] chunksBytes = new byte[length]; // line 778: there is not enough free heap to allocate a byte[] of size length, hence the Java heap space error
  f.readFully(chunksBytes);
  // report in a counter the data we just scanned
  BenchmarkCounter.incrementBytesRead(length);
  int currentChunkOffset = 0;
  for (int i = 0; i < chunks.size(); i++) {
    ChunkDescriptor descriptor = chunks.get(i);
    if (i < chunks.size() - 1) {
      result.add(new Chunk(descriptor, chunksBytes, currentChunkOffset));
    } else {
      // because of a bug, the last chunk might be larger than descriptor.size
      result.add(new WorkaroundChunk(descriptor, chunksBytes, currentChunkOffset, f));
    }
    currentChunkOffset += descriptor.size;
  }
  return result;
}
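To make the failure mode concrete: the allocation on line 778 has to fit the combined size of one row group's consecutive column chunks into free heap as a single array. A minimal, self-contained reproduction (the 512MB length is hypothetical, chosen only to exceed a 512M heap):

```java
// Run with a constrained heap, e.g. `java -Xmx512m ChunkAllocationDemo`,
// to get the same java.lang.OutOfMemoryError: Java heap space.
public class ChunkAllocationDemo {
    public static void main(String[] args) {
        int length = 512 * 1024 * 1024;         // pretend one row group's chunks total 512MB
        byte[] chunksBytes = new byte[length];  // the same single allocation as ParquetFileReader line 778
        System.out.println(chunksBytes.length);
    }
}
```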