错误分析

堆栈信息中有一个错误信息:Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 264, idc-xx-xx-3-30.d.xx.com, executor 2): java.lang.OutOfMemoryError: Java heap space

根据提示信息可以得到以下几点

  • stage由一堆task组成,也就是taskset,编号1的task在stage2中失败了4次
  • executor 是实际执行task的节点,编号2的executor发生了Java heap space
  • executor 内存配置的是512M,没有配置 spark.executor.memoryOverhead,spark在计算executor最终需要分配多少内存时有以下机制
  1. 未配置spark.executor.memoryOverhead来直接控制off-heap时(堆外内存,将对象序列化后放在一大块gc不会直接管理的内存中,需要的时候再反序列化使用,就像放到磁盘上一样,此处堆外内存包含了方法区,直接内存,虚拟机栈,本地方法栈)

    realMem = executorMemory[heap] + (executorMemory * 0.10, with minimum of 384)[off-heap]

    2)配置spark.executor.memoryOverhead

    realMem = executorMemory[heap] + memoryOverhead[off-heap]

readMem表示java进程需要申请的总内存,如果超过container的内存容量,会被直接kill掉

异常种类

  • OutOfMemoryError: Java heap space,堆内存不足,溢出,需调整--executor-memory
  • OutOfMemoryError: Java permgen space,堆外内存不足,溢出,需调整spark.executor.memoryOverhead

下述异常属于Java heap space,调整--executor-memory

RDD的位置,根据MemoryMode可以选择是堆内或堆外

日志中查看到的异常信息

: org.apache.spark.SparkException: Job aborted.

at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:147)

at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)

at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)

at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)

at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:121)

at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:101)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)

at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)

at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)

at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)

at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)

at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)

at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:92)

at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:92)

at org.apache.spark.sql.Dataset.(Dataset.scala:185)

at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)

at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:592)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)

at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)

at py4j.Gateway.invoke(Gateway.java:280)

at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)

at py4j.commands.CallCommand.execute(CallCommand.java:79)

at py4j.GatewayConnection.run(GatewayConnection.java:214)

at java.lang.Thread.run(Thread.java:745)

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 2.0 failed 4 times, most recent failure: Lost task 1.3 in stage 2.0 (TID 264, idc-xx-xx-3-30.d.xx.com, executor 2): java.lang.OutOfMemoryError: Java heap space

at org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:778)

at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:511)

at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.checkEndOfRowGroup(VectorizedParquetRecordReader.java:270)

at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextBatch(VectorizedParquetRecordReader.java:225)

at org.apache.spark.sql.execution.datasources.parquet.VectorizedParquetRecordReader.nextKeyValue(VectorizedParquetRecordReader.java:137)

at org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)

at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)

at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:184)

at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)

at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.scan_nextBatch$(Unknown Source)

at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)

at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)

at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)

at org.apache.spark.sql.execution.columnar.InMemoryRelation$$anonfun$1$$anon$1.hasNext(InMemoryRelation.scala:132)

at org.apache.spark.storage.memory.MemoryStore.putIteratorAsValues(MemoryStore.scala:215)

at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1005)

at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:996)

at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:936)

at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:996)

at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:700)

at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:334)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:285)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)

at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)

at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)

at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)

at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)

at org.apache.spark.scheduler.Task.run(Task.scala:99)

at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

导致异常的代码

/**
* @param f file to read the chunks from
* @return the chunks
* @throws IOException
*/
public List<Chunk> readAll(FSDataInputStream f) throws IOException {
List<Chunk> result = new ArrayList<Chunk>(chunks.size());
f.seek(offset);
byte[] chunksBytes = new byte[length]; //778行,分配长为length的byte[]时没有足够的可用内存导致heap space
f.readFully(chunksBytes);
// report in a counter the data we just scanned
BenchmarkCounter.incrementBytesRead(length);
int currentChunkOffset = 0;
for (int i = 0; i < chunks.size(); i++) {
ChunkDescriptor descriptor = chunks.get(i);
if (i < chunks.size() - 1) {
result.add(new Chunk(descriptor, chunksBytes, currentChunkOffset));
} else {
// because of a bug, the last chunk might be larger than descriptor.size
result.add(new WorkaroundChunk(descriptor, chunksBytes, currentChunkOffset, f));
}
currentChunkOffset += descriptor.size;
}
return result ;
}

Spark执行失败时的一个错误分析的更多相关文章

  1. Hadoop概念学习系列之为什么hadoop/spark执行作业时,输出路径必须要不存在?(三十九)

    很多人只会,但没深入体会和想为什么要这样? 拿Hadoop来说,当然,spark也一样的道理. 输出路径由Hadoop自己创建,实际的结果文件遵守part-nnnn的约定. 如何指定一个已有目录作为H ...

  2. 程序中使用事务来管理sql语句的执行,执行失败时,可以达到回滚的要求。

    1.设置使用事务的SQL执行语句 /// <summary> /// 使用有事务的SQL语句 /// </summary> /// <param name="s ...

  3. svn cleanup 执行失败时,可以勾选 break locks,

  4. 【转】mysql触发器的实战(触发器执行失败,sql会回滚吗)

    1   引言Mysql的触发器和存储过程一样,都是嵌入到mysql的一段程序.触发器是mysql5新增的功能,目前线上凤巢系统.北斗系统以及哥伦布系统使用的数据库均是mysql5.0.45版本,很多程 ...

  5. ThreadPoolExecutor线程池任务执行失败的时候会怎样

    接上一篇 <JDK1.8中的线程池> 1.  任务执行失败时的处理逻辑 1.1.  Worker Worker相当于线程池中的线程 可以看到,Worker有几个重要的属性: thread ...

  6. pybot执行多条用例时,某一个用例执行失败,停止所有用例的执行

    问题: pybot执行多条用例时,某一个用例执行失败,停止所有用例的执行 解决办法: pybot -exitonfailure E:\robot\呼送项目\测试用例\基本流程\主流程.txt 参考文章 ...

  7. vs2015 生成项目时,提示执行失败,参数错误

    今天vs2015 生成项目时,提示执行失败,参数错误.查了很多资料未解决 后来,发现只有一个项目出现这个问题,其他项目生成正常.怀疑是该项目解决方案的问题 于是将解决项目中的项目移除,逐一生成引用,解 ...

  8. spark 在yarn执行job时一直抱0.0.0.0:8030错误

    近日新写完的spark任务放到yarn上面执行时,在yarn的slave节点中一直看到报错日志:连接不到0.0.0.0:8030 . The logs are as below: 2014-08-11 ...

  9. 大数据学习day23-----spark06--------1. Spark执行流程(知识补充:RDD的依赖关系)2. Repartition和coalesce算子的区别 3.触发多次actions时,速度不一样 4. RDD的深入理解(错误例子,RDD数据是如何获取的)5 购物的相关计算

    1. Spark执行流程 知识补充:RDD的依赖关系 RDD的依赖关系分为两类:窄依赖(Narrow Dependency)和宽依赖(Shuffle Dependency) (1)窄依赖 窄依赖指的是 ...

随机推荐

  1. MongoDB入门(介绍、安装、增删改查)

    文章作者公众号bigsai,已收录在回车课堂,如有帮助还请不吝啬点个赞赞支持一下! 课程导学 大家好我是bigsai,我们都学过数据库,但你可能更熟悉关系(型)数据库例如MySQL,SQL SERVE ...

  2. NodeJS沙箱逃逸&&vm

    NodeJS沙箱逃逸 关于nodejs的沙箱 使用场景 在线代码编辑器 第三方js代码 jsonp,like百度搜索框 https://www.baidu.com/s?wd=nodejs&mi ...

  3. 破晓行动----带你总结JVM的知识大全(二)

    JVM运行时内存 + 垃圾回收与算法

  4. 自定义springboot - starter 实现日志打印,并支持动态可插拔

    1. starter 命名规则: springboot项目有很多专一功能的starter组件,命名都是spring-boot-starter-xx,如spring-boot-starter-loggi ...

  5. Python练习题 002:奖金计算

    [Python练习题 002]企业发放的奖金根据利润提成.利润(I)低于或等于10万元时,奖金可提10%:利润高于10万元,低于20万元时,低于10万元的部分按10%提成,高于10万元的部分,可可提成 ...

  6. 图像sensor的bitdepth

    参考来源:https://blog.csdn.net/yuejisuo1948/article/details/83617359 bitdepth目前个人理解是sensor像素上表示颜色的范围,也可说 ...

  7. Spring Boot入门系列(二十)快速打造Restful API 接口

    spring boot入门系列文章已经写到第二十篇,前面我们讲了spring boot的基础入门的内容,也介绍了spring boot 整合mybatis,整合redis.整合Thymeleaf 模板 ...

  8. JVM系列【3】Class文件加载过程

    JVM系列笔记目录 虚拟机的基础概念 class文件结构 class文件加载过程 jvm内存模型 JVM常用指令 GC与调优 Class文件加载过程 JVM加载Class文件主要分3个过程:Loadi ...

  9. MYSQL账户是否不允许远程连接。如果无法连接可以尝试以下方法:

    mysql账户是否不允许远程连接.如果无法连接可以尝试以下方法: mysql -u root -p //登录MySQL mysql> GRANT ALL PRIVILEGES ON *.* TO ...

  10. 题解:POI2012 Salaries

    题解:POI2012 Salaries Description The Byteotian Software Corporation (BSC) has \(n\) employees. In BSC ...