OpLogMergeMessage-OutOfMemoryError-JavaHeapSpace
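Suspects: direct memory size; Netty or the oplog.
5.5kw * 20
Worker memory: 60G; MaxDirectMemorySize: 26G.
The error occurs with both 1 and 2 tasks per worker.
Some tasks still run fine.
Likely cause: threads competing for the same memory, so whether a task fails depends on its memory use and the multithreading pattern.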
gc-log:
2018-11-09T14:10:47.973+0800: 7393.560: [CMS-concurrent-sweep: 0.241/0.241 secs] [Times: user=0.48 sys=0.00, real=0.24 secs]
2018-11-09T14:10:47.973+0800: 7393.560: [CMS-concurrent-reset-start]
2018-11-09T14:10:48.038+0800: 7393.625: [CMS-concurrent-reset: 0.065/0.065 secs] [Times: user=0.13 sys=0.00, real=0.07 secs]
2018-11-09T14:10:50.038+0800: 7395.625: [GC (CMS Initial Mark) [1 CMS-initial-mark: 25626226K(26204160K)] 39382689K(40762048K), 0.0139416 secs] [Times: user=0.02 sys=0.01, real=0.01 secs]
2018-11-09T14:10:50.052+0800: 7395.639: [CMS-concurrent-mark-start]
2018-11-09T14:10:50.427+0800: 7396.014: [CMS-concurrent-mark: 0.375/0.375 secs] [Times: user=2.59 sys=0.02, real=0.37 secs]
2018-11-09T14:10:50.427+0800: 7396.014: [CMS-concurrent-preclean-start]
2018-11-09T14:10:50.457+0800: 7396.044: [CMS-concurrent-preclean: 0.030/0.030 secs] [Times: user=0.06 sys=0.00, real=0.03 secs]
2018-11-09T14:10:50.457+0800: 7396.044: [CMS-concurrent-abortable-preclean-start]
2018-11-09T14:10:50.457+0800: 7396.044: [CMS-concurrent-abortable-preclean: 0.000/0.000 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2018-11-09T14:10:50.458+0800: 7396.045: [GC (CMS Final Remark) [YG occupancy: 13756466 K (14557888 K)]2018-11-09T14:10:50.458+0800: 7396.045: [GC (CMS Final Remark) 2018-11-09T14:10:50.458+0800: 7396.045: [ParNew: 13756466K->13756466K(14557888K), 0.0000233 secs] 39382693K->39382693K(40762048K), 0.0000914 secs] [Times: user=0.00 sys=0.00, real=0.00 secs]
2018-11-09T14:10:50.458+0800: 7396.045: [Rescan (parallel) , 0.0138796 secs]2018-11-09T14:10:50.472+0800: 7396.059: [weak refs processing, 0.0000406 secs]2018-11-09T14:10:50.472+0800: 7396.059: [class unloading, 0.0087389 secs]2018-11-09T14:10:50.481+0800: 7396.068: [scrub symbol table, 0.0055956 secs]2018-11-09T14:10:50.487+0800: 7396.074: [scrub string table, 0.0005615 secs][1 CMS-remark: 25626226K(26204160K)] 39382693K(40762048K), 0.0290641 secs] [Times: user=0.30 sys=0.00, real=0.02 secs]
2018-11-09T14:10:50.488+0800: 7396.075: [CMS-concurrent-sweep-start]
2018-11-09T14:10:50.729+0800: 7396.316: [CMS-concurrent-sweep: 0.241/0.241 secs] [Times: user=0.48 sys=0.00, real=0.24 secs]
2018-11-09T14:10:50.729+0800: 7396.316: [CMS-concurrent-reset-start]
2018-11-09T14:10:50.794+0800: 7396.381: [CMS-concurrent-reset: 0.065/0.065 secs] [Times: user=0.13 sys=0.00, real=0.06 secs]
2018-11-09T14:10:51.734+0800: 7397.321: [GC (Allocation Failure) 2018-11-09T14:10:51.734+0800: 7397.321: [ParNew: 14280769K->14280769K(14557888K), 0.0000297 secs]2018-11-09T14:10:51.734+0800: 7397.321: [CMS: 25626226K->25626226K(26204160K), 8.7144181 secs] 39906995K->39782608K(40762048K), [Metaspace: 37753K->37753K(38912K)], 8.7146944 secs] [Times: user=8.72 sys=0.00, real=8.72 secs]
2018-11-09T14:11:00.449+0800: 7406.036: [Full GC (Allocation Failure) 2018-11-09T14:11:00.449+0800: 7406.036: [CMS: 25626226K->25626196K(26204160K), 6.1291271 secs] 39782608K->39782578K(40762048K), [Metaspace: 37753K->37753K(38912K)], 6.1292957 secs] [Times: user=6.13 sys=0.00, real=6.13 secs]
2018-11-09T14:11:06.579+0800: 7412.166: [GC (CMS Initial Mark) [1 CMS-initial-mark: 25626196K(26204160K)] 39782578K(40762048K), 0.0017634 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
2018-11-09T14:11:06.581+0800: 7412.168: [CMS-concurrent-mark-start]
2018-11-09T14:11:06.840+0800: 7412.427: [Full GC (Allocation Failure) 2018-11-09T14:11:06.840+0800: 7412.427: [CMS2018-11-09T14:11:07.867+0800: 7413.454: [CMS-concurrent-mark: 1.033/1.286 secs] [Times: user=5.11 sys=0.61, real=1.28 secs]
(concurrent mode failure): 26150484K->26150474K(26204160K), 7.8489326 secs] 40314100K->39782414K(40762048K), [Metaspace: 37784K->37784K(38912K)], 7.8491778 secs] [Times: user=11.81 sys=0.39, real=7.85 secs]
2018-11-09T14:11:14.690+0800: 7420.277: [Full GC (Allocation Failure) 2018-11-09T14:11:14.690+0800: 7420.277: [CMS: 26150474K->26150474K(26204160K), 1.2736921 secs] 39782414K->39782404K(40762048K), [Metaspace: 37784K->37784K(38912K)], 1.2738487 secs] [Times: user=1.28 sys=0.00, real=1.27 secs]
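To read the full-GC lines above more easily, the following is a minimal sketch that pulls the old-generation numbers out of a `[CMS: before->after(capacity)]` section and reports how much was reclaimed and how full the old generation remains. The class and method names (`CmsOldGenCheck`, `parseOldGen`, and so on) are made up for illustration; the regular expression only covers the log format shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CmsOldGenCheck {
    // Matches the old-generation section of a CMS full-GC line:
    // "[CMS: <before>K-><after>K(<capacity>K)"
    private static final Pattern OLD_GEN =
        Pattern.compile("\\[CMS: (\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns {beforeK, afterK, capacityK}, or null if the line has no such section. */
    public static long[] parseOldGen(String line) {
        Matcher m = OLD_GEN.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    /** Kilobytes freed in the old generation by this collection. */
    public static long reclaimedK(long[] g) {
        return g[0] - g[1];
    }

    /** Fraction of the old generation still occupied after the collection. */
    public static double occupancyAfter(long[] g) {
        return (double) g[1] / g[2];
    }

    public static void main(String[] args) {
        // Old-generation section copied from the 14:11:00 Full GC line in the log above.
        String line = "[Full GC (Allocation Failure) "
            + "[CMS: 25626226K->25626196K(26204160K), 6.1291271 secs]";
        long[] g = parseOldGen(line);
        System.out.printf("reclaimed=%dK occupancy=%.1f%%%n",
            reclaimedK(g), occupancyAfter(g) * 100.0);
    }
}
```

For the 14:11:00 line this gives 30K reclaimed with the old generation still about 97.8% full after a 6-second pause, which suggests the heap is filled with live objects rather than garbage waiting to be collected.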
stdout:
2018-11-09 14:09:01,703 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: for one batch with 400036 in 67002 ms..
2018-11-09 14:09:01,703 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: for one batch with stepRead is 0 ..
2018-11-09 14:09:05,694 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: dataBlock read finished with 41660237 ..
2018-11-09 14:09:07,408 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: Calculate Delta for update ...
2018-11-09 14:09:11,398 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMLearnerWAIC: Begin to update parameter in PS ...
2018-11-09 14:09:11,406 INFO [pool-6-thread-1] com.tencent.angel.ml.factorizationmachinesWAIC.FMModel: Start to push w0 from PS ...
2018-11-09 14:11:06,588 FATAL [pool-5-thread-1] com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache: merge OpLogMergeMessage [update=com.tencent.angel.ml.math.vector.DenseDoubleVector@77d861dc, toString()=OpLogMessage [matrixId=1, type=MERGE, context=com.tencent.angel.psagent.task.TaskContext@16aa8654TaskContext [index=0, matrix clocks=(matrixId=0,clock=2)(matrixId=1,clock=1)(matrixId=2,clock=1)], seqId=17]] falied,
java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:158)
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:169)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.resize(SparseDoubleVector.java:495)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:564)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:555)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:35)
at com.tencent.angel.ml.math.matrix.RowbaseMatrix.plusBy(RowbaseMatrix.java:126)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLog.merge(MatrixOpLog.java:160)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache$Merger.run(MatrixOpLogCache.java:444)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-11-09 14:11:15,964 FATAL [pool-5-thread-2] com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache: merge OpLogMergeMessage [update=com.tencent.angel.ml.math.vector.DenseDoubleVector@316dd3c4, toString()=OpLogMessage [matrixId=1, type=MERGE, context=com.tencent.angel.psagent.task.TaskContext@16aa8654TaskContext [index=0, matrix clocks=(matrixId=0,clock=2)(matrixId=1,clock=1)(matrixId=2,clock=1)], seqId=18]] falied,
java.lang.OutOfMemoryError: Java heap space
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:158)
at it.unimi.dsi.fastutil.ints.Int2DoubleOpenHashMap.<init>(Int2DoubleOpenHashMap.java:169)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.resize(SparseDoubleVector.java:495)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:564)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:555)
at com.tencent.angel.ml.math.vector.SparseDoubleVector.plusBy(SparseDoubleVector.java:35)
at com.tencent.angel.ml.math.matrix.RowbaseMatrix.plusBy(RowbaseMatrix.java:126)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLog.merge(MatrixOpLog.java:160)
at com.tencent.angel.psagent.matrix.oplog.cache.MatrixOpLogCache$Merger.run(MatrixOpLogCache.java:444)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-11-09 14:11:15,969 INFO [pool-5-thread-1] com.tencent.angel.psagent.PSAgent: psagent falied
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.Worker: worker failed message : merge OpLogMergeMessage [update=com.tencent.angel.ml.math.vector.DenseDoubleVector@77d861dc, toString()=OpLogMessage [matrixId=1, type=MERGE, context=com.tencent.angel.psagent.task.TaskContext@16aa8654TaskContext [index=0, matrix clocks=(matrixId=0,clock=2)(matrixId=1,clock=1)(matrixId=2,clock=1)], seqId=17]] falied, Java heap space, send it to appmaster success
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.Worker: start to close all modules in worker
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.Worker: stop workerService
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.worker.WorkerService: stop rpc server
2018-11-09 14:11:15,985 INFO [pool-5-thread-1] com.tencent.angel.ipc.NettyServer: Stopping server on 20586
2018-11-09 14:11:15,985 ERROR [Worker Heartbeat] com.tencent.angel.worker.Worker: report to appmaster failed, err:
java.lang.NullPointerException
at com.tencent.angel.worker.Worker.heartbeat(Worker.java:341)
at com.tencent.angel.worker.Worker.access$200(Worker.java:65)
at com.tencent.angel.worker.Worker$1.run(Worker.java:303)
at java.lang.Thread.run(Thread.java:745)
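Both stack traces fail while allocating a new `Int2DoubleOpenHashMap` inside `SparseDoubleVector.resize`. As a rough sketch of why that allocation is expensive, the code below estimates the backing-array size of an open-addressing int-to-double table. Everything in it is an assumption rather than something taken from the logs: a power-of-two table size, a 0.75 load factor, 4 + 8 bytes per slot, a table that doubles on resize, and 55 million entries (reading "5.5kw" as 5.5 x 10 million). Object headers and any other per-vector overhead are ignored.

```java
public class OpenHashMapFootprint {
    /** Smallest power of two that is >= x, for x >= 1. */
    public static long nextPowerOfTwo(long x) {
        long p = 1;
        while (p < x) {
            p <<= 1;
        }
        return p;
    }

    /** Slots needed to hold `expected` entries at the given load factor (assumed sizing rule). */
    public static long tableSlots(long expected, double loadFactor) {
        long needed = (long) Math.ceil(expected / loadFactor);
        return Math.max(2, nextPowerOfTwo(needed));
    }

    /** One int key (4 bytes) plus one double value (8 bytes) per slot. */
    public static long tableBytes(long slots) {
        return slots * 12L;
    }

    /** While a table is being rebuilt, the old and the new (assumed doubled) table are both live. */
    public static long resizePeakBytes(long oldSlots) {
        return tableBytes(oldSlots) + tableBytes(oldSlots * 2);
    }

    public static void main(String[] args) {
        long expected = 55_000_000L; // hypothetical entry count, see the assumptions above
        long slots = tableSlots(expected, 0.75);
        System.out.println("slots = " + slots);
        System.out.println("table bytes = " + tableBytes(slots));
        System.out.println("peak bytes when growing to this size = " + resizePeakBytes(slots / 2));
    }
}
```

Under these assumptions a single such map needs about 1.5 GiB of contiguous arrays, and growing into it briefly needs the old table as well. With the old generation already about 97.8% full, an allocation of that size has nowhere to go, which is consistent with the `Java heap space` error in `resize`.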