原文地址:https://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs

Understanding CMS GC Logs

By Poonam-Oracle on Mar 23, 2006

CMS GC with -XX:+PrintGCDetails and -XX:+PrintGCTimeStamps prints a lot of information. Understanding this information can help in fine tuning various parameters of the application and CMS to achieve best performance.

Let's have a look at some of the CMS logs generated with 1.4.2_10:

39.910: [GC 39.910: [ParNew: 261760K->0K(261952K), 0.2314667 secs] 262017K->26386K(1048384K), 0.2318679 secs]

Young generation (ParNew) collection. Young generation capacity is 261952K and after the collection its occupancy drops down from 261760K to 0. This collection took 0.2318679 secs.

40.146: [GC [1 CMS-initial-mark: 26386K(786432K)] 26404K(1048384K), 0.0074495 secs]

Beginning of tenured generation collection with CMS collector. This is initial Marking phase of CMS where all the objects directly reachable from roots are marked and this is done with all the mutator threads stopped.

Capacity of tenured generation space is 786432K and CMS was triggered at the occupancy of 26386K.

40.154: [CMS-concurrent-mark-start]

Start of concurrent marking phase. 
In Concurrent Marking phase, threads stopped in the first phase are started again and all the objects transitively reachable from the objects marked in first phase are marked here.

40.683: [CMS-concurrent-mark: 0.521/0.529 secs]

Concurrent marking took total 0.521 seconds cpu time and 0.529 seconds wall time that includes the yield to other threads also.

40.683: [CMS-concurrent-preclean-start]

Start of precleaning. 
Precleaning is also a concurrent phase. Here in this phase we look at the objects in CMS heap which got updated by promotions from young generation or new allocations or got updated by mutators while we were doing the concurrent marking in the previous concurrent marking phase. By rescanning those objects concurrently, the precleaning phase helps reduce the work in the next stop-the-world “remark” phase.

40.701: [CMS-concurrent-preclean: 0.017/0.018 secs]

Concurrent precleaning took 0.017 secs total cpu time and 0.018 wall time.

40.704: [GC40.704: [Rescan (parallel) , 0.1790103 secs]40.883: [weak refs processing, 0.0100966 secs] [1 CMS-remark: 26386K(786432K)] 52644K(1048384K), 0.1897792 secs]

Stop-the-world phase. This phase rescans any residual updated objects in CMS heap, retraces from the roots and also processes Reference objects. Here the rescanning work took 0.1790103 secs and weak reference objects processing took 0.0100966 secs. This phase took total 0.1897792 secs to complete.

40.894: [CMS-concurrent-sweep-start]

Start of sweeping of dead/non-marked objects. Sweeping is concurrent phase performed with all other threads running.

41.020: [CMS-concurrent-sweep: 0.126/0.126 secs]

Sweeping took 0.126 secs.

41.020: [CMS-concurrent-reset-start]

Start of reset.

41.147: [CMS-concurrent-reset: 0.127/0.127 secs]

In this phase, the CMS data structures are reinitialized so that a new cycle may begin at a later time. In this case, it took 0.127 secs.

This was how a normal CMS cycle runs. Now let us look at some other CMS log entries:

197.976: [GC 197.976: [ParNew: 260872K->260872K(261952K), 0.0000688 secs]197.976: [CMS197.981: [CMS-concurrent-sweep: 0.516/0.531 secs]
(concurrent mode failure): 402978K->248977K(786432K), 2.3728734 secs] 663850K->248977K(1048384K), 2.3733725 secs]

This shows that a ParNew collection was requested, but it was not attempted because it was estimated that there was not enough space in the CMS generation to promote the worst case surviving young generation objects. We name this failure as “full promotion guarantee failure”.

Due to this, Concurrent Mode of CMS is interrupted and a Full GC is invoked at 197.981. This mark-sweep-compact stop-the-world Full GC took 2.3733725 secs and the CMS generation space occupancy dropped from 402978K to 248977K.

The concurrent mode failure can either be avoided by increasing the tenured generation size or initiating the CMS collection at a lesser heap occupancy by setting CMSInitiatingOccupancyFraction to a lower value and setting UseCMSInitiatingOccupancyOnly to true. The value for CMSInitiatingOccupancyFraction should be chosen appropriately because setting it to a very low value will result in too frequent CMS collections.

Sometimes we see these promotion failures even when the logs show that there is enough free space in tenured generation. The reason is 'fragmentation' - the free space available in tenured generation is not contiguous, and promotions from young generation require a contiguous free block to be available in tenured generation. CMS collector is a non-compacting collector, so can cause fragmentation of space for some type of applications. In his blog, Jon talks in detail on how to deal with this fragmentation problem:
http://blogs.sun.com/roller/page/jonthecollector?entry=when_the_sum_of_the

Starting with 1.5, for the CMS collector, the promotion guarantee check is done differently. Instead of assuming that the promotions would be worst case i.e. all of the surviving young generation objects would get promoted into old gen, the expected promotion is estimated based on recent history of promotions. This estimation is usually much smaller than the worst case promotion and hence requires less free space to be available in old generation. And if the promotion in a scavenge attempt fails, then the young generation is left in a consistent state and a stop-the-world mark-compact collection is invoked. To get the same functionality with UseSerialGC you need to explicitly specify the switch -XX:+HandlePromotionFailure.

283.736: [Full GC 283.736: [ParNew: 261599K->261599K(261952K), 0.0000615 secs] 826554K->826554K(1048384K), 0.0003259 secs]
GC locker: Trying a full collection because scavenge failed
283.736: [Full GC 283.736: [ParNew: 261599K->261599K(261952K), 0.0000288 secs]

Stop-the-world GC happening when a JNI Critical section is released. Here again the young generation collection failed due to “full promotion guarantee failure” and then the Full GC is being invoked.

CMS can also be run in incremental mode (i-cms), enabled with -XX:+CMSIncrementalMode. In this mode, CMS collector does not hold the processor for the entire long concurrent phases but periodically stops them and yields the processor back to other threads in application. It divides the work to be done in concurrent phases in small chunks(called duty cycle) and schedules them between minor collections. This is very useful for applications that need low pause times and are run on machines with small number of processors.

Some logs showing the incremental CMS.

2803.125: [GC 2803.125: [ParNew: 408832K->0K(409216K), 0.5371950 secs] 611130K->206985K(1048192K) icms_dc=4 , 0.5373720 secs]
2824.209: [GC 2824.209: [ParNew: 408832K->0K(409216K), 0.6755540 secs] 615806K->211897K(1048192K) icms_dc=4 , 0.6757740 secs]

Here, the scavenges took respectively 537 ms and 675 ms. In between these two scavenges, iCMS ran for a brief period as indicated by the icms_dc value, which indicates a duty-cycle. In this case the duty cycle was 4%. A simple calculation shows that the iCMS incremental step lasted for 4/100 \* (2824.209 - 2803.125 - 0.537) = 821 ms, i.e. 4% of the time between the two scavenges.

Starting with 1.5, CMS has one more phase – concurrent abortable preclean. Abortable preclean is run between a 'concurrent preclean' and 'remark' until we have the desired occupancy in eden. This phase is added to help schedule the 'remark' phase so as to avoid back-to-back pauses for a scavenge closely followed by a CMS remark pause. In order to maximally separate a scavenge from a CMS remark pause, we attempt to schedule the CMS remark pause roughly mid-way between scavenges.

There is a second reason why we do this. Immediately following a scavenge there are likely a large number of grey objects that need rescanning. The abortable preclean phase tries to deal with such newly grey objects thus reducing a subsequent CMS remark pause.

The scheduling of 'remark' phase can be controlled by two jvm options CMSScheduleRemarkEdenSizeThreshold and CMSScheduleRemarkEdenPenetration. The defaults for these are 2m and 50% respectively. The first parameter determines the Eden size below which no attempt is made to schedule the CMS remark pause because the pay off is expected to be minuscule. The second parameter indicates the Eden occupancy at which a CMS remark is attempted.

After 'concurrent preclean' if the Eden occupancy is above CMSScheduleRemarkEdenSizeThreshold, we start 'concurrent abortable preclean' and continue precleanig until we have CMSScheduleRemarkEdenPenetration percentage occupancy in eden, otherwise we schedule 'remark' phase immediately.

7688.150: [CMS-concurrent-preclean-start]
7688.186: [CMS-concurrent-preclean: 0.034/0.035 secs]
7688.186: [CMS-concurrent-abortable-preclean-start]
7688.465: [GC 7688.465: [ParNew: 1040940K->1464K(1044544K), 0.0165840 secs] 1343593K->304365K(2093120K), 0.0167509 secs]
7690.093: [CMS-concurrent-abortable-preclean: 1.012/1.907 secs]
7690.095: [GC[YG occupancy: 522484 K (1044544 K)]7690.095: [Rescan (parallel) , 0.3665541 secs]7690.462: [weak refs processing, 0.0003850 secs] [1 CMS-remark: 302901K(1048576K)] 825385K(2093120K), 0.3670690 secs]

In the above log, after a preclean, 'abortable preclean' starts. After the young generation collection, the young gen occupancy drops down from 1040940K to 1464K. When young gen occupancy reaches 522484K which is 50% of the total capacity, precleaning is aborted and 'remark' phase is started.

Note that in 1.5, young generation occupancy also gets printed in the final remark phase.

For more detailed information and tips on GC tuning, please refer to the following documents:
http://java.sun.com/docs/hotspot/gc5.0/gc_tuning_5.html
http://java.sun.com/docs/hotspot/gc1.4.2/

Understanding CMS GC Logs--转载的更多相关文章

  1. 【转载】为什么不建议<=3G的情况下使用CMS GC

    之前曾经有讲过在heap size<=3G的情况下完全不要考虑CMS GC,在heap size>3G的情况下也优先选择ParallelOldGC,而不是CMS GC,只有在暂停时间无法接 ...

  2. Understanding G1 GC Logs--转载

    原文地址:https://blogs.oracle.com/poonam/entry/understanding_g1_gc_logs Understanding G1 GC Logs By Poon ...

  3. java CMS gc解析

    转载: http://www.blogjava.net/killme2008/archive/2009/09/22/295931.html     CMS,全称Concurrent Low Pause ...

  4. 理解CMS GC日志

    本文翻译自:https://blogs.oracle.com/poonam/entry/understanding_cms_gc_logs 准备工作 JVM的GC日志的主要参数包括如下几个:-XX:+ ...

  5. 一次CMS GC问题排查过程(理解原理+读懂GC日志)

    这个是之前处理过的一个线上问题,处理过程断断续续,经历了两周多的时间,中间各种尝试,总结如下.这篇文章分三部分: 1.问题的场景和处理过程:2.GC的一些理论东西:3.看懂GC的日志 先说一下问题吧 ...

  6. [转]一次CMS GC问题排查过程(理解原理+读懂GC日志)

    这个是之前处理过的一个线上问题,处理过程断断续续,经历了两周多的时间,中间各种尝试,总结如下.这篇文章分三部分: 1.问题的场景和处理过程:2.GC的一些理论东西:3.看懂GC的日志 先说一下问题吧 ...

  7. 由「Metaspace容量不足触发CMS GC」从而引发的思考

    https://mp.weixin.qq.com/s/1VP7l9iuId_ViP1Z_vCA-w 某天早上,毛老师在群里问「cat 上怎么看 gc」. 好好的一个群 看到有 GC 的问题,立马做出小 ...

  8. Java中9种常见的CMS GC问题分析与解决

    1. 写在前面 | 本文主要针对 Hotspot VM 中"CMS + ParNew"组合的一些使用场景进行总结.重点通过部分源码对根因进行分析以及对排查方法进行总结,排查过程会省 ...

  9. CMS GC启动参数优化配置

    简介: java启动参数共分为三类: 其一是标准参数(-),所有的JVM实现都必须实现这些参数的功能,而且向后兼容: 其二是非标准参数(-X),默认jvm实现这些参数的功能,但是并不保证所有jvm实现 ...

随机推荐

  1. Jquery EasyUI封装简化操作

    //confirm function Confirm(msg, control) { $.messager.confirm('确认', msg, function (r) { if (r) { eva ...

  2. Correct use of System.Web.HttpResponse.Redirect

    from:https://blogs.msdn.microsoft.com/tmarq/2009/06/25/correct-use-of-system-web-httpresponse-redire ...

  3. 了解 JavaScript (4)– 第一个 Web 应用程序

    在下面的例子中,我们将要构建一个 Bingo 卡片游戏,每个示例演示 JavaScript 的不同方面,通过每次的改进将会得到最终有效的 Bingo 卡片. Bingo 卡片的内容 美国 Bingo ...

  4. fine-uploader 5.11.8 (最新版) 使用感受

    以前测试和使用过 3.X 因为够用,所以一直没升级,今天闲来无聊测试了一下最新版.和3.X比性能上好了很多,但不支持了移动设备的多文件上传. 关于3.X版 可以看这里:http://www.cnblo ...

  5. Oracle中的索引详解

    Oracle中的索引概述 索引与表一样,也属于段(segment)的一种.里面存放了用户的数据,跟表一样需要占用磁盘空间.索引是一种允许直接访问数据表中某一数据行的树型结构,为了提高查询效率而引入,是 ...

  6. 【Vegas原创】安装rhel6.2,不能进图形化界面的终极解决方法

    安装的时候,千万不要一路下一步,you should know,linux不是windows那么的傻瓜.   方法一: 在倒数最后一步,选择Desktop,而千万不要下一步,默认选择Basic Ser ...

  7. Linux2.6 内核的 Initrd 机制解析(转)

    from: https://www.ibm.com/developerworks/cn/linux/l-k26initrd/ 简介: Linux 的 initrd 技术是一个非常普遍使用的机制,lin ...

  8. 最近使用ajaxFileUpload和Jcrop来实现图片上传和截图,出现一个图片无法更换的问题

    使用setImage 都无法更换 刷新图片 找了很久 什么方法都找过,最后发现...... 原来是 上传的图片的命名问题... 每次上传的图片 保存后都是同样的图片, 所以返回路径都是一样... jc ...

  9. benchmark pm2的cluster模式发布web app的性能与相关问题解决方法

    pm2以cluster集群方式发布app,可以高效地利用多核cpu,有效提升吞吐量.在上周对公司的redmine服务器进行性能调优后,深感ruby on rails的性能低下,这次测试nodejs的s ...

  10. BW:如何加载和生成自定义的层次结构,在不使用平面文件的SAP业务信息仓库

    介绍 通常情况下,报告需要在一个类似树的结构来显示数据.通过启用此特性在SAP BW层次结构.高级数据显示的层次结构的顶层节点.更详细的数据可以向下钻取到的层次结构中的下级节点的可视化. 考虑一个例子 ...