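Please credit the original source when reposting. Thank you!

Part eight of this series ended with a few open questions. After persistently pestering 笨神, 阿飞, ak, and others, I finally got them resolved. Part eight is here: http://www.jianshu.com/p/7f7cecb983cc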

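Clearing up questions about MAT

MAT is not a universal tool, and it cannot handle every kind of heap dump file. It does parse the mainstream vendors' formats well, such as the HPROF binary heap dumps used by Sun, HP, and SAP, and IBM's PHD heap dumps. See the MAT download page and its documentation.
Getting the heap dump:

We use the following command: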

jmap -dump:format=b,file=heap.bin pid  

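Then download the heap dump to your local machine and open it in MAT. This is where the first problem appeared: the numbers I got from the jmap commands did not match what MAT showed.
As shown in the figure:

MAT, on the other hand, showed this:
Strange, isn't it? The heap is 4 GB and the dump file is a bit over 4 GB, so why does MAT show only about 220 MB? This puzzled me for a long time. The reason is that MAT leaves unreachable objects out by default. If you want them included, enable "Keep unreachable objects" in MAT's preferences (Window > Preferences > Memory Analyzer) and parse the dump again.

This first problem was simpler than the others.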
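The gap between the dump size and what MAT shows comes down to whether unreachable objects are counted. A minimal sketch of the same distinction from inside a JVM, assuming a HotSpot JVM where `com.sun.management.HotSpotDiagnosticMXBean` is available: its `dumpHeap(file, live)` call takes a `live` flag that plays the same role as the `live` option of `jmap -dump`. The class name `HeapDumpSketch` and the temp-file names are made up for illustration.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumpSketch {
    /**
     * Writes a heap dump of the current JVM and returns the file size in bytes.
     * liveOnly=true dumps only reachable objects; false also keeps garbage
     * that has not been collected yet.
     */
    public static long dump(Path file, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            // dumpHeap refuses to overwrite an existing file.
            bean.dumpHeap(file.toString(), liveOnly);
            return Files.size(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical output location: the system temp directory.
        String tmp = System.getProperty("java.io.tmpdir");
        long stamp = System.nanoTime();
        long all = dump(new File(tmp, "all-" + stamp + ".hprof").toPath(), false);
        long live = dump(new File(tmp, "live-" + stamp + ".hprof").toPath(), true);
        System.out.println("all objects: " + all + " bytes, live only: " + live + " bytes");
    }
}
```

On a busy service the first file is usually much larger than the second, which matches the 4 GB versus 220 MB difference described above. The exact sizes depend on how much garbage is waiting to be collected at that moment.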

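The Xmn problem from part eight

For the problem in part eight, I came in one morning, checked the GC log, and saw what the figure shows:

The log looked very odd: why was real so large? Let's first look at what user, sys, and real mean.
Start with the time command on Linux. Run: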

time ls 

Real, User and Sys process time statistics
One of these things is not like the other. Real refers to actual elapsed time; User and Sys refer to CPU time used only by the process.
Real is wall clock time - time from start to finish of the call. This is all elapsed time including time slices used by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).

User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process. Other processes and time the process spends blocked do not count towards this figure.

Sys is the amount of CPU time spent in the kernel within the process. This means executing CPU time spent in system calls within the kernel, as opposed to library code, which is still running in user-space. Like 'user', this is only CPU time used by the process. See below for a brief description of kernel mode (also known as 'supervisor' mode) and the system call mechanism.

User+Sys will tell you how much actual CPU time your process used. Note that this is across all CPUs, so if the process has multiple threads (and this process is running on a computer with more than one processor) it could potentially exceed the wall clock time reported by Real (which usually occurs). Note that in the output these figures include the User and Sys time of all child processes (and their descendants) as well when they could have been collected, e.g. by wait(2) or waitpid(2), although the underlying system calls return the statistics for the process and its children separately.
Origins of the statistics reported by time (1)

The statistics reported by time are gathered from various system calls. 'User' and 'Sys' come from wait(2) or times(2), depending on the particular system. 'Real' is calculated from a start and end time gathered from the gettimeofday(2) call. Depending on the version of the system, various other statistics such as the number of context switches may also be gathered by time.
On a multi-processor machine, a multi-threaded process or a process forking children could have an elapsed time smaller than the total CPU time - as different threads or processes may run in parallel. Also, the time statistics reported come from different origins, so times recorded for very short running tasks may be subject to rounding errors, as the example given by the original poster shows.
A brief primer on Kernel vs. User mode
On Unix, or any protected-memory operating system, 'Kernel' or 'Supervisor' mode refers to a privileged mode that the CPU can operate in. Certain privileged actions that could affect security or stability can only be done when the CPU is operating in this mode; these actions are not available to application code. An example of such an action might be manipulation of the MMU to gain access to the address space of another process. Normally, user-mode code cannot do this (with good reason), although it can request shared memory from the kernel, which could be read or written by more than one process. In this case, the shared memory is explicitly requested from the kernel through a secure mechanism and both processes have to explicitly attach to it in order to use it.
The privileged mode is usually referred to as 'kernel' mode because the kernel is executed by the CPU running in this mode. In order to switch to kernel mode you have to issue a specific instruction (often called a trap) that switches the CPU to running in kernel mode and runs code from a specific location held in a jump table. For security reasons, you cannot switch to kernel mode and execute arbitrary code - the traps are managed through a table of addresses that cannot be written to unless the CPU is running in supervisor mode. You trap with an explicit trap number and the address is looked up in the jump table; the kernel has a finite number of controlled entry points.
The 'system' calls in the C library (particularly those described in Section 2 of the man pages) have a user-mode component, which is what you actually call from your C program. Behind the scenes, they may issue one or more system calls to the kernel to do specific services such as I/O, but they still also have code running in user-mode. It is also quite possible to directly issue a trap to kernel mode from any user space code if desired, although you may need to write a snippet of assembly language to set up the registers correctly for the call. A page describing the system calls provided by the Linux kernel and the conventions for setting up registers can be found here.
More about 'sys'
There are things that your code cannot do from user mode - things like allocating memory or accessing hardware (HDD, network, etc.). These are under the supervision of the kernel, and it alone can do them. Some operations that you do (like malloc or fread/fwrite) will invoke these Kernel functions and that then will count as 'sys' time. Unfortunately it's not as simple as "every call to malloc will be counted in 'sys' time". The call to malloc will do some processing of its own (still counted in 'user' time) and then somewhere along the way it may call the function in kernel (counted in 'sys' time). After returning from the kernel call, there will be some more time in 'user' and then malloc will return to your code. As for when the switch happens, and how much of it is spent in kernel mode... you cannot say. It depends on the implementation of the library. Also, other seemingly innocent functions might also use malloc and the like in the background, which will again have some time in 'sys' then.

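If I had read all of this, I would have known the answer. I hadn't at the time, so I asked 笨神, who, I have to say, is seriously impressive: he told me straight away that swap might have slowed the GC down at that point. A real time far larger than user + sys means the process spent most of the pause waiting rather than running on a CPU, which is what swapping produces. After a series of checks it really was swap, and once swap was turned off this problem went away. Linux knowledge like this matters a lot and deserves focused study.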
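The check described above, real far exceeding user + sys, can be sketched as a small log filter. This is only an illustration: it assumes the usual HotSpot `[Times: user=... sys=..., real=... secs]` suffix on GC log lines, the class name `GcTimesCheck` and the `factor` parameter are made up, and the sample values are invented rather than taken from the log discussed in this post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcTimesCheck {
    // Matches the "user=... sys=..., real=... secs" part of a GC log line.
    private static final Pattern TIMES = Pattern.compile(
            "user=(\\d+\\.\\d+)\\s+sys=(\\d+\\.\\d+),\\s+real=(\\d+\\.\\d+)\\s+secs");

    /**
     * Returns true when wall-clock time (real) exceeds CPU time (user + sys)
     * by more than the given factor, which hints that the GC threads were
     * waiting (swap, I/O, CPU starvation) rather than working.
     */
    public static boolean looksBlocked(String logLine, double factor) {
        Matcher m = TIMES.matcher(logLine);
        if (!m.find()) {
            return false; // no Times section on this line
        }
        double user = Double.parseDouble(m.group(1));
        double sys = Double.parseDouble(m.group(2));
        double real = Double.parseDouble(m.group(3));
        return real > (user + sys) * factor;
    }

    public static void main(String[] args) {
        // Invented sample lines for illustration only.
        String suspicious = "[Times: user=0.05 sys=0.01, real=2.30 secs]";
        String normal = "[Times: user=0.40 sys=0.02, real=0.06 secs]";
        System.out.println(looksBlocked(suspicious, 2.0)); // prints true
        System.out.println(looksBlocked(normal, 2.0));     // prints false
    }
}
```

With parallel GC threads the normal case is user + sys greater than real, as in the second sample. Because the log only has two decimal places, a very short pause such as user=0.00 sys=0.00, real=0.01 would also be flagged, so a practical filter would probably add a minimum pause length.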
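I/O matters too. A later problem turned out not to be I/O-related, but 笨神 raised I/O as a possibility, so it is worth watching. sar (System Activity Reporter) is one of the most comprehensive performance analysis tools on Linux. It reports on many aspects of system activity, including file reads and writes, system call usage, disk I/O, CPU efficiency, memory usage, process activity, and IPC-related activity. In short, sar is extremely capable.
Type "sar"; if it reports that the command is not installed, install it: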

yum install sysstat     

Once installed, sar offers many options, including but not limited to CPU utilization, load average, memory usage, paging and swapping activity, and I/O and transfer rate statistics. Let's focus on I/O:

The -d option of the sar command displays the block device statistics report. Using -p (pretty-print) together with -d makes the DEV column more readable, as shown below:

sar -d -p 1  

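and other related commands.
The Xmn problem from part eight was a similar kind of issue; solving it was a chance to learn along the way.

Now for the big pitfall, which I still hadn't fixed by the end of last night.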

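Pitfalls when choosing the UseParallelOldGC collector

Why choose UseParallelOldGC? Part eight covers that: http://www.jianshu.com/p/7f7cecb983cc
How the problem arose: I noticed old was growing quite fast and found it strange. Why would old grow so quickly? There are generally a few causes: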

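  • Each time an object in the young generation survives a minor GC, its age increases by one; when it reaches the age threshold it is promoted to the old generation. The threshold is typically 15.
  • If the total size of the objects in Survivor space, accumulated from the youngest age up to a given age, exceeds half of a Survivor space, objects of that age or older are promoted to the old generation without waiting for the age threshold.
  • Large objects go directly into the old generation.
  • The young generation's copying algorithm needs a Survivor space to copy into. If a large number of objects are still alive after a minor GC, the old generation provides an allocation guarantee, and objects the Survivor space cannot hold go directly into the old generation.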

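笨神 has covered this before. The -XX:MaxTenuringThreshold parameter sets the maximum number of young GCs an object normally goes through before being promoted to the old generation. Note that it is a maximum, not a guarantee: an object's age does not have to reach this value before it is promoted to old. When you set this value, the first collection uses it; after that the JVM computes the threshold itself, somewhere between 0 and 15. With CMS GC the default is 6. The largest value you can set is 15, because the JVM stores the age in only 4 bits, so the maximum is binary 1111, that is, 15.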
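The "at most, not exactly" behaviour and the 4-bit limit can be sketched as follows. This is a simplified model of the age-table calculation as I understand it from the HotSpot serial and ParNew young collectors, not the actual JVM code, and Parallel Scavenge adjusts its threshold differently. The class and method names are made up; `targetSurvivorRatio` stands in for the real -XX:TargetSurvivorRatio flag, whose default of 50 is the "half of a Survivor space" in the list above.

```java
public class TenuringThresholdSketch {
    static final int AGE_BITS = 4;                  // bits available for the age
    static final int MAX_AGE = (1 << AGE_BITS) - 1; // binary 1111 = 15
    static final int TABLE_SIZE = MAX_AGE + 1;      // ages 0..15

    /**
     * bytesByAge[i] is the number of bytes of surviving objects with age i
     * (index 0 is unused). Returns the threshold for the next young GC.
     */
    public static int computeThreshold(long[] bytesByAge, long survivorCapacity,
                                       int targetSurvivorRatio, int maxTenuringThreshold) {
        long desiredSurvivorSize = survivorCapacity * targetSurvivorRatio / 100;
        long total = 0;
        int age = 1;
        while (age < TABLE_SIZE) {
            if (age < bytesByAge.length) {
                total += bytesByAge[age]; // accumulate from the youngest age upward
            }
            if (total > desiredSurvivorSize) {
                break; // including this age overflows the target, so it becomes the threshold
            }
            age++;
        }
        // Never exceed the configured maximum (MaxTenuringThreshold).
        return Math.min(age, maxTenuringThreshold);
    }

    public static void main(String[] args) {
        // Invented numbers: a 1000-byte survivor space with a 50% target.
        long[] bytesByAge = {0, 300, 300, 100};
        System.out.println(computeThreshold(bytesByAge, 1000, 50, MAX_AGE)); // prints 2
    }
}
```

In the sample, ages 1 and 2 together hold 600 bytes, which is more than the 500-byte target, so the threshold drops to 2 even though the configured maximum is 15.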
First, run:

jstat -gc pid 3s  

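S0 and S1 turned out to be only 512 KB each, which was odd. Assuming the spaces were too small, I set -XX:SurvivorRatio=2, and then something strange showed up, as in the figure:
Going through the logs, it was spooky: S0 and S1 kept getting smaller. Yes, you read that right. Under PS (Parallel Scavenge), UseAdaptiveSizePolicy is on by default, and that is exactly where the problem lies. Check with: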

jinfo -flag UseAdaptiveSizePolicy pid  

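It really was enabled. Under CMS it is off by default, so some readers may never have noticed it. The JVM implementation is in https://github.com/dmlloyd/openjdk/blob/jdk9/jdk9/hotspot/src/share/vm/gc/parallel/psAdaptiveSizePolicy.cpp (thanks to 小峰 for providing the link and reading the source).

With UseAdaptiveSizePolicy enabled under PS, the tenuring threshold eventually dropped to 1. This is because, with the policy on, PS compares the time cost of young GC and full GC. If young GC cost > full GC cost * 1.1, the threshold keeps decreasing. Conversely, if young GC cost * 1.1 < full GC cost, the threshold keeps increasing. In my application young GC really was frequent, so the threshold kept dropping until it hit 1, which is the minimum.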
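The comparison described above can be sketched as a single adjustment step. This is a simplified model of the rule, not the code in psAdaptiveSizePolicy.cpp: it ignores the survivor-overflow case and the other inputs the real policy tracks, and the class name, method name, and cost values are made up for illustration.

```java
public class AdaptiveThresholdSketch {
    // The 1.1 factor from the description above.
    static final double TOLERANCE = 1.1;

    /**
     * One adjustment step: lower the threshold when young GC is the more
     * expensive side, raise it when full GC is, otherwise leave it alone.
     * The result stays within [1, maxThreshold].
     */
    public static int adjust(int threshold, double youngGcCost, double fullGcCost,
                             int maxThreshold) {
        if (youngGcCost > fullGcCost * TOLERANCE) {
            // Young GC costs too much: promote earlier so less is copied between survivors.
            return Math.max(1, threshold - 1);
        } else if (fullGcCost > youngGcCost * TOLERANCE) {
            // Full GC costs too much: keep objects in the young generation longer.
            return Math.min(maxThreshold, threshold + 1);
        }
        return threshold;
    }

    public static void main(String[] args) {
        int threshold = 15;
        // Invented costs: young GC dominates, as in the application described above.
        for (int gc = 0; gc < 20; gc++) {
            threshold = adjust(threshold, 0.05, 0.01, 15);
        }
        System.out.println(threshold); // prints 1
    }
}
```

Because each step moves the threshold by one, an application whose young GC cost stays above the full GC cost reaches the floor of 1 after at most 14 collections, which matches the behaviour observed here.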

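I turned it off with -XX:-UseAdaptiveSizePolicy, and that flag is where the big pitfall began.

Observing again, old was still growing. I already knew about the PrintTenuringDistribution flag, which produces log output like the following:
Observing once more, the log contained nothing resembling age. I took that to mean objects were not being promoted to old by age and suspected they were too large, so I added -XX:PretenureSizeThreshold=100m (this flag actually only takes effect with the Serial and ParNew collectors and has no effect under ParallelGC). Observing again, old was still growing fast. This is where the pitfall hit, and I agonized over it all night.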

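S0 and S1 were large here (and barely used), so why did old grow on every collection? The GC log showed nothing like age, and I had set the large-object threshold as well. I was stuck, and once again waited for 笨神 to explain, still unable to account for why old kept growing. The answer: under PS, the age information is not printed when the adaptive size policy is turned off. After removing the -XX:-UseAdaptiveSizePolicy flag, so that the adaptive policy was back on, the problem went away.
Going forward I will use the clues the JVM provides to drive optimizations at the code level or elsewhere. This kind of work is complex and I am still feeling my way through it. I hope we can improve together.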

