Examining Huge Pages or Transparent Huge Pages performance
All modern processors use page-based mechanisms to translate the virtual addresses of user-space processes into physical addresses in RAM. Pages are commonly 4KB in size, and the processor can hold only a limited number of virtual-to-physical address mappings in its Translation Lookaside Buffers (TLB). The number of TLB entries ranges from tens to hundreds of mappings, which limits a processor to a few megabytes of memory it can address without changing the TLB entries. When a virtual-to-physical address mapping is not in the TLB, the processor must do an expensive computation to generate a new virtual-to-physical address mapping.
To increase the amount of memory the processor can address without performing expensive TLB updates, many processors allow larger page sizes to be used. On x86_64 processors huge pages are 2MB, 512 times larger than regular 4KB pages. In ideal situations huge pages can decrease the overhead of TLB updates (misses). However, huge page use can increase memory pressure, add latency for minor page faults, and add overhead when splitting huge pages or coalescing normal-sized pages into huge pages.
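The effect of page size on how much memory a full TLB can cover is simple arithmetic: number of entries times page size. The sketch below assumes a hypothetical TLB with 512 entries, a figure chosen only for illustration; real TLB sizes vary by processor and often differ between 4KB and 2MB pages.

```shell
# tlb_reach_kb ENTRIES PAGE_KB: memory covered by a full TLB, in kB.
tlb_reach_kb() {
    echo $(( $1 * $2 ))
}

# Hypothetical TLB size, assumed for illustration only.
entries=512
echo "4 kB pages: $(tlb_reach_kb "$entries" 4) kB reachable"
echo "2 MB pages: $(tlb_reach_kb "$entries" 2048) kB reachable"
```

With the same number of entries, the 2MB case covers 512 times as much memory as the 4KB case.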
There are two mechanisms available for huge pages in Linux: hugepages and Transparent Huge Pages (THP). The original hugepages mechanism requires explicit configuration. The newer THP mechanism automatically uses larger pages for dynamically allocated memory in Red Hat Enterprise Linux 6.
To determine whether the newer Transparent Huge Pages (THP) mechanism or the older hugepages mechanism is being used, look at the output of /proc/meminfo as below:
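Whether THP is active can also be read from sysfs, where the kernel marks the current mode with square brackets (for example "[always] madvise never"). The following is a minimal sketch; the two paths are assumptions about where the setting lives (upstream kernels and the Red Hat Enterprise Linux 6 variant), and the helper name thp_mode is made up for this example.

```shell
# thp_mode: read a THP "enabled" setting on stdin and print the active mode,
# which the kernel encloses in square brackets.
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Assumed locations of the setting; only paths that exist are reported.
for f in /sys/kernel/mm/transparent_hugepage/enabled \
         /sys/kernel/mm/redhat_transparent_hugepage/enabled; do
    if [ -r "$f" ]; then
        echo "$f: $(thp_mode < "$f")"
    fi
done
```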
$ cat /proc/meminfo|grep Huge
AnonHugePages: 3049472 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
The AnonHugePages entry shows how much memory the newer Transparent Huge Page mechanism currently has in use. For this machine that is 3049472 kB, or 1489 huge pages of 2048 kB each.
In this case there are zero pages in the pool of the older hugepage mechanism, as shown by the HugePages_Total of 0. HugePages_Free shows how many pages are still available for allocation, which is always less than or equal to HugePages_Total. The number of HugePages in use can be computed as HugePages_Total - HugePages_Free. For more information about the configuration of HugePages see Tuning and Optimizing Red Hat Enterprise Linux for Oracle 9i and 10g Databases.
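The two calculations described above (AnonHugePages divided by Hugepagesize, and HugePages_Total minus HugePages_Free) can be sketched with a small awk filter. The function name hugepage_usage and the output labels are made up for this example.

```shell
# hugepage_usage: read /proc/meminfo-style text on stdin and report
# the number of THP pages and of explicitly configured huge pages in use.
hugepage_usage() {
    awk '
        /^AnonHugePages:/   { anon_kb = $2 }
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            if (size_kb > 0)
                printf "thp_pages=%d\n", anon_kb / size_kb
            printf "hugetlb_in_use=%d\n", total - free
        }'
}

if [ -r /proc/meminfo ]; then
    hugepage_usage < /proc/meminfo
fi
```

For the sample output shown earlier this gives 3049472 / 2048 = 1489 THP pages and 0 - 0 = 0 pages in use from the older pool.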
Determining whether page fault latency is due to huge pages use
Huge page use can reduce the number of TLB updates required to access large regions of memory, and so reduce the overall cost of TLB updates, but it can increase the cost and latency of other operations. When a user-space application is given a range of addresses for a memory allocation, the assignment of a physical page is deferred until the first time the page is accessed. To prevent information leakage from the previous user of the page, the kernel writes zeros to the entire page. For a 4096-byte page this is a relatively short operation that takes only a couple of microseconds. The x86 huge pages are 2MB in size, 512 times larger than a normal page, so the operation may take hundreds of microseconds and affect latency-sensitive code. Below is a simple SystemTap command-line script that shows which applications have huge pages zeroed out and how long those operations take. It runs until Ctrl-C is pressed.
stap -e 'global huge_clear probe kernel.function("clear_huge_page").return {huge_clear [execname(), pid()] <<< (gettimeofday_us() - @entry(gettimeofday_us()))}'
Below is a run of the above SystemTap clear huge page script. The script outputs a list sorted from the executable name and process with the most huge page clears to the one with the fewest. The @count is the number of times that process encountered a huge page clear operation. It is followed by time statistics in microseconds of wall clock time: @min and @max are the minimum and maximum time to clear out a page, @sum is the total wall clock time, and @avg is the average. In the example below the ld process 17050 took a total of 1924 microseconds to clear out huge pages, and on average those page clears took 128 microseconds.
# stap -e 'global huge_clear probe kernel.function("clear_huge_page").return {huge_clear [execname(), pid()] <<< (gettimeofday_us() - @entry(gettimeofday_us()))}'
^Chuge_clear["ld",17050] @count=15 @min=114 @max=148 @sum=1924 @avg=128
huge_clear["ld",27996] @count=13 @min=121 @max=160 @sum=1674 @avg=128
huge_clear["ld",19595] @count=11 @min=86 @max=181 @sum=1251 @avg=113
huge_clear["cc1",22840] @count=6 @min=108 @max=180 @sum=862 @avg=143
huge_clear["ld",15640] @count=5 @min=160 @max=599 @sum=1274 @avg=254
huge_clear["ld",27733] @count=4 @min=95 @max=145 @sum=443 @avg=110
huge_clear["cc1",24455] @count=4 @min=103 @max=159 @sum=535 @avg=133
huge_clear["cc1",20431] @count=3 @min=112 @max=172 @sum=408 @avg=136
huge_clear["cc1",21906] @count=3 @min=125 @max=159 @sum=431 @avg=143
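Since the output has one line per process, it can be useful to total the clears per executable name. The following is a minimal sketch that post-processes saved output in the format shown above; the function name sum_by_exec and the output labels are made up for this example.

```shell
# sum_by_exec: read huge_clear[...] lines on stdin and print, for each
# executable name, the number of clears and the total microseconds spent.
sum_by_exec() {
    awk '
        /huge_clear\["/ {
            name = $0
            sub(/^[^"]*"/, "", name)   # drop everything up to the opening quote
            sub(/".*$/, "", name)      # drop everything from the closing quote
            count = 0; sum = 0
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^@count=/) count = substr($i, 8)
                if ($i ~ /^@sum=/)   sum = substr($i, 6)
            }
            clears[name] += count
            usecs[name] += sum
        }
        END {
            for (n in clears)
                printf "%s clears=%d total_us=%d\n", n, clears[n], usecs[n]
        }' | sort
}

# Two lines taken from the run above.
sum_by_exec <<'EOF'
huge_clear["ld",27996] @count=13 @min=121 @max=160 @sum=1674 @avg=128
huge_clear["cc1",22840] @count=6 @min=108 @max=180 @sum=862 @avg=143
EOF
```

Applied to the full run above, this adds up to 48 clears and 6566 microseconds for ld, and 16 clears and 2236 microseconds for cc1.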
The system may attempt to save memory by using the same physical page for multiple processes. When one of the processes attempts to modify the contents of the page a new copy needs to be made of the page. The Copy-On-Write (COW) operation for the huge page can be observed with a script very similar to the one watching for huge pages to be zeroed out. Below is the script to watch for Copy-On-Write for huge pages and it will output data in a similar format.
stap -e 'global huge_cow probe kernel.function("copy_user_huge_page").return {huge_cow [execname(), pid()] <<< (gettimeofday_us() - @entry(gettimeofday_us()))}'
Determining whether huge page split and collapse operations are affecting performance
Because some portions of the kernel code only work with normal-sized pages, the kernel may convert a huge page into a set of normal-sized pages using a split operation. One can identify whether split operations are occurring with the following SystemTap script:
stap -e 'probe kernel.function("split_huge_page") { printf("%s: %s(%d)\n", pp(), execname(), pid());}'
Below is an example run of the script showing which processes are performing split huge page operations. In this case a virtualized guest machine (qemu-system-x86_64) is among the processes with huge page splits.
# stap -e 'probe kernel.function("split_huge_page") { printf("%s: %s(%d)\n", pp(), execname(), pid());}'
kernel.function("split_huge_page@include/linux/huge_mm.h:103"): qemu-system-x86(9473)
kernel.function("split_huge_page@include/linux/huge_mm.h:103"): qemu-system-x86(9473)
kernel.function("split_huge_page@include/linux/huge_mm.h:103"): plugin-containe(16582)
kernel.function("split_huge_page@include/linux/huge_mm.h:103"): StreamT~ns #697(2942)
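On a busy system the probe can print many lines, so a per-process tally is easier to read. The following is a minimal sketch that counts saved probe output with standard tools; the function name count_splits is made up for this example.

```shell
# count_splits: read the split_huge_page probe output on stdin and print
# "COUNT NAME(PID)" lines, most frequent first.
count_splits() {
    sed 's/^.*"): //' | sort | uniq -c | sort -rn | sed 's/^ *//'
}

# Three lines taken from the run above.
count_splits <<'EOF'
kernel.function("split_huge_page@include/linux/huge_mm.h:103"): qemu-system-x86(9473)
kernel.function("split_huge_page@include/linux/huge_mm.h:103"): qemu-system-x86(9473)
kernel.function("split_huge_page@include/linux/huge_mm.h:103"): plugin-containe(16582)
EOF
```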
The inverse of the huge page split operation is the huge page collapse operation, which converts a set of normal-sized pages into a single huge page. It is desirable for a range of addresses to need fewer TLB entries, but the conversion process is expensive because the system needs to find a candidate set of pages to group together and then copy all the memory from the possibly scattered normal-sized pages into a single huge page. The khugepaged kernel thread searches for candidate pages to collapse into a single huge page. Even if khugepaged does not succeed in converting normal-sized pages into huge pages, it may still be using processor time to search for candidate pages. You can see whether the khugepaged kernel thread is taking a significant amount of processor time with:
top -p `pidof khugepaged`
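In addition to watching processor time, the kernel exposes khugepaged counters and tunables (for example pages_collapsed and full_scans) as files in sysfs. The following is a minimal sketch that prints them; the two directory paths are assumptions about where these files live (upstream kernels and the Red Hat Enterprise Linux 6 variant), and the helper name khugepaged_stats is made up for this example.

```shell
# khugepaged_stats DIR: print "name=value" for each readable file in DIR.
khugepaged_stats() {
    for f in "$1"/*; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        echo "$(basename "$f")=$(cat "$f")"
    done
}

# Assumed locations of the khugepaged directory; missing ones are skipped.
for d in /sys/kernel/mm/transparent_hugepage/khugepaged \
         /sys/kernel/mm/redhat_transparent_hugepage/khugepaged; do
    if [ -d "$d" ]; then
        khugepaged_stats "$d"
    fi
done
```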
If you want to see when the huge page collapse operations occur, the following will note each time khugepaged is able to collapse normal-sized pages into huge pages:
stap -e 'probe kernel.function("collapse_huge_page") { printf("%-25s: %s (%d) collapse_huge_page\n", tz_ctime(gettimeofday_s()), execname(), pid())}'
A one-line script of this kind generates output like the following:
$ stap -e 'probe kernel.function("collapse_huge_page") { printf("%-25s: %s (%d) collapse_huge_page\n", ctime(gettimeofday_s()), execname(), pid())}'
Mon Oct 21 15:12:44 2013 : khugepaged (88) collapse_huge_page
Mon Oct 21 15:13:44 2013 : khugepaged (88) collapse_huge_page
Mon Oct 21 15:13:54 2013 : khugepaged (88) collapse_huge_page
Mon Oct 21 15:14:54 2013 : khugepaged (88) collapse_huge_page
Mon Oct 21 15:15:04 2013 : khugepaged (88) collapse_huge_page
TIPS:
If stap fails with an error like the following:
# semantic error: missing x86_64 kernel/module debuginfo [man warning::debuginfo] under '/lib/modules/3.10.0-327.ali2000.alios7.x86_64/build'
install the kernel debuginfo packages by running:
# debuginfo-install kernel
References
- http://kvm.et.redhat.com/wiki/images/9/9e/2010-forum-thp.pdf
- http://git.kernel.org/?p=linux/kernel/git/andrea/aa.git;a=blob_plain;f=Documentation/vm/transhuge.txt;hb=HEAD
- Red Hat Enterprise Linux 6 SystemTap Beginners Guide Introduction to SystemTap
- https://developers.redhat.com/blog/2014/03/10/examining-huge-pages-or-transparent-huge-pages-performance/