In previous posts on vm.swappiness and using RAM disks we talked about how the memory on a Linux guest is used for the OS itself (the kernel, buffers, etc.), applications, and also for file cache. File caching is an important performance improvement, and read caching is a clear win in most cases, balanced against applications using the RAM directly. Write caching is trickier. The Linux kernel stages disk writes into cache, and over time asynchronously flushes them to disk. This has a nice effect of speeding disk I/O but it is risky. When data isn’t written to disk there is an increased chance of losing it.

A lot of I/O can also overwhelm the cache. Ever written a lot of data to disk all at once, and seen large pauses on the system while it tries to deal with all that data? Those pauses are a result of the cache deciding that there’s too much data to be written asynchronously (as a non-blocking background operation, letting the application process continue), and switching to writing synchronously (blocking and making the process wait until the I/O is committed to disk). Of course, a filesystem also has to preserve write order, so when it starts writing synchronously it first has to destage the cache. Hence the long pause.

The nice thing is that these are controllable options, and based on your workloads & data you can decide how you want to set them up. Let’s take a look:

$ sysctl -a | grep dirty
vm.dirty_background_ratio = 10
vm.dirty_background_bytes =
vm.dirty_ratio =
vm.dirty_bytes =
vm.dirty_writeback_centisecs =
vm.dirty_expire_centisecs = 3000

vm.dirty_background_ratio is the percentage of system memory that can be filled with “dirty” pages (memory pages that still need to be written to disk) before the pdflush/flush/kdmflush background processes kick in to write them to disk. My example is 10%, so if my virtual server has 32 GB of memory that’s 3.2 GB of data that can be sitting in RAM before something is done.

vm.dirty_ratio is the absolute maximum amount of system memory that can be filled with dirty pages before everything must get committed to disk. When the system gets to this point all new I/O blocks until dirty pages have been written to disk. This is often the source of long I/O pauses, but is a safeguard against too much data being cached unsafely in memory.

vm.dirty_background_bytes and vm.dirty_bytes are another way to specify these parameters. If you set the _bytes version the _ratio version will become 0, and vice-versa.
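
To make the relationship between the _ratio and _bytes forms concrete, here is a rough sketch that turns either form into a byte threshold. The function name dirty_threshold_bytes and its available_bytes argument are made up for this illustration. The kernel documentation describes the percentage as applying to available memory (free plus reclaimable pages) rather than total RAM, so plugging in the full 32 GB only gives an upper bound.

```python
def dirty_threshold_bytes(available_bytes, ratio=0, bytes_value=0):
    """Return a dirty-page threshold in bytes (illustrative helper).

    A non-zero bytes_value takes precedence, mirroring how setting a
    _bytes sysctl zeroes its _ratio counterpart. Otherwise ratio is
    treated as a whole-number percentage of available_bytes.
    """
    if bytes_value:
        return bytes_value
    return available_bytes * ratio // 100


GB = 10**9

# The 32 GB / 10% example from the text, taken as an upper bound.
print(dirty_threshold_bytes(32 * GB, ratio=10) / GB)  # 3.2
```

Passing bytes_value instead of ratio returns that value unchanged, which is the whole point of the _bytes variants: the threshold stops scaling with memory size.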

vm.dirty_expire_centisecs is how long something can be in cache before it needs to be written. In this case it’s 30 seconds. When the pdflush/flush/kdmflush processes kick in they will check to see how old a dirty page is, and if it’s older than this value it’ll be written asynchronously to disk. Since holding a dirty page in memory is unsafe this is also a safeguard against data loss.

vm.dirty_writeback_centisecs is how often the pdflush/flush/kdmflush processes wake up and check to see if work needs to be done.
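
Since the two _centisecs values are in hundredths of a second, it is easy to misread them. The sketch below parses "key = value" lines in the format sysctl prints and converts a centisecond value to seconds. The helper names parse_sysctl and centisecs_to_seconds are my own, and the sample text only contains the two values quoted in this post.

```python
def parse_sysctl(text):
    """Parse 'key = value' lines into a dict of ints, skipping blank values."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and value.strip().isdigit():
            settings[key.strip()] = int(value)
    return settings


def centisecs_to_seconds(value):
    """Convert hundredths of a second to seconds."""
    return value / 100


sample = """vm.dirty_background_ratio = 10
vm.dirty_expire_centisecs = 3000"""

settings = parse_sysctl(sample)
print(centisecs_to_seconds(settings["vm.dirty_expire_centisecs"]))  # 30.0
```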

You can also see statistics on the page cache in /proc/vmstat:

$ grep -E "dirty|writeback" /proc/vmstat
nr_dirty 878
nr_writeback
nr_writeback_temp

In my case I have 878 dirty pages waiting to be written to disk.
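
The counters in /proc/vmstat are page counts, not bytes. As a small sketch, the code below parses "name value" lines and converts nr_dirty to KiB. The 4 KiB page size is an assumption (common on x86 but not universal), and parse_vmstat is a hypothetical helper name.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; check your own system


def parse_vmstat(text):
    """Parse 'name value' lines, as found in /proc/vmstat, into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


# Only the value quoted in the text is used here.
stats = parse_vmstat("nr_dirty 878\n")
print(stats["nr_dirty"] * PAGE_SIZE / 1024)  # 3512.0 (KiB)
```

On a live system you could pass in the contents of /proc/vmstat instead of the sample string.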

Approach 1: Decreasing the Cache

As with most things in the computer world, how you adjust these depends on what you’re trying to do. In many cases we have fast disk subsystems with their own big, battery-backed NVRAM caches, so keeping things in the OS page cache is risky. Let’s try to send I/O to the array in a more timely fashion and reduce the chance our local OS will, to borrow a phrase from the service industry, be “in the weeds.” To do this we lower vm.dirty_background_ratio and vm.dirty_ratio by adding new numbers to /etc/sysctl.conf and reloading with “sysctl -p”:

vm.dirty_background_ratio =
vm.dirty_ratio =

This is a typical approach on virtual machines, as well as Linux-based hypervisors. I wouldn’t suggest setting these parameters to zero, as some background I/O is nice to decouple application performance from short periods of higher latency on your disk array & SAN (“spikes”).

Approach 2: Increasing the Cache

There are scenarios where raising the cache dramatically has positive effects on performance. These situations are where the data contained on a Linux guest isn’t critical and can be lost, and usually where an application is writing to the same files repeatedly or in repeatable bursts. In theory, by allowing more dirty pages to exist in memory you’ll rewrite the same blocks over and over in cache, and just need to do one write every so often to the actual disk. To do this we raise the parameters:

vm.dirty_background_ratio =
vm.dirty_ratio =

Sometimes folks also increase the vm.dirty_expire_centisecs parameter to allow more time in cache. Beyond the increased risk of data loss, you also run the risk of long I/O pauses if that cache gets full and needs to destage, because on large VMs there will be a lot of data in cache.
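
To get a feel for how long such a destage pause could last, here is a back-of-the-envelope sketch. All the numbers are hypothetical (32 GiB of available memory, a dirty ratio of 80, a disk sustaining 200 MiB/s), and destage_seconds is an illustrative name. Real writeback is rarely this linear.

```python
def destage_seconds(dirty_bytes, disk_bytes_per_second):
    """Estimate how long flushing dirty_bytes takes at a steady write rate."""
    return dirty_bytes / disk_bytes_per_second


GiB = 1024**3
MiB = 1024**2

# Hypothetical worst case: the cache fills right up to an 80% dirty ratio.
dirty = 32 * GiB * 80 // 100
print(round(destage_seconds(dirty, 200 * MiB)))  # 131 (seconds)
```

A pause of roughly two minutes under these assumptions is why a large cache on a big VM deserves some thought before you raise the limits.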

Approach 3: Both Ways

There are also scenarios where a system has to deal with infrequent, bursty traffic to slow disk (batch jobs at the top of the hour, midnight, writing to an SD card on a Raspberry Pi, etc.). In that case an approach might be to allow all that write I/O to be deposited in the cache so that the background flush operations can deal with it asynchronously over time:

vm.dirty_background_ratio = 5
vm.dirty_ratio = 80

Here the background processes will start writing right away when it hits that 5% ceiling but the system won’t force synchronous I/O until it gets to 80% full. From there you just size your system RAM and vm.dirty_ratio to be able to consume all the written data. Again, there are tradeoffs with data consistency on disk, which translates into risk to data. Buy a UPS and make sure you can destage cache before the UPS runs out of power. :)
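
The sizing step can be sketched as simple arithmetic: find the smallest whole-percent vm.dirty_ratio whose threshold covers the burst. The helper name min_dirty_ratio and the example sizes (a 3 GiB burst against 4 GiB of available memory) are assumptions for illustration only.

```python
def min_dirty_ratio(burst_bytes, available_bytes):
    """Smallest whole percentage of available_bytes that is >= burst_bytes.

    Uses integer ceiling division to avoid floating-point rounding.
    A result above 100 means the burst cannot fit in memory at all.
    """
    return (burst_bytes * 100 + available_bytes - 1) // available_bytes


GiB = 1024**3

needed = min_dirty_ratio(3 * GiB, 4 * GiB)
print(needed)        # 75
print(needed <= 80)  # True: a dirty ratio of 80 would absorb this burst
```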

No matter the route you choose you should always be gathering hard data to support your changes and help you determine if you are improving things or making them worse. In this case you can get data from many different places, including the application itself, /proc/vmstat, /proc/meminfo, iostat, vmstat, and many of the things in /proc/sys/vm. Good luck!

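Notes:

1. vm.dirty_background_ratio is the threshold, expressed as the share of the cache occupied by dirty pages, at which the pdflush/flush/kdmflush background writers start flushing dirty pages to disk. These writes are asynchronous and do not block the application's writes.

2. vm.dirty_ratio is the threshold, again expressed as the share of the cache occupied by dirty pages, at which new I/O is blocked until the dirty pages have been written to disk. These writes are synchronous and block the application's writes. Since vm.dirty_background_ratio already exists, some people may think vm.dirty_ratio is unnecessary. It is not: when a program writes much faster than the operating system can flush to disk, the dirty-page share can still reach vm.dirty_ratio. At that point the system switches to synchronous writes to limit how much data would be lost in a crash.

3. vm.dirty_background_bytes and vm.dirty_bytes mean the same as the parameters above, except that the threshold is an absolute value rather than a percentage.

4. The two ratio parameters above are relative to the cache, not to the operating system's physical memory. The 32 GB example above computes its result against physical memory, so that figure is incorrect.

The cache size is calculated as follows: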

VmSize = memory + memory-mapped
Actually, a couple of things. The article has the intent correct, but some of the details are off.
The total size of the page cache, the last I looked, was MemFree + Cached - Mapped, which is not equal to the size of the physical memory in the
system. You can find these amounts in /proc/meminfo, which also lists a number of interesting things about the dirty cache behavior in a system.
root@server0:/proc# grep -i free meminfo
MemFree: kB
SwapFree: kB
HugePages_Free:
root@server0:/proc# grep -i dirty meminfo
Dirty: kB
root@server0:/proc# grep -i write meminfo
Writeback: kB
WritebackTmp: kB
root@server0:/proc#
I have read several times that the dirty cache behavior is used by ext2+ to make index and data writes more efficient. As I understand it,
the longer the data is held in cache before being written to the media, the fewer inode/index updates it will actually do. So, for flash type
media, holding writes does make sense if you are interested in increasing the longevity of the flash memory.
Also, in Linux, the cache can always be assumed to be the size of MemFree + Cached - Mapped, it will never be any bigger or smaller than that.
The only tuning option is how much of that memory do you want holding data waiting for writeback (dirty).
These are fairly minor, the intent of the article seems sound.
For the sake of discussion, here's the dirty_* parameters off of that server.
root@server0:/proc/sys/vm# grep . dirty_*
dirty_background_bytes:
dirty_background_ratio:
dirty_bytes:
dirty_expire_centisecs:
dirty_ratio:
dirty_writeback_centisecs:
root@server0:/proc/sys/vm#
Years ago, I did quite a bit of custom tuning for each of my systems. I've somewhat settled on the above. These work well in nearly every
situation, from Raspi's/Odroids, systems running off USB, to the largest general purpose DC boxes.
There are a few situations that this wouldn't work well for, database servers in particular, but these settings are safer than the default,
and perform better.
Take it for what you will.
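
The MemFree + Cached - Mapped estimate from the comment above can be sketched as follows. This simply encodes the commenter's formula and is not an authoritative definition of the page cache size. The helper names are hypothetical, and the sample numbers are invented for the example.

```python
def parse_meminfo(text):
    """Parse 'Name:   value kB' lines, as in /proc/meminfo, into a dict of ints."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def page_cache_estimate_kb(fields):
    """Apply the commenter's formula: MemFree + Cached - Mapped (all in kB)."""
    return fields["MemFree"] + fields["Cached"] - fields["Mapped"]


# Invented sample values, for illustration only.
sample = """MemFree:  1000000 kB
Cached:   2000000 kB
Mapped:    500000 kB
Dirty:        100 kB"""

print(page_cache_estimate_kb(parse_meminfo(sample)))  # 2500000
```

Multiplying this estimate by vm.dirty_ratio / 100 gives a rough idea of the real dirty limit, which is usually well below the same percentage of physical RAM.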

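5. Also, vm.dirty_writeback_centisecs and vm.dirty_expire_centisecs are expressed in hundredths of a second.

6. All of this concerns flushing the operating system's cache; a database manages its own buffer. Because the database cache and the OS cache together form a double cache, the OS cache still deserves tuning even when the database parameters are set sensibly. A database writes its log first, so whether cached data reaches disk promptly matters less. The OS settings still affect disk I/O, though: changing how often writeback is triggered through vm.dirty_background_ratio changes how much I/O the OS cache can merge. This matters especially on SSDs, whose smallest atomic write is typically 4K no matter how little data is written. Tuning the writeback frequency, and with it the merge granularity, can substantially reduce the real writes to the SSD and so help protect it.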

References:

https://lonesysadmin.net/2013/12/22/better-linux-disk-caching-performance-vm-dirty_ratio/

https://www.reddit.com/r/linux/comments/3h7w8f/better_linux_disk_caching_performance_with/

http://www.cnblogs.com/xiaotengyi/p/6907190.html

https://www.kernel.org/doc/Documentation/sysctl/vm.txt
