http://blog.itpub.net/15480802/viewspace-753819/
http://blog.itpub.net/15480802/viewspace-753757/
http://blog.itpub.net/15480802/viewspace-753890/
http://blog.chinaunix.net/uid-26126915-id-3481343.html
 
 

[wuyaalan@localhost desktop]$ cd /proc/sys/vm/
[wuyaalan@localhost vm]$ ls
block_dump                 hugepages_treat_as_movable  oom_kill_allocating_task
compact_memory             hugetlb_shm_group           overcommit_memory
dirty_background_bytes     laptop_mode                 overcommit_ratio
dirty_background_ratio     legacy_va_layout            page-cluster
dirty_bytes                lowmem_reserve_ratio        panic_on_oom
dirty_expire_centisecs     max_map_count               percpu_pagelist_fraction
dirty_ratio                min_free_kbytes             scan_unevictable_pages
dirty_writeback_centisecs  mmap_min_addr               stat_interval
drop_caches                nr_hugepages                swappiness
extfrag_threshold          nr_overcommit_hugepages     vdso_enabled
extra_free_kbytes          nr_pdflush_threads          vfs_cache_pressure
highmem_is_dirtyable       oom_dump_tasks              would_have_oomkilled

As the listing above shows, the proc filesystem exposes a large number of kernel parameters, and users can tune them to improve system performance.
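
The listing can also be collected programmatically. Below is a minimal sketch, assuming a Linux system where these tunables are plain text files; the helper name `read_vm_tunables` is hypothetical and not part of any library.

```python
from pathlib import Path


def read_vm_tunables(root="/proc/sys/vm"):
    """Return {name: value} for every readable file directly under root."""
    values = {}
    for entry in sorted(Path(root).iterdir()):
        if not entry.is_file():
            continue
        try:
            values[entry.name] = entry.read_text().strip()
        except OSError:
            # Some entries may be write-only or need root; skip them.
            continue
    return values
```

Calling `read_vm_tunables()` with no argument reads the live values; passing another directory makes the helper easy to try out against a copy.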

The rest of this post explains what some of the parameters listed above mean.

1. block_dump

block_dump enables block I/O
debugging when set to a nonzero value. If you want to find out which
process caused the disk to spin up (see /proc/sys/vm/laptop_mode), you
can gather information by setting the flag. When this flag is set, Linux
reports all disk read and write operations that take place, and all
block dirtyings done to files. This makes it possible to debug why a
disk needs to spin up, and to increase battery life even more. The
output of block_dump is written to the kernel output, and it can be
retrieved using "dmesg". When you use block_dump and your kernel logging
level also includes kernel debugging messages, you probably want to
turn off klogd, otherwise the output of block_dump will be logged,
causing disk activity that is not normally there.


2. dirty_background_ratio

Contains, as a percentage of
total system memory, the number of pages at which the pdflush background
writeback daemon will start writing out dirty data.

In other words, once the total size of modified (dirty) pages exceeds this percentage of memory, pdflush starts writing them back. Raising the ratio lets dirty pages stay in memory longer.

3. dirty_expire_centisecs

This tunable is used to define
when dirty data is old enough to be eligible for writeout by the pdflush
daemons. It is expressed in hundredths of a second. Data which has been
dirty in memory for longer than this interval will be written out the next
time a pdflush daemon wakes up.

dirty_expire_centisecs thus controls how long a modified page may stay dirty before it is considered expired and must be written back.

4. dirty_ratio

Contains, as a percentage of
total system memory, the number of pages at which a process which is
generating disk writes will itself start writing out dirty data.

5. dirty_writeback_centisecs

The pdflush writeback daemons will
periodically wake up and write "old" data out to disk. This tunable
expresses the interval between those wakeups, in hundredths of a second. Setting this to zero disables periodic writeback altogether.

dirty_writeback_centisecs is the interval at which the pdflush threads wake up periodically; each time the interval elapses, pdflush writes modified data back to disk.
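
To make the units of the dirty_* tunables concrete, here is a rough sketch that turns the two ratios into byte thresholds and the centisecond values into seconds. It follows the description above and takes the percentages against total system memory; a real kernel may base them on a smaller "dirtyable" amount, so treat the numbers as approximate. The function names are hypothetical.

```python
def dirty_thresholds(total_bytes, dirty_background_ratio, dirty_ratio):
    """Return (background_bytes, foreground_bytes) for the two ratios.

    background_bytes: dirty data at which background writeback starts.
    foreground_bytes: dirty data at which a writing process must itself
    start writing out dirty data.
    """
    background = total_bytes * dirty_background_ratio // 100
    foreground = total_bytes * dirty_ratio // 100
    return background, foreground


def centisecs_to_seconds(centisecs):
    """Convert a *_centisecs tunable (hundredths of a second) to seconds."""
    return centisecs / 100
```

With illustrative values of 10 and 20 on an 8 GiB machine, `dirty_thresholds(8 * 1024**3, 10, 20)` gives roughly 0.8 GiB and 1.6 GiB, and a dirty_expire_centisecs of 3000 corresponds to 30 seconds.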

6. drop_caches

Writing to this will cause the kernel to drop clean caches, dentries and inodes from memory, causing that memory to become free.

To free pagecache:

  • echo 1 > /proc/sys/vm/drop_caches

To free dentries and inodes:

  • echo 2 > /proc/sys/vm/drop_caches

To free pagecache, dentries and inodes:

  • echo 3 > /proc/sys/vm/drop_caches

As this is a non-destructive
operation, and dirty objects are not freeable, the user should run
"sync" first in order to make sure all cached objects are freed. This tunable was added in 2.6.16.

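The three values above behave like a two-bit mask, and the advice to run "sync" first can be folded into a small helper. This is only a sketch: the constant and function names are hypothetical, writing to the real file requires root, and the `path` parameter exists so the helper can be pointed at an ordinary file when experimenting.

```python
import os

PAGECACHE = 1        # bit 0: clean page cache
DENTRIES_INODES = 2  # bit 1: dentries and inodes


def drop_caches_targets(value):
    """Describe what a given drop_caches value (1, 2 or 3) frees."""
    if value not in (1, 2, 3):
        raise ValueError("drop_caches accepts 1, 2 or 3")
    targets = []
    if value & PAGECACHE:
        targets.append("pagecache")
    if value & DENTRIES_INODES:
        targets.append("dentries and inodes")
    return targets


def drop_caches(value, path="/proc/sys/vm/drop_caches"):
    """Flush dirty data with sync, then write value to the tunable."""
    drop_caches_targets(value)  # validates the value
    os.sync()                   # dirty objects are not freeable, so sync first
    with open(path, "w") as f:
        f.write(f"{value}\n")
```
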
7. hugepages_treat_as_movable

When a non-zero value is written to
this tunable, future allocations for the huge page pool will use
ZONE_MOVABLE. Despite huge pages being non-movable, we do not introduce
additional external fragmentation of note as huge pages are always the
largest contiguous block we care about. Huge pages are not movable so are not
allocated from ZONE_MOVABLE by default. However, as ZONE_MOVABLE will
always have pages that can be migrated or reclaimed, it can be used to
satisfy hugepage allocations even when the system has been running a
long time. This allows an administrator to resize the hugepage pool at
runtime depending on the size of ZONE_MOVABLE.

8. hugetlb_shm_group

hugetlb_shm_group contains the group ID that is allowed to create SysV shared memory segments using hugetlb pages.

9. laptop_mode

laptop_mode is a knob that
controls "laptop mode". When the knob is set, any physical disk I/O
(that might have caused the hard disk to spin up, see
/proc/sys/vm/block_dump) causes Linux to flush all dirty blocks. The
result of this is that after a disk has spun down, it will not be spun
up anymore to write dirty blocks, because those blocks had already been
written immediately after the most recent read operation. The value of
the laptop_mode knob determines the time between the occurrence of disk
I/O and when the flush is triggered. A sensible value for the knob is 5
seconds. Setting the knob to 0 disables laptop mode.

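In "laptop mode" the kernel uses the I/O system more intelligently and tries to keep the disk in a low-power state. Laptop mode groups many I/O operations together and completes them in one go, leaving an inactive period of up to 10 minutes by default between bursts of disk I/O, which greatly reduces the number of disk spin-ups. To sustain such a long inactive period, the kernel has to complete as much I/O as possible during each active period: it performs a large amount of read-ahead and then syncs all buffers.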

10. legacy_va_layout

If non-zero, this sysctl disables
the new 32-bit mmap map layout - the kernel will use the legacy (2.4)
layout for all processes.

11. lowmem_reserve_ratio

Ratio of total pages to free pages for each memory zone.

12. max_map_count

This file contains the maximum
number of memory map areas a process may have. Memory map areas are used
as a side-effect of calling malloc, directly by mmap and mprotect, and
also when loading shared libraries. While most applications need less
than a thousand maps, certain programs, particularly malloc debuggers,
may consume lots of them, e.g., up to one or two maps per allocation. The documented default value is 65536, although mainline kernels actually default to 65530.
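
To see how close a process is to this limit, one can count its mappings. The sketch below assumes the usual Linux /proc/<pid>/maps format with one mapping per line; both function names are hypothetical.

```python
def count_map_areas(maps_text):
    """Count mappings in text formatted like /proc/<pid>/maps."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def map_headroom(maps_text, max_map_count):
    """Return how many more map areas fit under max_map_count."""
    return max_map_count - count_map_areas(maps_text)
```

On a live system the inputs would come from reading /proc/self/maps and /proc/sys/vm/max_map_count.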

13. min_free_kbytes

This is used to force the Linux VM
to keep a minimum number of kilobytes free. The VM uses this number to
compute a pages_min value for each lowmem zone in the system. Each
lowmem zone gets a number of reserved free pages based proportionally on
its size.

14. mmap_min_addr

This file indicates the amount of
address space which a user process will be restricted from mmapping.
Since kernel null dereference bugs could accidentally operate based on
the information in the first couple of pages of memory, userspace
processes should not be allowed to write to them. By default this value is set to 0
and no protections will be enforced by the security module. Setting this
value to something like 64k will allow the vast majority of
applications to work correctly and provide defense in depth against
future potential kernel bugs.

15. nr_hugepages

nr_hugepages configures the number of hugetlb pages reserved for the system.

16. nr_pdflush_threads

The count of currently-running pdflush threads. This is a read-only value.

17. numa_zonelist_order

This sysctl is only for NUMA. Where memory is allocated from is controlled by zonelists.

In the non-NUMA case, the zonelist for GFP_KERNEL is ordered as follows: ZONE_NORMAL -> ZONE_DMA. This means that a memory allocation request for GFP_KERNEL will get memory from ZONE_DMA only when ZONE_NORMAL is not available.

In the NUMA case, two types of order are possible. Assume a 2-node NUMA system; the zonelist for Node(0)'s GFP_KERNEL is then one of:

(A) Node(0) ZONE_NORMAL -> Node(0) ZONE_DMA -> Node(1) ZONE_NORMAL
(B) Node(0) ZONE_NORMAL -> Node(1) ZONE_NORMAL -> Node(0) ZONE_DMA

Type (A) offers the best locality for processes on Node(0), but ZONE_DMA will be used before ZONE_NORMAL is exhausted. This increases the possibility of out-of-memory (OOM) in ZONE_DMA because ZONE_DMA tends to be small. Type (B) cannot offer the best locality but is more robust against OOM of the DMA zone.

Type (A) is called "node" order and type (B) is called "zone" order. "Node order" orders the zonelists by node, then by zone within each node. "Zone order" orders the zonelists by zone type, then by node within each zone.

Specify "[Nn]ode" for node order, "[Zz]one" for zone order, or "[Dd]efault" to request automatic configuration. Autoconfiguration selects "node" order in the following cases:

(1) if the DMA zone does not exist, or
(2) if the DMA zone comprises greater than 50% of the available memory, or
(3) if any node's DMA zone comprises greater than 60% of its local memory and the amount of local memory is big enough.

Otherwise, "zone" order is selected. The default order is recommended unless it is causing problems for your system or application.
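
The difference between the two orders is just the nesting of two loops. The following sketch reproduces orders (A) and (B) for the 2-node example; it is an illustration only, the function name is hypothetical, and remote nodes are simply sorted by number, whereas a real kernel would take node distance into account.

```python
def build_zonelist(local_node, zones_by_node, order,
                   zone_priority=("ZONE_NORMAL", "ZONE_DMA")):
    """Return a fallback list of (node, zone) pairs for local_node.

    zones_by_node maps a node number to the set of zones it has.
    order is "node" (by node, then zone) or "zone" (by zone, then node).
    """
    # Local node first, then the remaining nodes (simplified: by number).
    nodes = [local_node] + sorted(n for n in zones_by_node if n != local_node)
    if order == "node":
        return [(n, z) for n in nodes for z in zone_priority
                if z in zones_by_node[n]]
    if order == "zone":
        return [(n, z) for z in zone_priority for n in nodes
                if z in zones_by_node[n]]
    raise ValueError("order must be 'node' or 'zone'")
```

With `{0: {"ZONE_NORMAL", "ZONE_DMA"}, 1: {"ZONE_NORMAL"}}` as the layout, "node" order yields list (A) and "zone" order yields list (B).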

18. overcommit_memory

Controls overcommit of system memory,
possibly allowing processes to allocate (but not use) more memory than
is actually available.

  • 0 - Heuristic overcommit handling.
    Obvious overcommits of address space are refused. Used for a typical
    system. It ensures a seriously wild allocation fails while allowing
    overcommit to reduce swap usage. root is allowed to allocate slightly
    more memory in this mode. This is the default.
  • 1 - Always overcommit. Appropriate for some scientific applications.
  • 2 - Don't overcommit. The total
    address space commit for the system is not permitted to exceed swap plus
    a configurable percentage (default is 50) of physical RAM. Depending on
    the percentage you use, in most situations this means a process will
    not be killed while attempting to use already-allocated memory but will
    receive errors on memory allocation as appropriate.

19. overcommit_ratio

Percentage of physical memory size to include in overcommit calculations.

Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100)

swapspace = total size of all swap areas
physmem = size of physical memory in system
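
The formula is easy to check with concrete numbers. A minimal sketch, using a hypothetical function name and integer byte counts:

```python
def commit_limit(swap_bytes, physmem_bytes, overcommit_ratio=50):
    """Memory allocation limit = swapspace + physmem * (overcommit_ratio / 100)."""
    return swap_bytes + physmem_bytes * overcommit_ratio // 100
```

For example, a machine with 8 GiB of RAM, 2 GiB of swap and the default ratio of 50 may commit at most 2 + 8 * 0.5 = 6 GiB of address space when overcommit_memory is set to 2.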

20. page-cluster

page-cluster controls the number of pages which are written to swap in a single attempt, that is, the swap I/O size. It is a logarithmic value - setting
it to zero means "1 page", setting it to 1 means "2 pages", setting it
to 2 means "4 pages", etc. The default value is three (eight
pages at a time). There may be some small benefits in tuning this to a
different value if your workload is swap-intensive.
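
Since the value is a base-2 logarithm, the page count is a simple shift. A small sketch with hypothetical function names; the 4096-byte page size is an assumption that holds on many, but not all, systems.

```python
def swap_io_pages(page_cluster):
    """Number of pages written to swap in one attempt: 2 ** page_cluster."""
    if page_cluster < 0:
        raise ValueError("page-cluster must be non-negative")
    return 1 << page_cluster


def swap_io_bytes(page_cluster, page_size=4096):
    """Swap I/O size in bytes, assuming the given page size."""
    return swap_io_pages(page_cluster) * page_size
```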

21. panic_on_oom

This enables or disables the panic on
out-of-memory feature. If this is set to 1, the kernel panics when
out-of-memory happens. If this is set to 0, the kernel will kill some
rogue process, by calling oom_kill(). Usually, oom_killer can kill rogue
processes and system will survive. If you want to panic the system
rather than killing rogue processes, set this to 1. The default value is 0.

22. percpu_pagelist_fraction

This is the fraction of pages at most
(high mark pcp->high) in each zone that are allocated for each per
cpu page list. The min value for this is 8. It means that we don't allow
more than 1/8th of pages in each zone to be allocated in any single
per_cpu_pagelist. This entry only changes the value of hot per cpu
pagelists. User can specify a number like 100 to allocate 1/100th of
each zone to each per cpu page list. The batch value of each per cpu
pagelist is also updated as a result. It is set to pcp->high / 4. The
upper limit of batch is (PAGE_SHIFT * 8). The initial value is zero. Kernel does not use this value at boot time to set the high water marks for each per cpu page list.
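
The relationship between the fraction, pcp->high and the batch value can be written out as follows. This is a sketch derived only from the description above, not from kernel source; the function name is hypothetical and PAGE_SHIFT is assumed to be 12 (4 KiB pages).

```python
def pcp_limits(zone_pages, fraction, page_shift=12):
    """Return (high, batch) for one per-cpu page list in a zone.

    high  = zone_pages / fraction
    batch = high / 4, capped at PAGE_SHIFT * 8
    """
    if fraction < 8:
        raise ValueError("the minimum value for the fraction is 8")
    high = zone_pages // fraction
    batch = max(1, high // 4)
    batch = min(batch, page_shift * 8)
    return high, batch
```

For a zone of 1,000,000 pages and a fraction of 100, each per-cpu list may hold up to 10,000 pages, and the batch is capped at 96.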

23. stat_interval

With this tunable you can configure
the VM statistics update interval. The default value is 1 second. This tunable
first appeared in the 2.6.22 kernel.

24. swap_token_timeout

This file contains the valid hold time of the
swap out protection token. The Linux VM has a token-based thrashing
control mechanism and uses the token to prevent unnecessary page faults
in thrashing situations. The unit of the value is seconds. The value is
useful for tuning thrashing behavior. This tunable was removed in 2.6.20 when the algorithm was improved.

25. swappiness

swappiness is a parameter which sets
the kernel's balance between reclaiming pages from the page cache and
swapping process memory. The default value is 60. If you want the kernel to swap out more
process memory and thus cache more file contents, increase the value.
Otherwise, if you would like the kernel to swap less, decrease it.

26. vdso_enabled

When this flag is set, the kernel
maps a vDSO page into newly created processes and passes its address
down to glibc upon exec(). This feature is enabled by default. vDSO is a virtual DSO (dynamic shared
object) exposed by the kernel at some address in every process' memory.
Its purpose is to speed up system calls. The mapping address used to
be fixed (0xffffe000), but starting with 2.6.18 it's randomized (besides
the security implications, this also helps debuggers).

27. vfs_cache_pressure

Controls the tendency of the kernel to reclaim the memory which is used for caching of directory and inode objects. At the default value of
vfs_cache_pressure = 100 the kernel will attempt to reclaim dentries and
inodes at a "fair" rate with respect to pagecache and swapcache
reclaim. Decreasing vfs_cache_pressure causes the kernel to prefer to
retain dentry and inode caches. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.
