Misconceptions About the Linux Load Average

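I had long been puzzled about why the system load rises when I/O usage is high. I came across this article by chance, and it finally cleared up my confusion.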

Commands such as uptime and top display the load average. The three numbers, from left to right, are the load averages over the last 1, 5, and 15 minutes:

$ uptime

 :: up  days,  :,   user,  load average: 5.76, 5.54, 5.61

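The concept of load average originated on UNIX systems. Although the exact formula differs between vendors, they all measure the number of processes currently using a CPU plus the number of processes waiting for a CPU; in short, the number of runnable processes. Load average can therefore serve as a reference indicator for CPU bottlenecks: if it exceeds the number of CPUs, the CPUs may be insufficient.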
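
As a minimal sketch of this traditional reading, the snippet below parses text in the format of /proc/loadavg (where Linux exposes the same three figures) and divides the result by the CPU count. The function names `parse_loadavg` and `per_cpu_load` are illustrative and not part of the original article.

```python
import os


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return tuple(float(value) for value in fields[:3])


def per_cpu_load(load, cpu_count):
    """Load divided by CPU count; above 1.0 suggests more demand than CPUs."""
    return load / cpu_count


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    # /proc/loadavg exists on Linux only, hence the guard above.
    with open("/proc/loadavg") as handle:
        one, five, fifteen = parse_loadavg(handle.read())
    cpus = os.cpu_count() or 1
    print(f"1-min load {one:.2f} on {cpus} CPUs: {per_cpu_load(one, cpus):.2f} per CPU")
```

A per-CPU value above 1.0 is only a hint of CPU pressure under the runnable-processes definition given above.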

On Linux, however, this is not the case!

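On Linux, the load average includes not only the processes using a CPU and the processes waiting for a CPU, but also the processes in uninterruptible sleep. A process is typically in uninterruptible sleep while waiting on an I/O device such as a disk, and in some cases while waiting on the network (for example, on a network filesystem). The Linux designers' reasoning was that uninterruptible sleep should always be brief and the process will soon resume running, so it is treated as equivalent to runnable.

Yet uninterruptible sleep, however brief, is still sleep, and in the real world it is not necessarily brief: a large number of processes in uninterruptible sleep, or long periods of it, usually mean that an I/O device has hit a bottleneck. A sleeping process needs no CPU, and even if every CPU is idle a sleeping process cannot run, so the number of sleeping processes is simply not suitable as a measure of CPU load. By counting processes in uninterruptible sleep, Linux overturns the original meaning of load average.

As a result, the load average has largely lost its usefulness on Linux, because you do not know what it represents: when it is high, you cannot tell whether there are too many runnable processes or too many processes in uninterruptible sleep, and so you cannot judge whether the CPUs are insufficient or an I/O device is the bottleneck.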
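
One way to look past the single number is to count tasks by state directly. The sketch below is not from the original article: it reads the state letter from /proc/<pid>/task/<tid>/stat, where `R` marks a running or runnable task and `D` marks uninterruptible sleep. The helper names `state_from_stat` and `count_states` are hypothetical, and the result is an instantaneous snapshot rather than a decayed average, so it only approximates what the kernel folds into the load average.

```python
import glob
from collections import Counter


def state_from_stat(stat_line):
    """Extract the one-letter state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split after the last ')'.
    """
    after_comm = stat_line[stat_line.rfind(")") + 1:]
    return after_comm.split()[0]


def count_states():
    """Count tasks (threads) by state across /proc/<pid>/task/<tid>/stat."""
    counts = Counter()
    for path in glob.glob("/proc/[0-9]*/task/[0-9]*/stat"):
        try:
            with open(path) as handle:
                counts[state_from_stat(handle.read())] += 1
        except (OSError, IndexError):
            continue  # the task exited while we were scanning
    return counts


if __name__ == "__main__":
    counts = count_states()
    print(f"runnable (R): {counts['R']}, uninterruptible (D): {counts['D']}")
```

A high load average together with many `D` tasks and few `R` tasks points toward an I/O bottleneck rather than a shortage of CPUs. Note that the script itself shows up as one `R` task.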

Reference: https://en.wikipedia.org/wiki/Load_(computing)
"Most UNIX systems count only processes in the running (on CPU) or runnable (waiting for CPU) states. However, Linux also includes processes in uninterruptible sleep states (usually waiting for disk activity), which can lead to markedly different results if many processes remain blocked in I/O due to a busy or stalled I/O system."

Source code:

RHEL6
kernel/sched.c:
===============
static void calc_load_account_active(struct rq *this_rq)
{
        long nr_active, delta;

        nr_active = this_rq->nr_running;
        nr_active += (long) this_rq->nr_uninterruptible;

        if (nr_active != this_rq->calc_load_active) {
                delta = nr_active - this_rq->calc_load_active;
                this_rq->calc_load_active = nr_active;
                atomic_long_add(delta, &calc_load_tasks);
        }
}

RHEL7
kernel/sched/core.c:
====================
static long calc_load_fold_active(struct rq *this_rq)
{
        long nr_active, delta = 0;

        nr_active = this_rq->nr_running;
        nr_active += (long) this_rq->nr_uninterruptible;

        if (nr_active != this_rq->calc_load_active) {
                delta = nr_active - this_rq->calc_load_active;
                this_rq->calc_load_active = nr_active;
        }

        return delta;
}
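
To illustrate the delta-folding idea in these two functions, here is a toy Python model, not kernel code. Each per-CPU queue remembers the value it last reported, and only the change since then is added to a global counter. The `RunQueue` class, `fold_active`, and `demo` are illustrative names; the field names are borrowed from the excerpts above. The sketch assumes the caller adds the returned delta to the global counter, as the RHEL6 version does inline with `atomic_long_add`.

```python
class RunQueue:
    """Toy stand-in for the kernel's per-CPU struct rq (illustrative only)."""

    def __init__(self):
        self.nr_running = 0
        self.nr_uninterruptible = 0
        self.calc_load_active = 0  # value last folded into the global sum


def fold_active(rq):
    """Return the change in active tasks since the previous fold."""
    nr_active = rq.nr_running + rq.nr_uninterruptible
    delta = nr_active - rq.calc_load_active
    rq.calc_load_active = nr_active
    return delta


def demo():
    cpus = [RunQueue(), RunQueue()]
    calc_load_tasks = 0  # global accumulator

    cpus[0].nr_running = 3
    cpus[1].nr_uninterruptible = 2
    calc_load_tasks += sum(fold_active(rq) for rq in cpus)  # 3 + 2
    first = calc_load_tasks

    cpus[0].nr_running = 1          # two tasks finished on CPU 0
    cpus[1].nr_uninterruptible = 5  # three more tasks blocked on CPU 1
    calc_load_tasks += sum(fold_active(rq) for rq in cpus)  # -2 + 3
    return first, calc_load_tasks
```

After each round the accumulator equals the sum of nr_running + nr_uninterruptible over all queues, without ever walking every queue to recompute the total from scratch. Tasks in uninterruptible sleep contribute to it exactly as runnable tasks do.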

RHEL7
kernel/sched/core.c:
====================
/*
 * Global load-average calculations
 *
 * We take a distributed and async approach to calculating the global load-avg
 * in order to minimize overhead.
 *
 * The global load average is an exponentially decaying average of nr_running +
 * nr_uninterruptible.
 *
 * Once every LOAD_FREQ:
 *
 *   nr_active = 0;
 *   for_each_possible_cpu(cpu)
 *      nr_active += cpu_of(cpu)->nr_running + cpu_of(cpu)->nr_uninterruptible;
 *
 *   avenrun[n] = avenrun[0] * exp_n + nr_active * (1 - exp_n)
 *
 * Due to a number of reasons the above turns in the mess below:
 *
 *  - for_each_possible_cpu() is prohibitively expensive on machines with
 *    serious number of cpus, therefore we need to take a distributed approach
 *    to calculating nr_active.
 *
 *        \Sum_i x_i(t) = \Sum_i x_i(t) - x_i(t_0) | x_i(t_0) := 0
 *                      = \Sum_i { \Sum_j=1 x_i(t_j) - x_i(t_j-1) }
 *
 *    So assuming nr_active := 0 when we start out -- true per definition, we
 *    can simply take per-cpu deltas and fold those into a global accumulate
 *    to obtain the same result. See calc_load_fold_active().
 *
 *    Furthermore, in order to avoid synchronizing all per-cpu delta folding
 *    across the machine, we assume 10 ticks is sufficient time for every
 *    cpu to have completed this task.
 *
 *    This places an upper-bound on the IRQ-off latency of the machine. Then
 *    again, being late doesn't loose the delta, just wrecks the sample.
 *
 *  - cpu_rq()->nr_uninterruptible isn't accurately tracked per-cpu because
 *    this would add another cross-cpu cacheline miss and atomic operation
 *    to the wakeup path. Instead we increment on whatever cpu the task ran
 *    when it went into uninterruptible state and decrement on whatever cpu
 *    did the wakeup. This means that only the sum of nr_uninterruptible over
 *    all cpus yields the correct result.
 *
 *  This covers the NO_HZ=n code, for extra head-aches, see the comment below.
 */
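
The exponentially decaying average described in this comment can be sketched in a few lines of Python. This is a floating-point approximation under stated assumptions, not the kernel's fixed-point code: it assumes a 5-second sample interval standing in for LOAD_FREQ, takes exp_n as e^(-5s / window), and updates each of the three averages from its own previous value. The names `decay_factor`, `update`, and `simulate` are hypothetical.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumed; stands in for LOAD_FREQ)
WINDOWS = (60.0, 300.0, 900.0)  # 1, 5 and 15 minutes


def decay_factor(window):
    """exp_n in the kernel comment's formula, as a float."""
    return math.exp(-SAMPLE_INTERVAL / window)


def update(avenrun, nr_active):
    """One step: each average decays and absorbs a share of nr_active."""
    return [
        load * decay_factor(window) + nr_active * (1.0 - decay_factor(window))
        for load, window in zip(avenrun, WINDOWS)
    ]


def simulate(nr_active, seconds, start=(0.0, 0.0, 0.0)):
    """Feed a constant nr_active for `seconds` and return the three averages."""
    avenrun = list(start)
    for _ in range(int(seconds / SAMPLE_INTERVAL)):
        avenrun = update(avenrun, nr_active)
    return avenrun
```

Under these assumptions, a constant nr_active held for one minute brings the 1-minute figure to about 63% of that value, while the 5- and 15-minute figures lag behind. Because nr_active includes nr_uninterruptible, ten tasks blocked on I/O for a minute would raise the 1-minute load to roughly 6.3 even if every CPU were idle.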

Reference:

http://linuxperf.com/?p=176
