Your Linux server is running slow, so you follow standard procedure and run top. You see the CPU metrics:

But what do all of those 2-letter abbreviations mean?

The 3 CPU states

Let's take a step back. There are 3 general states your CPU can be in:

  • Idle, which means it has nothing to do.
  • Running a user space program, like a command shell, an email server, or a compiler.
  • Running the kernel, servicing interrupts or managing resources.

These three meta states can be further subdivided. For example, user space programs can be categorized as those running under their initial priority level or those running with a nice priority. Niceness is a way to tweak the priority level of a process so that it runs less frequently. The niceness level ranges from -20 (most favorable scheduling) to 19 (least favorable). By default processes on Linux are started with a niceness of 0. See our blog post Restricting process CPU usage using nice, cpulimit, and cgroups for more information on nice.

The 7 cpu statistics explained

There are several different ways to see the various CPU statistics. The most common is probably using the top command.

To start the top command you just type top at the command line:

The output from top is divided into two sections. The first few lines give a summary of the system resources including a breakdown of the number of tasks, the CPU statistics, and the current memory usage. Beneath these stats is a live list of the current running processes. This list can be sorted by PID, CPU usage, memory usage, and so on.

The CPU line will look something like this:

%Cpu(s): 24.8 us,  0.5 sy,  0.0 ni, 73.6 id,  0.4 wa,  0.0 hi,  0.2 si,  0.0 st

24.8 us - This tells us that the processor is spending 24.8% of its time running user space processes. A user space program is any process that doesn't belong to the kernel. Shells, compilers, databases, web servers, and the programs associated with the desktop are all user space processes. If the processor isn't idle, it is quite normal that the majority of the CPU time should be spent running user space processes.

73.6 id - Skipping over a few of the other statistics, just for a moment, the id statistic tell us that the processor was idle just over 73% of the time during the last sampling period. The total of the user space percentage - us, the niced percentage - ni, and the idle percentage - id, should be close to 100%. Which it is in this case. If the CPU is spending a more time in the other states then something is probably awry - see the Troubleshooting section below.

0.5 sy - This is the amount of time that the CPU spent running the kernel. All the processes and system resources are handled by the Linux kernel. When a user space process needs something from the system, for example when it needs to allocate memory, perform some I/O, or it needs to create a child process, then the kernel is running. In fact the scheduler itself which determines which process runs next is part of the kernel. The amount of time spent in the kernel should be as low as possible. In this case, just 0.5% of the time given to the different processes was spent in the kernel. This number can peak much higher, especially when there is a lot of I/O happening.

0.0 ni - As mentioned above, the priority level a user space process can be tweaked by adjusting its niceness. The ni stat shows how much time the CPU spent running user space processes that have been niced. On a system where no processes have been niced then the number will be 0.

0.4 wa - Input and output operations, like reading or writing to a disk, are slow compared to the speed of a CPU. Although this operations happen very fast compared to everyday human activities, they are still slow when compared to the performance of a CPU. There are times when the processor has initiated a read or write operation and then it has to wait for the result, but has nothing else to do. In other words it is idle while waiting for an I/O operation to complete. The time the CPU spends in this state is shown by the wa statistic.

0.0 hi & 0.2 si - These two statistics show how much time the processor has spent servicing interruptshi is for hardware interrupts, and si is for software interrupts. Hardware interrupts are physical interrupts sent to the CPU from various peripherals like disks and network interfaces. Software interrupts come from processes running on the system. A hardware interrupt will actually cause the CPU to stop what it is doing and go handle the interrupt. A software interrupt doesn't occur at the CPU level, but rather at the kernel level.

0.0 st - This last number only applies to virtual machines. When Linux is running as a virtual machine on a hypervisor, the st (short for stolen) statistic shows how long the virtual CPU has spent waiting for the hypervisor to service another virtual CPU running on a different virtual machine. Since in the real-world these virtual processors are sharing the same physical processor(s) then there will be times when the virtual machine wanted to run but the hypervisor scheduled another virtual machine instead.

Troubleshooting

On a busy server or desktop PC, you can expect the amount of time the CPU spends in idle to be small. However, if a system rarely has any idle time then then it is either a) overloaded (and you need a better one), or b) something is wrong.

Here is a brief look at some of the things that can go wrong and how they affect the CPU utilization.

High user mode - If a system suddenly jumps from having spare CPU cycles to running flat out, then the first thing to check is the amount of time the CPU spends running user space processes. If this is high then it probably means that a process has gone crazy and is eating up all the CPU time. Using the top command you will be able to see which process is to blame and restart the service or kill the process.

High kernel usage - Sometimes this is acceptable. For example a program that does lots of console I/O can cause the kernel usage to spike. However if it remains higher for long periods of time then it could be an indication that something isn't right. A possible cause of such spikes could be a problem with a driver/kernel module.

High niced value - If the amount of time the CPU is spending running processes with a niced priority value jumps then it means that someone has started some intensive CPU jobs on the system, but they have niced the task.

If the niceness level is greater than zero then the user has been courteous enough lower to the priority of the process and therefore avoid a CPU overload. There is probably little that needs to be done in this case, other than maybe find out who has started the process and talk about how you can help out!

But if the niceness level is less than 0, then you will need to investigate what is happening and who is responsible, as such a task could easily cripple the responsiveness of the system.

High waiting on I/O - This means that there are some intensive I/O tasks running on the system that don't use up much CPU time. If this number is high for anything other than short bursts then it means that either the I/O performed by the task is very inefficient, or the data is being transferred to a very slow device, or there is a potential problem with a hard disk that is taking a long time to process reads & writes.

High interrupt processing - This could be an indication of a broken peripheral that is causing lots of hardware interrupts or of a process that is issuing lots of software interrupts.

Large stolen time - Basically this means that the host system running the hypervisor is too busy. If possible, check the other virtual machines running on the hypervisor, and/or migrate to your virtual machine to another host.

TL;DR

Linux keeps statistics on how much time the CPU spends performing different tasks. Most of its time should be spent running user space programs or being idle. However there are several other execution states including running the kernel and servicing interrupts. Monitoring these different states can help you keep your system healthy and running smoothly.

Refer:

Understanding Linux CPU stats

Understanding Linux CPU stats的更多相关文章

  1. Understanding Linux CPU Load - when should you be worried?

    http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages

  2. Linux CPU Load Average

    理解Linux系统负荷 LINUX下CPU Load Average的一点研究 Linux load average负载量分析与解决思路 Understanding Linux CPU Load - ...

  3. 理解Linux CPU负载和 CPU使用率

    CPU负载和 CPU使用率 这两个从一定程度上都可以反映一台机器的繁忙程度. cpu使用率反映的是当前cpu的繁忙程度,忽高忽低的原因在于占用cpu处理时间的进程可能处于io等待状态但却还未释放进入w ...

  4. How do I Find Out Linux CPU Utilization?

    From:http://www.cyberciti.biz/tips/how-do-i-find-out-linux-cpu-utilization.html Whenever a Linux sys ...

  5. Linux CPU亲缘性详解

    前言 在淘宝开源自己基于nginx打造的tegine服务器的时候,有这么一项特性引起了笔者的兴趣.“自动根据CPU数目设置进程个数和绑定CPU亲缘性”.当时笔者对CPU亲缘性没有任何概念,当时作者只是 ...

  6. 查看线程linux cpu使用率

    Linux下如何查看高CPU占用率线程 LINUX CPU利用率计算 转 http://www.cnblogs.com/lidabo/p/4738113.html目录(?)[-] proc文件系统 p ...

  7. Linux CPU数量判断,通过/proc/cpuinfo.

    Linux CPU数量判断,通过/proc/cpuinfo. 相同 physical id :决定一个物理处理器 如果“siblings”和“cpu cores”一致,则说明不支持超线程,或者超线程未 ...

  8. Linux CPU监控指标

    Linux CPU监控指标 Linux提供了非常丰富的命令可以进行CPU相关数据进行监控,例如:top.vmstat等命令.top是一个动态显示过程,即可以通过用户按键来不断刷新当前状态.如果在前台执 ...

  9. 转载: 一、linux cpu、内存、IO、网络的测试工具

    来源地址: http://blog.csdn.net/wenwenxiong/article/details/77197997 记录一下 以后好找.. 一.linux cpu.内存.IO.网络的测试工 ...

随机推荐

  1. python基础下的数据结构与算法之顺序表

    一.什么是顺序表: 线性表的两种基本的实现模型: 1.将表中元素顺序地存放在一大块连续的存储区里,这样实现的表称为顺序表(或连续表).在这种实现中,元素间的顺序关系由它们的存储顺序自然表示. 2.将表 ...

  2. jstat命令总结

    jvm统计信息监控工具 一. jstat是什么 jstat是JDK自带的一个轻量级小工具.全称"Java Virtual Machine statistics monitoring tool ...

  3. 在多线程中使用spring的bean

    由于spring在java开发中的广泛运用大大的方便了开发的同时,当运用一些技术比如多线程等 在由spring管理的配置文件中,可以通过封装spring提供工具,手动获得spring管理的bean,这 ...

  4. [USACO08OCT]Watering Hole

    [USACO08OCT]Watering Hole 题目大意: Farmer John 有\(n(n\le300)\)个牧场,他希望灌溉他的所有牧场.牧场编号为\(1\sim n\),要灌溉一个牧场有 ...

  5. String和StringBuilder、StringBuffer的区别?

    估计很多Java初学者在学习Java的过程中都会遇到这个问题,那就是String,StringBuilder,StringBuffer这三个类之间有什么区别?今天在这里整理一下,希望对大家有帮助哈.如 ...

  6. BZOJ2888 : 资源运输

    显然资源集合处就是树的重心,这题需要动态维护树的重心. 每个连通块以重心为根,用link-cut tree维护每个点的子树大小以及子树内所有点到它的距离和. 合并两个连通块时,考虑启发式合并,暴力往大 ...

  7. bzoj 2753: [SCOI2012]滑雪与时间胶囊 -- 最小生成树

    2753: [SCOI2012]滑雪与时间胶囊 Time Limit: 50 Sec  Memory Limit: 128 MB Description a180285非常喜欢滑雪.他来到一座雪山,这 ...

  8. 关于MongoDB时区问题

    由于MongoDb存储时间按照UTC时间存储的,其官方驱动MongoDB.driver存储时间的时候将本地时间转换为了utc时间,但它有个蛋疼的bug,读取的时候非常蛋疼的是返回的是utc使时间.一个 ...

  9. Magent搭建Memcached集群

    原文地址:http://ultrasql.blog.51cto.com/9591438/1636374 Memcached集群介绍 由于Memcached服务器与服务器之间没有任何通讯,并且不进行任何 ...

  10. CefSharp 在同一窗口打开链接的方法

    摘要 在winform中使用cefsharp的时候,我们在浏览网页的时候,想在同一个窗口打开链接,而不是创建新的窗口.可以通过下面的方法实现. 解决方案 CefSharp 中控制弹窗的接口是 ILif ...