From: http://hustcat.github.io/about/

Linux中进程内存与cgroup内存的统计

在Linux内核,对于进程的内存使用与Cgroup的内存使用统计有一些相同和不同的地方。

进程的内存统计

一般来说,业务进程使用的内存主要有以下几种情况:

(1)用户空间的匿名映射页(Anonymous pages in User Mode address spaces),比如调用malloc分配的内存,以及使用MAP_ANONYMOUS的mmap;当系统内存不够时,内核可以将这部分内存交换出去;

(2)用户空间的文件映射页(Mapped pages in User Mode address spaces),包含map file和map tmpfs;前者比如指定文件的mmap,后者比如IPC共享内存;当系统内存不够时,内核可以回收这些页,但回收之前可能需要与文件同步数据;

(3)文件缓存(page in page cache of disk file);发生在程序通过普通的read/write读写文件时,当系统内存不够时,内核可以回收这些页,但回收之前可能需要与文件同步数据;

(4)buffer pages,属于page cache;比如读取块设备文件。

其中(1)和(2)是算作进程的RSS,(3)和(4)属于page cache。

与进程内存统计相关的几个文件:

/proc/[pid]/stat
(23) vsize %lu
Virtual memory size in bytes.
(24) rss %ld
Resident Set Size: number of pages the process has
in real memory. This is just the pages which count
toward text, data, or stack space. This does not
include pages which have not been demand-loaded in,
or which are swapped out.

RSS的计算:

对应top的RSS列,do_task_stat

#define get_mm_rss(mm)					\
(get_mm_counter(mm, file_rss) + get_mm_counter(mm, anon_rss))

即RSS=file_rss + anon_rss

  • /proc/[pid]/statm
Provides information about memory usage, measured in pages.
The columns are: size (1) total program size
(same as VmSize in /proc/[pid]/status)
resident (2) resident set size
(same as VmRSS in /proc/[pid]/status)
share (3) shared pages (i.e., backed by a file)
text (4) text (code)
lib (5) library (unused in Linux 2.6)
data (6) data + stack
dt (7) dirty pages (unused in Linux 2.6)

见函数proc_pid_statm

int task_statm(struct mm_struct *mm, int *shared, int *text,
int *data, int *resident)
{
*shared = get_mm_counter(mm, file_rss);
*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
>> PAGE_SHIFT;
*data = mm->total_vm - mm->shared_vm;
*resident = *shared + get_mm_counter(mm, anon_rss);
return mm->total_vm;
}

top的SHR=file_rss。实际上,进程使用的共享内存,也是算到file_rss的,因为共享内存基于tmpfs。

anon_rss与file_rss的计算

static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, pmd_t *pmd,
pgoff_t pgoff, unsigned int flags, pte_t orig_pte)
{
if (flags & FAULT_FLAG_WRITE) {
if (!(vma->vm_flags & VM_SHARED)) {
anon = 1;///anon page
...
if (anon) {
inc_mm_counter(mm, anon_rss);
page_add_new_anon_rmap(page, vma, address);
} else {
inc_mm_counter(mm, file_rss);
page_add_file_rmap(page);

cgroup 的内存统计

stat file

memory.stat file includes following statistics

# per-memory cgroup local status
cache - # of bytes of page cache memory.
rss - # of bytes of anonymous and swap cache memory (includes
transparent hugepages).
rss_huge - # of bytes of anonymous transparent hugepages.
mapped_file - # of bytes of mapped file (includes tmpfs/shmem)
pgpgin - # of charging events to the memory cgroup. The charging
event happens each time a page is accounted as either mapped
anon page(RSS) or cache page(Page Cache) to the cgroup.
pgpgout - # of uncharging events to the memory cgroup. The uncharging
event happens each time a page is unaccounted from the cgroup.
swap - # of bytes of swap usage
dirty - # of bytes that are waiting to get written back to the disk.
writeback - # of bytes of file/anon cache that are queued for syncing to
disk.
inactive_anon - # of bytes of anonymous and swap cache memory on inactive
LRU list.
active_anon - # of bytes of anonymous and swap cache memory on active
LRU list.
inactive_file - # of bytes of file-backed memory on inactive LRU list.
active_file - # of bytes of file-backed memory on active LRU list.
unevictable - # of bytes of memory that cannot be reclaimed (mlocked etc).

相关代码

static void
mem_cgroup_get_local_stat(struct mem_cgroup *mem, struct mcs_total_stat *s)
{
s64 val; /* per cpu stat */
val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_CACHE);
s->stat[MCS_CACHE] += val * PAGE_SIZE;
val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_RSS);
s->stat[MCS_RSS] += val * PAGE_SIZE;
val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_FILE_MAPPED);
s->stat[MCS_FILE_MAPPED] += val * PAGE_SIZE;
val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_PGPGIN_COUNT);
s->stat[MCS_PGPGIN] += val;
val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_PGPGOUT_COUNT);
s->stat[MCS_PGPGOUT] += val;
if (do_swap_account) {
val = mem_cgroup_read_stat(&mem->stat, MEM_CGROUP_STAT_SWAPOUT);
s->stat[MCS_SWAP] += val * PAGE_SIZE;
} /* per zone stat */
val = mem_cgroup_get_local_zonestat(mem, LRU_INACTIVE_ANON);
s->stat[MCS_INACTIVE_ANON] += val * PAGE_SIZE;
val = mem_cgroup_get_local_zonestat(mem, LRU_ACTIVE_ANON);
s->stat[MCS_ACTIVE_ANON] += val * PAGE_SIZE;
val = mem_cgroup_get_local_zonestat(mem, LRU_INACTIVE_FILE);
s->stat[MCS_INACTIVE_FILE] += val * PAGE_SIZE;
val = mem_cgroup_get_local_zonestat(mem, LRU_ACTIVE_FILE);
s->stat[MCS_ACTIVE_FILE] += val * PAGE_SIZE;
val = mem_cgroup_get_local_zonestat(mem, LRU_UNEVICTABLE);
s->stat[MCS_UNEVICTABLE] += val * PAGE_SIZE;
}

数据结构

struct mem_cgroup {
...
/*
* statistics. This must be placed at the end of memcg.
*/
struct mem_cgroup_stat stat; ///统计数据
}; /* memory cgroup统计值
* Statistics for memory cgroup.
*/
enum mem_cgroup_stat_index {
/*
* For MEM_CONTAINER_TYPE_ALL, usage = pagecache + rss.
*/
MEM_CGROUP_STAT_CACHE, /* # of pages charged as cache */
MEM_CGROUP_STAT_RSS, /* # of pages charged as anon rss */
MEM_CGROUP_STAT_FILE_MAPPED, /* # of pages charged as file rss */
MEM_CGROUP_STAT_PGPGIN_COUNT, /* # of pages paged in */
MEM_CGROUP_STAT_PGPGOUT_COUNT, /* # of pages paged out */
MEM_CGROUP_STAT_EVENTS, /* sum of pagein + pageout for internal use */
MEM_CGROUP_STAT_SWAPOUT, /* # of pages, swapped out */ MEM_CGROUP_STAT_NSTATS,
}; struct mem_cgroup_stat_cpu {
s64 count[MEM_CGROUP_STAT_NSTATS];
} ____cacheline_aligned_in_smp; struct mem_cgroup_stat {
struct mem_cgroup_stat_cpu cpustat[0];
};
rss and cache
cache - # of bytes of page cache memory. rss - # of bytes of anonymous and swap cache memory (includes transparent hugepages). static void mem_cgroup_charge_statistics(struct mem_cgroup *mem,
struct page_cgroup *pc,
long size)
{
...
cpustat = &stat->cpustat[cpu];
if (PageCgroupCache(pc))
__mem_cgroup_stat_add_safe(cpustat,
MEM_CGROUP_STAT_CACHE, numpages);
else
__mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_RSS,
numpages); static void __mem_cgroup_commit_charge(struct mem_cgroup *mem,
struct page_cgroup *pc,
enum charge_type ctype,
int page_size)
{ switch (ctype) {
case MEM_CGROUP_CHARGE_TYPE_CACHE:
case MEM_CGROUP_CHARGE_TYPE_SHMEM: //file cache + shm
SetPageCgroupCache(pc);
SetPageCgroupUsed(pc);
break;
case MEM_CGROUP_CHARGE_TYPE_MAPPED:
ClearPageCgroupCache(pc);
SetPageCgroupUsed(pc);
break;
default:
break;
}
///更新统计值
mem_cgroup_charge_statistics(mem, pc, page_size); int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
gfp_t gfp_mask)
{
...
if (page_is_file_cache(page))
return mem_cgroup_charge_common(page, mm, gfp_mask,
MEM_CGROUP_CHARGE_TYPE_CACHE, NULL); ///file cache /* shmem */
if (PageSwapCache(page)) {
ret = mem_cgroup_try_charge_swapin(mm, page, gfp_mask, &mem);
if (!ret)
__mem_cgroup_commit_charge_swapin(page, mem,
MEM_CGROUP_CHARGE_TYPE_SHMEM);
} else
ret = mem_cgroup_charge_common(page, mm, gfp_mask, ///shm memory
MEM_CGROUP_CHARGE_TYPE_SHMEM, mem);
可以看到,cache包含共享内存和file cache mapped_file
mapped_file - # of bytes of mapped file (includes tmpfs/shmem) void mem_cgroup_update_file_mapped(struct page *page, int val)
{
... __mem_cgroup_stat_add_safe(cpustat, MEM_CGROUP_STAT_FILE_MAPPED, val);
__do_fault –> page_add_file_rmap –> mem_cgroup_update_file_mapped。 inactive_anon
inactive_anon - # of bytes of anonymous and swap cache memory on inactive LRU list. static int shmem_getpage_gfp(struct inode *inode, pgoff_t index,
struct page **pagep, enum sgp_type sgp, gfp_t gfp, int *fault_type)
{
...
lru_cache_add_anon(page); /**
* lru_cache_add: add a page to the page lists
* @page: the page to add
*/
static inline void lru_cache_add_anon(struct page *page)
{
__lru_cache_add(page, LRU_INACTIVE_ANON);
} 从这里可以看到,共享内存会增加到INACTIVE_ANON。 inactive_file
inactive_file - # of bytes of file-backed memory on inactive LRU list.文件使用的page cache(不包含共享内存) static inline void lru_cache_add_file(struct page *page)
{
__lru_cache_add(page, LRU_INACTIVE_FILE);
}
add_to_page_cache_lru –> lru_cache_add_file.

示例程序

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <errno.h>
#include <unistd.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#define BUF_SIZE 4000000000
#define MYKEY 26 int main(int argc,char **argv){
int shmid;
char *shmptr; if((shmid = shmget(MYKEY,BUF_SIZE,IPC_CREAT)) ==-1){
fprintf(stderr,"Create Share Memory Error0m~Z%s\n\a",strerror(errno));
exit(1);
} if((shmptr =shmat(shmid,0,0))==(void *)-1){
printf("shmat error!\n");
exit(1);
} memset(shmptr,'\0',1000000000); printf("sleep...\n");
while(1)
sleep(1); exit(0);
}

执行程序前后,cgroup memory.stat的值:

  • 执行前:*
# cat memory.stat
cache 1121185792
rss 23678976
rss_huge 0
mapped_file 14118912
inactive_anon 1002643456
active_anon 23687168
inactive_file 46252032
active_file 72282112
  • 执行后:*
# cat memory.stat
cache 2121187328
rss 23760896
rss_huge 0
mapped_file 1014124544
inactive_anon 2002608128
active_anon 23736320
inactive_file 46247936
active_file 72286208

ipcs -m

0x0000001a 229380 root 0 4000000000 1

可以看到cgroup中,共享内存计算在cache、mapped_file、inactive_anon中。

小结

(1)进程rss与cgroup rss的区别

进程的RSS为进程使用的所有物理内存(file_rss+anon_rss),即Anonymous pages+Mapped apges(包含共享内存)。cgroup RSS为(anonymous and swap cache memory),不包含共享内存。两者都不包含file cache。

(2)cgroup cache包含file cache和共享内存。

参考

http://man7.org/linux/man-pages/man5/proc.5.html

https://www.kernel.org/doc/Documentation/cgroups/memory.txt

[转]Linux中进程内存与cgroup内存的统计的更多相关文章

  1. linux中进程控制

    1.进程标识 每个进程都有一个非负整型表示的唯一的进程ID.进程ID标识符总是唯一的.  虽然进程ID是唯一的,但某个ID被回收后,ID号是可以复用的. ID为0的进程通常是调度进程(其常常被称交换进 ...

  2. 【网络编程基础】Linux下进程通信方式(共享内存,管道,消息队列,Socket)

    在网络课程中,有讲到Socket编程,对于tcp讲解的环节,为了加深理解,自己写了Linux下进程Socket通信,在学习的过程中,又接触到了其它的几种方式.记录一下. 管道通信(匿名,有名) 管道通 ...

  3. Linux中进程控制块PCB-------task_struct结构体结构

    Linux中task_struct用来控制管理进程,结构如下: struct task_struct { //说明了该进程是否可以执行,还是可中断等信息 volatile long state; // ...

  4. 『学了就忘』Linux系统管理 — 82、Linux中进程的查看(ps命令)

    目录 1.ps命令介绍 2.ps aux命令示例 3.ps -le命令示例 4.pstree命令 1.ps命令介绍 ps命令是用来静态显示系统中进程的命令. 不过这个命令有些特殊,它部分命令的选项前不 ...

  5. 『学了就忘』Linux系统管理 — 83、Linux中进程的查看(top命令)

    目录 1.top命令介绍 2.top命令示例 3.top命令输出项解释 4.top命令常用的实例 1.top命令介绍 top命令是用来动态显示系统中进程的命令. [root@localhost ~]# ...

  6. Linux中进程的优先级

    Linux採用两种不同的优先级范围,一种是nice值.还有一种是实时优先级. 1.nice值 nice值得范围是-20~19,默认值是0. 越大的nice值意味着更低的优先级.也就是说nice值为-2 ...

  7. Linux 中进程的管理

    Linux 的进程信号 1  HUP  挂起 2  INT  中断 3 QUIT  结束运行 9 KILL 无条件终止 11 SEGV 段错误 15 TERM 尽可能终止 17 STOP 无条件终止运 ...

  8. 使用Visual VM 查看linux中tomcat运行时JVM内存

    前言:在生产环境中经常发生服务器内存溢出,假死或者线程死锁等异常,导致服务不可用.我们经常使用的解决方法是通过分析错误日记,然后去寻找代码到底哪里出现了问题,这样的方式也许会奏效,但是排查起来耗费时间 ...

  9. 关于Linux下进程间使用共享内存和信号量通信的时的编译问题

    今天在编译一个使用信号量实现进程同步时,出现了库函数不存在的问题.如下图 编译结果实际上是说,没include相应的头文件,或是头文件不存在(即系统不支持该库函数) 但我man shm_open是可以 ...

随机推荐

  1. ubuntu双网卡绑定配置

    1,安装bonding需要的软件 sudo apt-get install ifenslave 2,在/etc/modules中加入: bonding mode= miimon= 3,在/etc/ne ...

  2. POJ2352 star

    传送门 这道题有个非常好听的名字,求二维偏序! 听起来似乎很高端,但就是让求满足对于每个i,xi < xj && yi < yj的个数. 这道题特别良心,给的顺序都是y递增 ...

  3. nginx开发(一) 源码-编译

    1:获取源码 http://nginx.org/download/nginx-1.8.0.tar.gz 2:编译 解压之后,进入根目录,执行 ./configuer.sh make make inst ...

  4. mybatis批量update操作的写法,及批量update报错的问题解决方法

    mybatis的批量update操作写法很简单,如下: public interface YourMapper extends BaseMapper<YourExt> { void upd ...

  5. 【转载】DNS原理及其解析过程

    1.在浏览器中输入www.qq.com域名,操作系统会先检查自己本地的hosts文件是否有这个网址映射关系,如果有,就先调用这个IP地址映射,完成域名解析. 2.如果hosts里没有这个域名的映射,则 ...

  6. jQuery插件之jqzoom的使用和参数设置

    jqzoom是一款基于jQuery的图片方法插件. 使用方法:1.引入jQuery与jqzoom,jqzoom.css 2.准备两张一大一小大小相同的图片,小图片放在<img>标签的&qu ...

  7. vbnet 进程监控,监控Acad开启的数量,并且添加到开机启动

    1# 自定义函数,添加到注册表 Public Sub StartRunRegHKLM() REM HKEY_LOCAL_MACHINE \ SOFTWARE \ WOW6432Node \ Micro ...

  8. 在CentOS下安装VMware tool

    VMware tools是虚拟机VMware Workstation自带的一款工具.它的作用就是使用户可以从物理主机直接往虚拟机里面拖文件.如果不安装它,我们是无法进行虚拟机和物理机之间的文件传输的. ...

  9. python orm / 表与model相互转换

    orm英文全称object relational mapping,就是对象映射关系程序,简单来说我们类似python这种面向对象的程序来说一切皆对象,但是我们使用的数据库却都是关系型的,为了保证一致的 ...

  10. RHEL5.6更新yum源

    RHEL5.6更新yum源记录,2017年2月20日 root用户切换目录至:/etc/yum.repos.d/ [root@localhost yum.repos.d]# pwd /etc/yum. ...