A brief introduction to per-cpu variables

Link outside the firewall: http://thinkiii.blogspot.com/2014/05/a-brief-introduction-to-per-cpu.html

Per-cpu variables are widely used in the Linux kernel, for example in per-cpu counters and per-cpu caches. Their advantage is clear: because each CPU works on its own copy of the data, we do not need locks to synchronize with other CPUs, and avoiding locks improves performance.

There are two kinds of per-cpu variables: static and dynamic. Static per-cpu variables are defined at build time. Linux provides the DEFINE_PER_CPU macro to define them.

#define DEFINE_PER_CPU(type, name)

static DEFINE_PER_CPU(struct delayed_work, vmstat_work);

Dynamic per-cpu variables are obtained at run time with the __alloc_percpu API. __alloc_percpu returns the per-cpu address of the variable.

void __percpu *__alloc_percpu(size_t size, size_t align)

s->cpu_slab = __alloc_percpu(sizeof(struct kmem_cache_cpu), 2 * sizeof(void *));

One big difference between per-cpu variables and ordinary variables is that we must use the per-cpu accessor macros to reach the real per-cpu variable for a given CPU. Accessing a per-cpu variable without going through these macros is a bug in Linux kernel programming. We will see the reason later.

Here are two examples of accessing per-cpu variables:

struct vm_event_state *this = &per_cpu(vm_event_states, cpu);

struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);
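To make the idea concrete outside the kernel, here is a minimal userspace sketch of a per-cpu counter. It is only a conceptual model: the names model_event_count, model_per_cpu, model_count_event and model_sum_events are made up for illustration, NR_CPUS is fixed at 4, and a plain array stands in for the real per-cpu machinery described later.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4   /* assumption: a small, fixed CPU count for the model */

/* One private slot per CPU; stands in for a static per-cpu variable. */
static long model_event_count[NR_CPUS];

/* Models per_cpu(var, cpu): yields the slot that belongs to the given CPU. */
static long *model_per_cpu(unsigned int cpu)
{
        assert(cpu < NR_CPUS);
        return &model_event_count[cpu];
}

/* Writer side: a CPU only touches its own slot, so no lock is taken. */
static void model_count_event(unsigned int cpu)
{
        (*model_per_cpu(cpu))++;
}

/* Reader side: the global value is the sum over every CPU's slot. */
static long model_sum_events(void)
{
        long sum = 0;

        for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++)
                sum += *model_per_cpu(cpu);
        return sum;
}
```

The point of the design is that the hot path (counting) never contends with other CPUs; only the rare reader has to walk all the slots.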

Let's take a closer look at the behaviour of Linux per-cpu variables. After we define our static per-cpu variables, the compiler and linker collect all of them into the per-cpu section. We can see them with the 'readelf' or 'nm' tools:

0000000000000000 D __per_cpu_start
...
000000000000f1c0 d lru_add_drain_work
000000000000f1e0 D vm_event_states
000000000000f420 d vmstat_work
000000000000f4a0 d vmap_block_queue
000000000000f4c0 d vfree_deferred
000000000000f4f0 d memory_failure_cpu
...
0000000000013ac0 D __per_cpu_end

[] .vvar             PROGBITS         ffffffff81698000
     00000000000000f0    WA
[] .data..percpu     PROGBITS         0000000000000000  00a00000
     0000000000013ac0    WA
[] .init.text        PROGBITS         ffffffff816ad000  00aad000
     000000000003fa21    AX

You can see that vmstat_work is at 0xf420, which lies between __per_cpu_start and __per_cpu_end. These two special symbols mark the start and end addresses of the per-cpu section.

One simple question: there is only one entry for vmstat_work in the per-cpu section, but we should have NR_CPUS copies of it. Where are all the other vmstat_work entries?

Actually, the per-cpu section is just a roadmap of all per-cpu variables. The real body of every per-cpu variable is allocated in a per-cpu chunk at run time. Linux makes NR_CPUS copies of the static and dynamic variables. To get to those real bodies, we use the per_cpu or per_cpu_ptr macros.

What per_cpu and per_cpu_ptr do is add an offset (taken from __per_cpu_offset) to the given address to reach the real body of the per-cpu variable.

#define per_cpu(var, cpu) \
        (*SHIFT_PERCPU_PTR(&(var), per_cpu_offset(cpu)))

#define per_cpu_offset(x) (__per_cpu_offset[x])
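The translation can be imitated in userspace. The sketch below is a model under stated assumptions, not kernel code: struct model_section plays the role of the per-cpu section (the "roadmap" with initial values), each CPU gets a malloc'ed copy of it, and a per-cpu address is simply a byte offset into the section, much like 0xf420 in the nm output above. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NR_CPUS 4   /* assumption: a small, fixed CPU count for the model */

/* Models the per-cpu section: one template copy with initial values. */
struct model_section {
        long counter;
        int flag;
};
static const struct model_section model_template = { .counter = 7, .flag = 1 };

/* Start of each CPU's private copy; plays the role of __per_cpu_offset[]. */
static char *model_unit_base[NR_CPUS];

/* Give every CPU its own copy of the template. Returns 0 on success. */
static int model_setup_units(void)
{
        for (unsigned int cpu = 0; cpu < NR_CPUS; cpu++) {
                model_unit_base[cpu] = malloc(sizeof(model_template));
                if (!model_unit_base[cpu])
                        return -1;
                memcpy(model_unit_base[cpu], &model_template,
                       sizeof(model_template));
        }
        return 0;
}

/* Models per_cpu_ptr(): per-cpu address plus the CPU's base gives the real body. */
static void *model_per_cpu_ptr(size_t percpu_addr, unsigned int cpu)
{
        assert(cpu < NR_CPUS);
        assert(percpu_addr < sizeof(model_template));
        return model_unit_base[cpu] + percpu_addr;
}
```

Calling model_per_cpu_ptr(offsetof(struct model_section, counter), cpu) for two different CPUs yields two different addresses. This also shows why using the per-cpu address directly is a bug: on its own it points into the template, not into any CPU's real copy.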

A picture makes the idea easier to understand:

[Figure: Translating a per-cpu variable to its real body (NR_CPUS = 4)]

Take a closer look:
A unit has three parts: static, reserved, and dynamic.
static: the static per-cpu variables (__per_cpu_end - __per_cpu_start)
reserved: per-cpu slots reserved for kernel modules
dynamic: slots for dynamic allocation (__alloc_percpu)
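The three parts sit back to back inside a unit, which the following userspace sketch spells out. The struct and function names are hypothetical, and the reserved and dynamic sizes used with it are illustrative values, not numbers taken from a real kernel build.

```c
#include <assert.h>
#include <stddef.h>

/* Start offset of each region inside one unit, plus the unit's total size. */
struct model_unit_layout {
        size_t static_off;
        size_t reserved_off;
        size_t dynamic_off;
        size_t unit_size;
};

enum model_region { MODEL_STATIC, MODEL_RESERVED, MODEL_DYNAMIC };

/* Lay the regions out in order: static, then reserved, then dynamic. */
static struct model_unit_layout model_layout(size_t static_size,
                                             size_t reserved_size,
                                             size_t dyn_size)
{
        struct model_unit_layout l;

        l.static_off = 0;
        l.reserved_off = static_size;
        l.dynamic_off = static_size + reserved_size;
        l.unit_size = l.dynamic_off + dyn_size;
        return l;
}

/* Tell which region an offset inside the unit falls into. */
static enum model_region model_region_of(const struct model_unit_layout *l,
                                         size_t off)
{
        assert(off < l->unit_size);
        if (off < l->reserved_off)
                return MODEL_STATIC;
        if (off < l->dynamic_off)
                return MODEL_RESERVED;
        return MODEL_DYNAMIC;
}
```

With a static size of 0x13ac0 (the section size shown earlier), the offset 0xf420 of vmstat_work falls into the static region, as expected.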

[Figure: Unit and chunk]

The size of a unit is computed in pcpu_build_alloc_info():

static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
                                size_t reserved_size, size_t dyn_size,
                                size_t atom_size,
                                pcpu_fc_cpu_distance_fn_t cpu_distance_fn)
{
        static int group_map[NR_CPUS] __initdata;
        static int group_cnt[NR_CPUS] __initdata;
        const size_t static_size = __per_cpu_end - __per_cpu_start;

        /* ... folded lines: int nr_groups = 1, nr_units = 0; ... */

        /* calculate size_sum and ensure dyn_size is enough for early alloc */
        size_sum = PFN_ALIGN(static_size + reserved_size +
                            max_t(size_t, dyn_size, PERCPU_DYNAMIC_EARLY_SIZE));
        dyn_size = size_sum - static_size - reserved_size;

        /* ... folded lines: Determine min_unit_size, alloc_size and max_upa such that ... */
}
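The size calculation is easy to replay in userspace. The sketch below assumes 4 KiB pages and a minimum early dynamic size of 12 KiB; both constants, like every model_* name, are assumptions for illustration rather than values read from a particular kernel configuration.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE          4096UL        /* assumption: 4 KiB pages */
#define MODEL_DYNAMIC_EARLY_SIZE (12UL << 10)  /* assumption: 12 KiB minimum */

struct model_sizes {
        size_t size_sum;   /* page-aligned size of one unit */
        size_t dyn_size;   /* dynamic area after absorbing the alignment slack */
};

/* Round up to the next page boundary, like PFN_ALIGN() in the listing. */
static size_t model_pfn_align(size_t x)
{
        return (x + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

/* Mirror the two statements in the listing above. */
static struct model_sizes model_unit_sizes(size_t static_size,
                                           size_t reserved_size,
                                           size_t dyn_size)
{
        struct model_sizes s;
        size_t dyn_min = dyn_size > MODEL_DYNAMIC_EARLY_SIZE ?
                         dyn_size : MODEL_DYNAMIC_EARLY_SIZE;

        s.size_sum = model_pfn_align(static_size + reserved_size + dyn_min);
        assert(s.size_sum >= static_size + reserved_size);
        /* Whatever the page alignment added is handed to the dynamic area. */
        s.dyn_size = s.size_sum - static_size - reserved_size;
        return s;
}
```

For example, with static_size = 0x13ac0 (80576 bytes), a reserved size of 8192 and a requested dynamic size of 20480, the raw sum is 109248 bytes. Rounding up to whole pages gives 110592 (27 pages), so the dynamic area grows from 20480 to 21824 bytes.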

After determining the size of the unit, the chunk is allocated by the memblock APIs.

int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                                  size_t atom_size,
                                  pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
                                  pcpu_fc_alloc_fn_t alloc_fn,
                                  pcpu_fc_free_fn_t free_fn)
{
        /* ... folded lines: void *base = (void *)ULONG_MAX; ... */

        /* allocate, copy and determine base address */
        for (group = 0; group < ai->nr_groups; group++) {
                struct pcpu_group_info *gi = &ai->groups[group];
                unsigned int cpu = NR_CPUS;
                void *ptr;

                for (i = 0; i < gi->nr_units && cpu == NR_CPUS; i++)
                        cpu = gi->cpu_map[i];
                BUG_ON(cpu == NR_CPUS);

                /* allocate space for the whole group */
                ptr = alloc_fn(cpu, gi->nr_units * ai->unit_size, atom_size);
                if (!ptr) {
                        rc = -ENOMEM;
                        goto out_free_areas;
                }
                /* kmemleak tracks the percpu allocations separately */
                kmemleak_free(ptr);
                areas[group] = ptr;

                base = min(ptr, base);
        }

        /* ... folded lines: Copy data and free unused parts.  This should happen after all ... */
}

static void * __init pcpu_dfl_fc_alloc(unsigned int cpu, size_t size,
                                       size_t align)
{
        return memblock_virt_alloc_from_nopanic(
                        size, align, __pa(MAX_DMA_ADDRESS));
}
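To connect the allocation back to the per-cpu offsets, here is a rough userspace model of how a per-CPU offset could be derived once every group has its area. It rests on simplifying assumptions that are not taken from the listing: every group holds the same number of units, CPUs are numbered consecutively across groups (the kernel uses cpu_map instead), and addresses are plain integers. All model_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4   /* assumption: a small, fixed CPU count for the model */

/* Plays the role of __per_cpu_offset[]. */
static uintptr_t model_cpu_offset[NR_CPUS];

/*
 * For each unit, the offset is the distance from the start of the per-cpu
 * section (percpu_start) to the start of that unit inside its group's area.
 */
static void model_fill_offsets(const uintptr_t *group_base,
                               unsigned int nr_groups,
                               unsigned int units_per_group,
                               uintptr_t unit_size,
                               uintptr_t percpu_start)
{
        unsigned int cpu = 0;

        for (unsigned int g = 0; g < nr_groups; g++) {
                for (unsigned int i = 0; i < units_per_group; i++) {
                        assert(cpu < NR_CPUS);
                        model_cpu_offset[cpu++] =
                                group_base[g] + i * unit_size - percpu_start;
                }
        }
}

/* Per-cpu address plus the CPU's offset gives the address of the real body. */
static uintptr_t model_real_address(uintptr_t percpu_addr, unsigned int cpu)
{
        assert(cpu < NR_CPUS);
        return percpu_addr + model_cpu_offset[cpu];
}
```

With two groups of two units each, every CPU ends up with its own offset, and the per-cpu address 0xf420 of vmstat_work maps to a different real address on each CPU.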
