A brief introduction to per-cpu variables

Original post (outside the firewall): http://thinkiii.blogspot.com/2014/05/a-brief-introduction-to-per-cpu.html

Per-cpu variables are widely used in the Linux kernel, for example in per-cpu counters and per-cpu caches. Their advantage is clear: because each CPU works on its own copy of the data, no lock is needed to synchronize with other CPUs, and avoiding locks improves performance.

There are two kinds of per-cpu variables: static and dynamic. Static variables are defined at build time; Linux provides the DEFINE_PER_CPU macro to define them.

#define DEFINE_PER_CPU(type, name)

static DEFINE_PER_CPU(struct delayed_work, vmstat_work);

Dynamic per-cpu variables are obtained at run time through the __alloc_percpu API, which returns the per-cpu address of the variable.

void __percpu *__alloc_percpu(size_t size, size_t align)

s->cpu_slab = __alloc_percpu(sizeof(struct kmem_cache_cpu), 2 * sizeof(void *));

One big difference between a per-cpu variable and an ordinary variable is that we must use the per-cpu accessor macros to reach the real per-cpu variable for a given CPU. Accessing a per-cpu variable without these macros is a bug in Linux kernel programming. We will see the reason later.

Here are two examples of accessing per-cpu variables:

struct vm_event_state *this = &per_cpu(vm_event_states, cpu);

struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu);

Let's take a closer look at how Linux per-cpu variables behave. After we define our static per-cpu variables, the compiler and linker collect all of them into the per-cpu section. We can see them with the 'readelf' or 'nm' tools:

 D __per_cpu_start
...
000000000000f1c0 d lru_add_drain_work
000000000000f1e0 D vm_event_states
000000000000f420 d vmstat_work
000000000000f4a0 d vmap_block_queue
000000000000f4c0 d vfree_deferred
000000000000f4f0 d memory_failure_cpu
...
0000000000013ac0 D __per_cpu_end

[] .vvar             PROGBITS         ffffffff81698000
     00000000000000f0    WA
[] .data..percpu     PROGBITS           00a00000
     0000000000013ac0    WA
[] .init.text        PROGBITS         ffffffff816ad000  00aad000
     000000000003fa21    AX

You can see that our vmstat_work is at 0xf420, which lies between __per_cpu_start and __per_cpu_end. These two special symbols mark the start and end addresses of the per-cpu section.

One simple question: there is only one vmstat_work entry in the per-cpu section, but we should have NR_CPUS copies of it. Where are all the other vmstat_work entries?

Actually, the per-cpu section is just a roadmap of all per-cpu variables. The real body of every per-cpu variable is allocated in a per-cpu chunk at run time, where Linux makes NR_CPUS copies of the static and dynamic variables. To reach those real bodies, we use the per_cpu or per_cpu_ptr macros.

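What per_cpu and per_cpu_ptr do is add an offset (stored in __per_cpu_offset) to the given address to reach the real body of the per-cpu variable for that CPU. This is why accessing a per-cpu variable directly is a bug: the access would use the address in the per-cpu section rather than any CPU's copy.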

#define per_cpu(var, cpu) \
        (*SHIFT_PERCPU_PTR(&(var), per_cpu_offset(cpu)))

#define per_cpu_offset(x) (__per_cpu_offset[x])
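
The offset translation can be modelled in user space. The following is only a sketch under stated assumptions, not kernel code: every model_ name, the template struct and MODEL_NR_CPUS are hypothetical, malloc stands in for the real chunk allocator, and the type is passed explicitly where the kernel macros use typeof.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_NR_CPUS 4 /* hypothetical CPU count for this model */

/* Stand-in for the per-cpu section: a single template copy of each variable. */
struct model_percpu_template {
        long vm_events;
        int vmstat_ticks;
};
static struct model_percpu_template model_section = { 0, 7 };

/* Model of __per_cpu_offset[]: distance from the template to each CPU's copy. */
static uintptr_t model_per_cpu_offset[MODEL_NR_CPUS];

/* Add the CPU's offset to a template address to reach that CPU's real body. */
static void *model_per_cpu_ptr(void *ptr, unsigned int cpu)
{
        assert(cpu < MODEL_NR_CPUS);
        return (void *)((uintptr_t)ptr + model_per_cpu_offset[cpu]);
}

/* Same idea as per_cpu(var, cpu), with the type given explicitly. */
#define model_per_cpu(type, var, cpu) \
        (*(type *)model_per_cpu_ptr(&(var), (cpu)))

/*
 * Allocate one copy per CPU, initialise it from the template and record its
 * offset. Returns 0 on success, -1 on allocation failure. The copies are
 * never freed in this sketch.
 */
static int model_setup_per_cpu_areas(void)
{
        unsigned int cpu;

        for (cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
                void *unit = malloc(sizeof(model_section));

                if (!unit)
                        return -1;
                memcpy(unit, &model_section, sizeof(model_section));
                model_per_cpu_offset[cpu] =
                        (uintptr_t)unit - (uintptr_t)&model_section;
        }
        return 0;
}
```

After model_setup_per_cpu_areas() succeeds, model_per_cpu(long, model_section.vm_events, 1) names CPU 1's private copy, and writing through it leaves both the template and the other CPUs' copies untouched.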

The idea is easier to understand with a picture:

[Figure: Translating a per-cpu variable to its real body (NR_CPUS = 4)]

Take a closer look. Each CPU's copy lives in a unit, and a chunk holds one unit per CPU. A unit has three parts: static, reserved, and dynamic.
static: the static per-cpu variables (__per_cpu_end - __per_cpu_start)
reserved: per-cpu space reserved for kernel modules
dynamic: space for dynamic allocation (__alloc_percpu)

[Figure: Unit and chunk]
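
The three-part unit layout can be sketched as simple offset arithmetic. This is an illustrative model only: the model_ names are hypothetical, and it assumes all units of a chunk sit back to back in one group, which the real allocator does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Sizes of the three parts of one unit (hypothetical model type). */
struct model_unit_layout {
        size_t static_size;   /* __per_cpu_end - __per_cpu_start */
        size_t reserved_size; /* reserved for kernel modules */
        size_t dyn_size;      /* available to __alloc_percpu */
};

enum model_region {
        MODEL_STATIC,
        MODEL_RESERVED,
        MODEL_DYNAMIC,
        MODEL_OUTSIDE
};

static size_t model_unit_size(const struct model_unit_layout *l)
{
        return l->static_size + l->reserved_size + l->dyn_size;
}

/* Classify an offset within a unit into one of the three parts. */
static enum model_region model_region_of(const struct model_unit_layout *l,
                                         size_t off)
{
        if (off < l->static_size)
                return MODEL_STATIC;
        if (off < l->static_size + l->reserved_size)
                return MODEL_RESERVED;
        if (off < model_unit_size(l))
                return MODEL_DYNAMIC;
        return MODEL_OUTSIDE;
}

/*
 * Offset inside the chunk of a given CPU's copy, assuming the units are
 * laid out contiguously in CPU order (an assumption of this model).
 */
static size_t model_chunk_offset(const struct model_unit_layout *l,
                                 unsigned int cpu, size_t off)
{
        assert(off < model_unit_size(l));
        return (size_t)cpu * model_unit_size(l) + off;
}
```

With static_size taken from the nm listing (0x13ac0) and assumed reserved and dynamic sizes of 8 KiB and 20 KiB, vmstat_work at offset 0xf420 falls in the static part, and the first byte after the static part belongs to the module reserve.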

static struct pcpu_alloc_info * __init pcpu_build_alloc_info(
                                size_t reserved_size, size_t dyn_size,
                                size_t atom_size,
                                pcpu_fc_cpu_distance_fn_t cpu_distance_fn)
{
        static int group_map[NR_CPUS] __initdata;
        static int group_cnt[NR_CPUS] __initdata;
        const size_t static_size = __per_cpu_end - __per_cpu_start;
        /* [folded] int nr_groups = ..., nr_units = ...; */

        /* calculate size_sum and ensure dyn_size is enough for early alloc */
        size_sum = PFN_ALIGN(static_size + reserved_size +
                            max_t(size_t, dyn_size, PERCPU_DYNAMIC_EARLY_SIZE));
        dyn_size = size_sum - static_size - reserved_size;

        /* [folded] Determine min_unit_size, alloc_size and max_upa such that ... */
}
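
The size_sum calculation above can be worked through in user space. This sketch makes several assumptions: a 4 KiB page, 12 KiB as a stand-in for PERCPU_DYNAMIC_EARLY_SIZE, and hypothetical model_ names throughout. It shows that the slack created by rounding up to a page boundary is handed to the dynamic part.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL                /* assumed page size */
#define MODEL_DYNAMIC_EARLY_SIZE (12UL << 10) /* assumed early-alloc minimum */

/* Round x up to the next page boundary, like PFN_ALIGN. */
static size_t model_pfn_align(size_t x)
{
        return (x + MODEL_PAGE_SIZE - 1) & ~(MODEL_PAGE_SIZE - 1);
}

struct model_sizes {
        size_t size_sum; /* page-aligned size of one unit's contents */
        size_t dyn_size; /* dynamic part after absorbing the rounding slack */
};

static struct model_sizes model_calc_sizes(size_t static_size,
                                           size_t reserved_size,
                                           size_t dyn_size)
{
        struct model_sizes s;
        /* the dynamic part must be at least the early-alloc minimum */
        size_t dyn = dyn_size > MODEL_DYNAMIC_EARLY_SIZE ?
                     dyn_size : MODEL_DYNAMIC_EARLY_SIZE;

        s.size_sum = model_pfn_align(static_size + reserved_size + dyn);
        s.dyn_size = s.size_sum - static_size - reserved_size;
        assert(s.dyn_size >= dyn); /* rounding can only grow the dynamic part */
        return s;
}
```

For example, with the static size 0x13ac0 from the nm listing and assumed reserved and dynamic sizes of 8192 and 20480 bytes, the three parts add up to 109248 bytes, which rounds up to 27 pages (110592 bytes), so the dynamic part grows from 20480 to 21824 bytes.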

After determining the size of the unit, the chunk is allocated by the memblock APIs.

int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size,
                                  size_t atom_size,
                                  pcpu_fc_cpu_distance_fn_t cpu_distance_fn,
                                  pcpu_fc_alloc_fn_t alloc_fn,
                                  pcpu_fc_free_fn_t free_fn)
{
        /* [folded] void *base = (void *)ULONG_MAX; ... */

        /* allocate, copy and determine base address */
        for (group = 0; group < ai->nr_groups; group++) {
                struct pcpu_group_info *gi = &ai->groups[group];
                unsigned int cpu = NR_CPUS;
                void *ptr;

                /* pick the first CPU that is actually mapped in this group */
                for (i = 0; i < gi->nr_units && cpu == NR_CPUS; i++)
                        cpu = gi->cpu_map[i];
                BUG_ON(cpu == NR_CPUS);

                /* allocate space for the whole group */
                ptr = alloc_fn(cpu, gi->nr_units * ai->unit_size, atom_size);
                if (!ptr) {
                        rc = -ENOMEM;
                        goto out_free_areas;
                }
                /* kmemleak tracks the percpu allocations separately */
                kmemleak_free(ptr);
                areas[group] = ptr;

                base = min(ptr, base);
        }

        /* [folded] Copy data and free unused parts.  This should happen after all ... */
}
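
The loop above keeps the lowest group address in base. A minimal user-space sketch of that bookkeeping follows. The model_ names are hypothetical, and the second step, which expresses each group's area as an offset from base, is an assumption about how the base is used afterwards, since those lines are folded in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_GROUPS 8 /* hypothetical upper bound for this sketch */

/*
 * Find the lowest area address, mirroring "base = min(ptr, base)" with base
 * starting at the maximum value, then store each group's distance from that
 * base in offsets[]. Returns the base address.
 */
static uintptr_t model_group_offsets(const uintptr_t *areas, size_t nr_groups,
                                     size_t *offsets)
{
        uintptr_t base = UINTPTR_MAX;
        size_t group;

        assert(nr_groups > 0 && nr_groups <= MODEL_MAX_GROUPS);

        for (group = 0; group < nr_groups; group++)
                if (areas[group] < base)
                        base = areas[group];

        for (group = 0; group < nr_groups; group++)
                offsets[group] = (size_t)(areas[group] - base);

        return base;
}
```

Because every group is described relative to a single base, one offset per unit is enough to locate any CPU's copy, regardless of where each group's memory was allocated.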

static void * __init pcpu_dfl_fc_alloc(unsigned int cpu, size_t size,
                                       size_t align)
{
        return memblock_virt_alloc_from_nopanic(
                        size, align, __pa(MAX_DMA_ADDRESS));
}
