The Essence of RCU in the Linux Kernel
RCU (Read-Copy Update) is a synchronization mechanism. What, then, does it actually synchronize?
1. It has only a reader-side lock, and that lock produces no lock contention.
2. It synchronizes reader-side critical sections with the reclaim-side critical section, not with writer-side critical sections.
3. RCU readers concurrently access different versions (copies) of the data reached through a shared pointer, whereas a (reader/writer) spinlock serializes all accesses to a single copy of the data.
4. Synchronization of writer-side critical sections must be provided by the user of RCU.
5. RCU synchronizes reads of an old copy of the data with the reclamation of that copy; what it protects is the old copy.
The main idea of RCU is to split an update into two steps, removal and reclamation, and to defer the reclamation (destruction) of old copies.
RCU turns read and write operations that would otherwise contend on a single copy of the data into read and update operations on different versions (copies) of that data, relying on switching the target of a shared pointer. Because the data now exists in multiple copies, old copies must eventually be reclaimed, which is why RCU splits the update into a removal step and a reclamation step. As a result, the write to each individual copy is taken out of the race altogether and needs no write lock; it is simply part of the update-side removal step. Reads, meanwhile, proceed concurrently on the different copies, and the only race between a reader and the update-side removal step is on the access to the shared pointer itself.

Switching copies is performed by the update-side removal step, which needs only memory barriers to synchronize with reader-side accesses to the shared pointer. The update-side reclamation step frees the old copy. It must synchronize with the reader-side critical sections that are still accessing the old copy, yet it never contends with the reader-side lock; in other words, reader-side critical sections do not block on the reader-side lock because of lock contention. This is indeed how the primitives behave: although RCU provides rcu_read_lock() and rcu_read_unlock(), the count they maintain is not updated atomically, and it belongs to a thread rather than to a lock; it merely records how deeply rcu_read_lock() is nested in the current thread. So even when threads on several CPUs are all inside RCU reader-side critical sections, each uses its own count, and no reader-side critical section ever blocks in rcu_read_lock().

The reclaim-side critical section must be ordered after the reader-side critical sections that access the old copy, but it does not synchronize with reader-side critical sections that access the new copy. Seen this way, what RCU protects is not access to one shared copy of the data, but the reading and the reclamation of an old copy of that data.
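To make the pointer switch concrete, here is a minimal sketch, not taken from the kernel: struct foo, gbl_foo, foo_lock and both functions are made-up names for illustration, assuming one RCU-protected shared pointer and a spinlock that serializes updaters.

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct foo {
	int a;
};

static struct foo __rcu *gbl_foo;	/* shared pointer switched by updaters */
static DEFINE_SPINLOCK(foo_lock);	/* writer-vs-writer coordination only */

/* Reader: runs concurrently with updaters and takes no contended lock. */
static int foo_get_a(void)
{
	struct foo *p;
	int a;

	rcu_read_lock();			/* enter reader-side critical section */
	p = rcu_dereference(gbl_foo);		/* fetch the shared pointer */
	a = p ? p->a : -1;			/* read whichever copy it points to */
	rcu_read_unlock();
	return a;
}

/* Updater: removal (pointer switch) plus deferred reclamation. */
static void foo_update_a(int new_a)
{
	struct foo *new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
	struct foo *old_fp;

	new_fp->a = new_a;			/* write the new copy; no reader sees it yet
						 * (error handling omitted for brevity) */
	spin_lock(&foo_lock);
	old_fp = rcu_dereference_protected(gbl_foo,
					   lockdep_is_held(&foo_lock));
	rcu_assign_pointer(gbl_foo, new_fp);	/* removal: publish the new copy */
	spin_unlock(&foo_lock);

	synchronize_rcu();			/* wait for pre-existing readers */
	kfree(old_fp);				/* reclamation: free the old copy */
}

The reader's only interaction with the updater is through rcu_dereference()/rcu_assign_pointer(); the spinlock exists purely so that concurrent updaters do not lose each other's updates.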
Below is the official design document:
What is RCU?

RCU is a synchronization mechanism that was added to the Linux kernel
during the 2.5 development effort that is optimized for read-mostly
situations. Although RCU is actually quite simple once you understand it,
getting there can sometimes be a challenge. Part of the problem is that
most of the past descriptions of RCU have been written with the mistaken
assumption that there is "one true way" to describe RCU. Instead,
the experience has been that different people must take different paths
to arrive at an understanding of RCU. This document provides several
different paths, as follows:

1. RCU OVERVIEW
2. WHAT IS RCU'S CORE API?
3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
6. ANALOGY WITH READER-WRITER LOCKING
7. FULL LIST OF RCU APIs
8. ANSWERS TO QUICK QUIZZES

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point. People who prefer to start with an API that they can then
experiment with should focus on Section 2. People who prefer to start
with example uses should focus on Sections 3 and 4. People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code. People who reason best by analogy should
focus on Section 6. Section 7 serves as an index to the docbook API
documentation, and Section 8 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning. If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway. ;-)

1. RCU OVERVIEW

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases. The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is the semantics of modern CPUs guarantee that readers will see
either the old or the new version of the data structure rather than a
partially updated reference. The reclamation phase does the work of reclaiming
(e.g., freeing) the data items removed from the data structure during the
removal phase. Because reclaiming data items can disrupt any readers
concurrently referencing those data items, the reclamation phase must
not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish. Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a. Remove pointers to a data structure, so that subsequent
   readers cannot gain a reference to it.

b. Wait for all previous readers to complete their RCU read-side
   critical sections.

c. At this point, there cannot be any readers who hold references
   to the data structure, so it now may safely be reclaimed
   (e.g., kfree()d).

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all. In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers. In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers. Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache). Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.
(The above is quoted from the kernel documentation file Documentation/RCU/whatisRCU.)
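Since the document stresses linked structures, a list-based sketch of the remove/wait/reclaim sequence may also help. It is not kernel code: struct foo, foo_list and foo_list_lock are illustrative names.

#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct foo {
	struct list_head list;
	int key;
};

static LIST_HEAD(foo_list);			/* RCU-protected list */
static DEFINE_SPINLOCK(foo_list_lock);		/* serializes updaters only */

/* Updater: the typical three-step sequence (a) remove, (b) wait, (c) reclaim. */
static void foo_del(struct foo *p)
{
	spin_lock(&foo_list_lock);
	list_del_rcu(&p->list);		/* (a) unlink: later readers cannot find it */
	spin_unlock(&foo_list_lock);

	synchronize_rcu();		/* (b) wait for pre-existing readers */

	kfree(p);			/* (c) nobody can still hold a reference */
}

/* Reader: traverses the list entirely inside a read-side critical section. */
static bool foo_key_present(int key)
{
	struct foo *p;
	bool found = false;

	rcu_read_lock();
	list_for_each_entry_rcu(p, &foo_list, list) {
		if (p->key == key) {
			found = true;
			break;
		}
	}
	rcu_read_unlock();
	return found;
}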
In summary, the idea of RCU is to split updates into two actions: removal and reclamation.

Removal:
Replace references to old versions of items within a data structure with references to new versions (or simply remove them). This can safely run concurrently with readers, because modern CPUs guarantee that a reader sees either the old or the new version of the structure, never a partially updated reference.

Reclamation:
Reclaim (e.g., free) the data items removed during the removal step. Because reclaiming would disrupt any concurrent readers still holding references to those items, reclamation must not start until no readers hold references to them.

The benefit of splitting the update is that the removal step can be performed immediately, while the reclamation step is deferred until all readers active during removal have completed, either by blocking until they finish or by registering a callback that is invoked after they finish. Only the readers active during the removal phase matter: any reader that starts after removal cannot obtain a reference to the removed items and therefore cannot be disrupted by reclamation.

So a typical RCU update goes through the following three steps:
1. Remove the pointer(s) to a data structure, so that later readers cannot obtain a reference to it.
2. Wait for all pre-existing readers to complete their RCU read-side critical sections.
3. At this point no reader can still hold a reference to the (old) data structure, so it can now be reclaimed safely.

Step 2 is the key idea underlying RCU's deferred destruction. Being able to wait until all readers are done allows RCU readers to use very lightweight synchronization, and in some cases no synchronization at all. In contrast, in more conventional lock-based schemes readers must use heavyweight synchronization to prevent an updater from deleting the data structure out from under them, because lock-based updaters typically update the data in place and therefore must exclude readers. RCU-based updaters, on the other hand, exploit the fact that writes to single aligned pointers are atomic on modern CPUs, which allows atomic insertion, removal and replacement of items in a linked structure without disrupting readers. Concurrent RCU readers can keep accessing the old versions and can dispense with the atomic operations, memory barriers and communication cache misses that are so expensive on today's SMP systems, even in the absence of lock contention.

In the three-step procedure above the updater performs both the removal and the reclamation, but it is often helpful to have an entirely different thread do the reclamation, as is in fact the case for the kernel's directory-entry cache (dcache). Even when the same thread performs both step 1 and step 3, it is still helpful to think of them separately. For example, RCU readers and updaters need not communicate at all, yet RCU provides implicit, low-overhead communication between readers and reclaimers, namely in step 2.
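When the updater is not allowed to block, the reclamation can be handed to a callback with call_rcu() instead of waiting in synchronize_rcu(). A minimal sketch, reusing the made-up struct foo / gbl_foo / foo_lock names from the first sketch (assumptions, not kernel code):

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct foo {
	int a;
	struct rcu_head rcu;	/* needed so call_rcu() can queue this object */
};

static struct foo __rcu *gbl_foo;
static DEFINE_SPINLOCK(foo_lock);

/* Runs after every reader that might still see the old copy has finished. */
static void foo_reclaim(struct rcu_head *head)
{
	kfree(container_of(head, struct foo, rcu));
}

/* Updater that never blocks: removal now, reclamation via callback later. */
static void foo_update_a(int new_a)
{
	struct foo *new_fp = kmalloc(sizeof(*new_fp), GFP_ATOMIC);
	struct foo *old_fp;

	new_fp->a = new_a;			/* error handling omitted */

	spin_lock(&foo_lock);
	old_fp = rcu_dereference_protected(gbl_foo,
					   lockdep_is_held(&foo_lock));
	rcu_assign_pointer(gbl_foo, new_fp);	/* removal */
	spin_unlock(&foo_lock);

	if (old_fp)
		call_rcu(&old_fp->rcu, foo_reclaim);	/* deferred reclamation, no blocking */
}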
In the kernel source (include/linux/rcupdate.h), rcu_read_lock() carries the following comment:
/**
* rcu_read_lock() - mark the beginning of an RCU read-side critical section
*
* When synchronize_rcu() is invoked on one CPU while other CPUs
* are within RCU read-side critical sections, then the
* synchronize_rcu() is guaranteed to block until after all the other
* CPUs exit their critical sections. Similarly, if call_rcu() is invoked
* on one CPU while other CPUs are within RCU read-side critical
* sections, invocation of the corresponding RCU callback is deferred
* until after the all the other CPUs exit their critical sections.
*
* Note, however, that RCU callbacks are permitted to run concurrently
* with new RCU read-side critical sections. One way that this can happen
* is via the following sequence of events: (1) CPU 0 enters an RCU
* read-side critical section, (2) CPU 1 invokes call_rcu() to register
* an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
* (4) CPU 2 enters a RCU read-side critical section, (5) the RCU
* callback is invoked. This is legal, because the RCU read-side critical
* section that was running concurrently with the call_rcu() (and which
* therefore might be referencing something that the corresponding RCU
* callback would free up) has completed before the corresponding
* RCU callback is invoked.
*
* RCU read-side critical sections may be nested. Any deferred actions
* will be deferred until the outermost RCU read-side critical section
* completes.
*
* You can avoid reading and understanding the next paragraph by
* following this rule: don't put anything in an rcu_read_lock() RCU
* read-side critical section that would block in a !PREEMPT kernel.
* But if you want the full story, read on!
*
* In non-preemptible RCU implementations (TREE_RCU and TINY_RCU),
* it is illegal to block while in an RCU read-side critical section.
* In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPT
* kernel builds, RCU read-side critical sections may be preempted,
* but explicit blocking is illegal. Finally, in preemptible RCU
* implementations in real-time (with -rt patchset) kernel builds, RCU
* read-side critical sections may be preempted and they may also block, but
* only when acquiring spinlocks that are subject to priority inheritance.
*/
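To tie the comment to code, here is a small illustrative reader, reusing the made-up gbl_foo/struct foo from the first sketch: read-side critical sections may nest, and nothing inside them may block in a !PREEMPT kernel.

#include <linux/printk.h>
#include <linux/rcupdate.h>

static void inner_read(void)
{
	struct foo *p;

	rcu_read_lock();		/* nested: just deepens the per-task nesting, no contention */
	p = rcu_dereference(gbl_foo);
	if (p)
		pr_info("a=%d\n", p->a);
	rcu_read_unlock();		/* the outer section is still in effect */
}

static void outer_read(void)
{
	rcu_read_lock();		/* outermost read-side critical section */
	inner_read();
	/* No blocking here: no mutexes, no GFP_KERNEL allocations, no msleep(). */
	rcu_read_unlock();		/* only now is this reader "done" as far as
					 * synchronize_rcu()/call_rcu() are concerned */
}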
The following comment explains that there is no write-side lock at all to contend with the read-side lock; synchronization among writers, however, must be arranged by the users of RCU themselves.
/*
* So where is rcu_write_lock()? It does not exist, as there is no
* way for writers to lock out RCU readers. This is a feature, not
* a bug -- this property is what provides RCU's performance benefits.
* Of course, writers must coordinate with each other. The normal
* spinlock primitives work well for this, but any other technique may be
* used as well. RCU does not care how the writers keep out of each
* others' way, as long as they do so.
*/
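As the comment says, writers only have to keep out of each other's way, typically with an ordinary spinlock; there is nothing for them to take against readers. A sketch of an in-place replacement, reusing the illustrative foo_list names from the list sketch above:

#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Writers serialize on foo_list_lock; readers are never locked out. */
static void foo_replace(struct foo *old, int new_key)
{
	struct foo *new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);

	new_fp->key = new_key;			/* error handling omitted */

	spin_lock(&foo_list_lock);		/* writer vs. writer only */
	list_replace_rcu(&old->list, &new_fp->list);
	spin_unlock(&foo_list_lock);

	synchronize_rcu();			/* wait out readers still using 'old' */
	kfree(old);
}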