The Essence of RCU in the Linux Kernel
RCU (Read-Copy Update) is a synchronization mechanism. What, then, does it actually synchronize?
1. It has only a reader-side lock, and that lock produces no lock contention.
2. It synchronizes reader-side critical sections with the reclaim-side critical section, not with writer-side critical sections.
3. RCU readers concurrently access different versions (copies) of the data reached through a shared pointer, whereas a (reader/writer) spinlock serializes all accesses to a single copy of the data.
4. Synchronization of writer-side critical sections must be provided by the user of RCU.
5. RCU synchronizes reads of an old copy of the data with the reclamation of that copy; what it protects is the old copy.
The main idea of RCU is to split an update into two steps, removal and reclamation, and to defer the reclamation (destruction) of old copies.
RCU turns read and write operations that would otherwise contend on a single copy of the data into read and update operations on different versions (copies) of that data, relying on switching the target of a shared pointer. Because the data now exists in multiple copies, old copies must eventually be reclaimed, which is why RCU splits the update into a removal step and a reclamation step. As a result, the write to each individual copy is taken out of the race altogether and needs no write lock; it is simply part of the update-side removal step. Reads, meanwhile, proceed concurrently on the different copies, and the only race between a reader and the update-side removal step is on the access to the shared pointer itself.

Switching copies is performed by the update-side removal step, which needs only memory barriers to synchronize with reader-side accesses to the shared pointer. The update-side reclamation step frees the old copy. It must synchronize with the reader-side critical sections that are still accessing the old copy, yet it never contends with the reader-side lock; in other words, reader-side critical sections do not block on the reader-side lock because of lock contention. This is indeed how the primitives behave: although RCU provides rcu_read_lock() and rcu_read_unlock(), the count they maintain is not updated atomically, and it belongs to a thread rather than to a lock; it merely records how deeply rcu_read_lock() is nested in the current thread. So even when threads on several CPUs are all inside RCU reader-side critical sections, each uses its own count, and no reader-side critical section ever blocks in rcu_read_lock().

The reclaim-side critical section must be ordered after the reader-side critical sections that access the old copy, but it does not synchronize with reader-side critical sections that access the new copy. Seen this way, what RCU protects is not access to one shared copy of the data, but the reading and the reclamation of an old copy of that data.
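To make the pointer switch concrete, here is a minimal sketch, not taken from the kernel: struct foo, gbl_foo, foo_lock and both functions are made-up names for illustration, assuming one RCU-protected shared pointer and a spinlock that serializes updaters.

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct foo {
	int a;
};

static struct foo __rcu *gbl_foo;	/* shared pointer switched by updaters */
static DEFINE_SPINLOCK(foo_lock);	/* writer-vs-writer coordination only */

/* Reader: runs concurrently with updaters and takes no contended lock. */
static int foo_get_a(void)
{
	struct foo *p;
	int a;

	rcu_read_lock();			/* enter reader-side critical section */
	p = rcu_dereference(gbl_foo);		/* fetch the shared pointer */
	a = p ? p->a : -1;			/* read whichever copy it points to */
	rcu_read_unlock();
	return a;
}

/* Updater: removal (pointer switch) plus deferred reclamation. */
static void foo_update_a(int new_a)
{
	struct foo *new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);
	struct foo *old_fp;

	new_fp->a = new_a;			/* write the new copy; no reader sees it yet
						 * (error handling omitted for brevity) */
	spin_lock(&foo_lock);
	old_fp = rcu_dereference_protected(gbl_foo,
					   lockdep_is_held(&foo_lock));
	rcu_assign_pointer(gbl_foo, new_fp);	/* removal: publish the new copy */
	spin_unlock(&foo_lock);

	synchronize_rcu();			/* wait for pre-existing readers */
	kfree(old_fp);				/* reclamation: free the old copy */
}

The reader's only interaction with the updater is through rcu_dereference()/rcu_assign_pointer(); the spinlock exists purely so that concurrent updaters do not lose each other's updates.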
Below is the official design document:
What is RCU?

RCU is a synchronization mechanism that was added to the Linux kernel
during the 2.5 development effort that is optimized for read-mostly
situations. Although RCU is actually quite simple once you understand it,
getting there can sometimes be a challenge. Part of the problem is that
most of the past descriptions of RCU have been written with the mistaken
assumption that there is "one true way" to describe RCU. Instead,
the experience has been that different people must take different paths
to arrive at an understanding of RCU. This document provides several
different paths, as follows:

1. RCU OVERVIEW
2. WHAT IS RCU'S CORE API?
3. WHAT ARE SOME EXAMPLE USES OF CORE RCU API?
4. WHAT IF MY UPDATING THREAD CANNOT BLOCK?
5. WHAT ARE SOME SIMPLE IMPLEMENTATIONS OF RCU?
6. ANALOGY WITH READER-WRITER LOCKING
7. FULL LIST OF RCU APIs
8. ANSWERS TO QUICK QUIZZES

People who prefer starting with a conceptual overview should focus on
Section 1, though most readers will profit by reading this section at
some point. People who prefer to start with an API that they can then
experiment with should focus on Section 2. People who prefer to start
with example uses should focus on Sections 3 and 4. People who need to
understand the RCU implementation should focus on Section 5, then dive
into the kernel source code. People who reason best by analogy should
focus on Section 6. Section 7 serves as an index to the docbook API
documentation, and Section 8 is the traditional answer key.

So, start with the section that makes the most sense to you and your
preferred method of learning. If you need to know everything about
everything, feel free to read the whole thing -- but if you are really
that type of person, you have perused the source code and will therefore
never need this document anyway. ;-)

1. RCU OVERVIEW

The basic idea behind RCU is to split updates into "removal" and
"reclamation" phases. The removal phase removes references to data items
within a data structure (possibly by replacing them with references to
new versions of these data items), and can run concurrently with readers.
The reason that it is safe to run the removal phase concurrently with
readers is the semantics of modern CPUs guarantee that readers will see
either the old or the new version of the data structure rather than a
partially updated reference. The reclamation phase does the work of reclaiming
(e.g., freeing) the data items removed from the data structure during the
removal phase. Because reclaiming data items can disrupt any readers
concurrently referencing those data items, the reclamation phase must
not start until readers no longer hold references to those data items.

Splitting the update into removal and reclamation phases permits the
updater to perform the removal phase immediately, and to defer the
reclamation phase until all readers active during the removal phase have
completed, either by blocking until they finish or by registering a
callback that is invoked after they finish. Only readers that are active
during the removal phase need be considered, because any reader starting
after the removal phase will be unable to gain a reference to the removed
data items, and therefore cannot be disrupted by the reclamation phase.

So the typical RCU update sequence goes something like the following:

a. Remove pointers to a data structure, so that subsequent
   readers cannot gain a reference to it.

b. Wait for all previous readers to complete their RCU read-side
   critical sections.

c. At this point, there cannot be any readers who hold references
   to the data structure, so it now may safely be reclaimed
   (e.g., kfree()d).

Step (b) above is the key idea underlying RCU's deferred destruction.
The ability to wait until all readers are done allows RCU readers to
use much lighter-weight synchronization, in some cases, absolutely no
synchronization at all. In contrast, in more conventional lock-based
schemes, readers must use heavy-weight synchronization in order to
prevent an updater from deleting the data structure out from under them.
This is because lock-based updaters typically update data items in place,
and must therefore exclude readers. In contrast, RCU-based updaters
typically take advantage of the fact that writes to single aligned
pointers are atomic on modern CPUs, allowing atomic insertion, removal,
and replacement of data items in a linked structure without disrupting
readers. Concurrent RCU readers can then continue accessing the old
versions, and can dispense with the atomic operations, memory barriers,
and communications cache misses that are so expensive on present-day
SMP computer systems, even in absence of lock contention.

In the three-step procedure shown above, the updater is performing both
the removal and the reclamation step, but it is often helpful for an
entirely different thread to do the reclamation, as is in fact the case
in the Linux kernel's directory-entry cache (dcache). Even if the same
thread performs both the update step (step (a) above) and the reclamation
step (step (c) above), it is often helpful to think of them separately.
For example, RCU readers and updaters need not communicate at all,
but RCU provides implicit low-overhead communication between readers
and reclaimers, namely, in step (b) above.

So how the heck can a reclaimer tell when a reader is done, given
that readers are not doing any sort of synchronization operations???
Read on to learn about how RCU's API makes this easy.
(The above is quoted from the kernel documentation file Documentation/RCU/whatisRCU.)
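Since the document stresses linked structures, a list-based sketch of the remove/wait/reclaim sequence may also help. It is not kernel code: struct foo, foo_list and foo_list_lock are illustrative names.

#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct foo {
	struct list_head list;
	int key;
};

static LIST_HEAD(foo_list);			/* RCU-protected list */
static DEFINE_SPINLOCK(foo_list_lock);		/* serializes updaters only */

/* Updater: the typical three-step sequence (a) remove, (b) wait, (c) reclaim. */
static void foo_del(struct foo *p)
{
	spin_lock(&foo_list_lock);
	list_del_rcu(&p->list);		/* (a) unlink: later readers cannot find it */
	spin_unlock(&foo_list_lock);

	synchronize_rcu();		/* (b) wait for pre-existing readers */

	kfree(p);			/* (c) nobody can still hold a reference */
}

/* Reader: traverses the list entirely inside a read-side critical section. */
static bool foo_key_present(int key)
{
	struct foo *p;
	bool found = false;

	rcu_read_lock();
	list_for_each_entry_rcu(p, &foo_list, list) {
		if (p->key == key) {
			found = true;
			break;
		}
	}
	rcu_read_unlock();
	return found;
}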
In summary, the idea of RCU is to split updates into two actions: removal and reclamation.

Removal:
Replace references to old versions of items within a data structure with references to new versions (or simply remove them). This can safely run concurrently with readers, because modern CPUs guarantee that a reader sees either the old or the new version of the structure, never a partially updated reference.

Reclamation:
Reclaim (e.g., free) the data items removed during the removal step. Because reclaiming would disrupt any concurrent readers still holding references to those items, reclamation must not start until no readers hold references to them.

The benefit of splitting the update is that the removal step can be performed immediately, while the reclamation step is deferred until all readers active during removal have completed, either by blocking until they finish or by registering a callback that is invoked after they finish. Only the readers active during the removal phase matter: any reader that starts after removal cannot obtain a reference to the removed items and therefore cannot be disrupted by reclamation.

So a typical RCU update goes through the following three steps:
1. Remove the pointer(s) to a data structure, so that later readers cannot obtain a reference to it.
2. Wait for all pre-existing readers to complete their RCU read-side critical sections.
3. At this point no reader can still hold a reference to the (old) data structure, so it can now be reclaimed safely.

Step 2 is the key idea underlying RCU's deferred destruction. Being able to wait until all readers are done allows RCU readers to use very lightweight synchronization, and in some cases no synchronization at all. In contrast, in more conventional lock-based schemes readers must use heavyweight synchronization to prevent an updater from deleting the data structure out from under them, because lock-based updaters typically update the data in place and therefore must exclude readers. RCU-based updaters, on the other hand, exploit the fact that writes to single aligned pointers are atomic on modern CPUs, which allows atomic insertion, removal and replacement of items in a linked structure without disrupting readers. Concurrent RCU readers can keep accessing the old versions and can dispense with the atomic operations, memory barriers and communication cache misses that are so expensive on today's SMP systems, even in the absence of lock contention.

In the three-step procedure above the updater performs both the removal and the reclamation, but it is often helpful to have an entirely different thread do the reclamation, as is in fact the case for the kernel's directory-entry cache (dcache). Even when the same thread performs both step 1 and step 3, it is still helpful to think of them separately. For example, RCU readers and updaters need not communicate at all, yet RCU provides implicit, low-overhead communication between readers and reclaimers, namely in step 2.
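When the updater is not allowed to block, the reclamation can be handed to a callback with call_rcu() instead of waiting in synchronize_rcu(). A minimal sketch, reusing the made-up struct foo / gbl_foo / foo_lock names from the first sketch (assumptions, not kernel code):

#include <linux/kernel.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

struct foo {
	int a;
	struct rcu_head rcu;	/* needed so call_rcu() can queue this object */
};

static struct foo __rcu *gbl_foo;
static DEFINE_SPINLOCK(foo_lock);

/* Runs after every reader that might still see the old copy has finished. */
static void foo_reclaim(struct rcu_head *head)
{
	kfree(container_of(head, struct foo, rcu));
}

/* Updater that never blocks: removal now, reclamation via callback later. */
static void foo_update_a(int new_a)
{
	struct foo *new_fp = kmalloc(sizeof(*new_fp), GFP_ATOMIC);
	struct foo *old_fp;

	new_fp->a = new_a;			/* error handling omitted */

	spin_lock(&foo_lock);
	old_fp = rcu_dereference_protected(gbl_foo,
					   lockdep_is_held(&foo_lock));
	rcu_assign_pointer(gbl_foo, new_fp);	/* removal */
	spin_unlock(&foo_lock);

	if (old_fp)
		call_rcu(&old_fp->rcu, foo_reclaim);	/* deferred reclamation, no blocking */
}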
In the kernel source (include/linux/rcupdate.h), rcu_read_lock() carries the following comment:
/**
* rcu_read_lock() - mark the beginning of an RCU read-side critical section
*
* When synchronize_rcu() is invoked on one CPU while other CPUs
* are within RCU read-side critical sections, then the
* synchronize_rcu() is guaranteed to block until after all the other
* CPUs exit their critical sections. Similarly, if call_rcu() is invoked
* on one CPU while other CPUs are within RCU read-side critical
* sections, invocation of the corresponding RCU callback is deferred
* until after the all the other CPUs exit their critical sections.
*
* Note, however, that RCU callbacks are permitted to run concurrently
* with new RCU read-side critical sections. One way that this can happen
* is via the following sequence of events: (1) CPU 0 enters an RCU
* read-side critical section, (2) CPU 1 invokes call_rcu() to register
* an RCU callback, (3) CPU 0 exits the RCU read-side critical section,
* (4) CPU 2 enters a RCU read-side critical section, (5) the RCU
* callback is invoked. This is legal, because the RCU read-side critical
* section that was running concurrently with the call_rcu() (and which
* therefore might be referencing something that the corresponding RCU
* callback would free up) has completed before the corresponding
* RCU callback is invoked.
*
* RCU read-side critical sections may be nested. Any deferred actions
* will be deferred until the outermost RCU read-side critical section
* completes.
*
* You can avoid reading and understanding the next paragraph by
* following this rule: don't put anything in an rcu_read_lock() RCU
* read-side critical section that would block in a !PREEMPT kernel.
* But if you want the full story, read on!
*
* In non-preemptible RCU implementations (TREE_RCU and TINY_RCU),
* it is illegal to block while in an RCU read-side critical section.
* In preemptible RCU implementations (PREEMPT_RCU) in CONFIG_PREEMPT
* kernel builds, RCU read-side critical sections may be preempted,
* but explicit blocking is illegal. Finally, in preemptible RCU
* implementations in real-time (with -rt patchset) kernel builds, RCU
* read-side critical sections may be preempted and they may also block, but
* only when acquiring spinlocks that are subject to priority inheritance.
*/
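To tie the comment to code, here is a small illustrative reader, reusing the made-up gbl_foo/struct foo from the first sketch: read-side critical sections may nest, and nothing inside them may block in a !PREEMPT kernel.

#include <linux/printk.h>
#include <linux/rcupdate.h>

static void inner_read(void)
{
	struct foo *p;

	rcu_read_lock();		/* nested: just deepens the per-task nesting, no contention */
	p = rcu_dereference(gbl_foo);
	if (p)
		pr_info("a=%d\n", p->a);
	rcu_read_unlock();		/* the outer section is still in effect */
}

static void outer_read(void)
{
	rcu_read_lock();		/* outermost read-side critical section */
	inner_read();
	/* No blocking here: no mutexes, no GFP_KERNEL allocations, no msleep(). */
	rcu_read_unlock();		/* only now is this reader "done" as far as
					 * synchronize_rcu()/call_rcu() are concerned */
}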
The following comment explains that there is no write-side lock at all to contend with the read-side lock; synchronization among writers, however, must be arranged by the users of RCU themselves.
/*
* So where is rcu_write_lock()? It does not exist, as there is no
* way for writers to lock out RCU readers. This is a feature, not
* a bug -- this property is what provides RCU's performance benefits.
* Of course, writers must coordinate with each other. The normal
* spinlock primitives work well for this, but any other technique may be
* used as well. RCU does not care how the writers keep out of each
* others' way, as long as they do so.
*/
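As the comment says, writers only have to keep out of each other's way, typically with an ordinary spinlock; there is nothing for them to take against readers. A sketch of an in-place replacement, reusing the illustrative foo_list names from the list sketch above:

#include <linux/rculist.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Writers serialize on foo_list_lock; readers are never locked out. */
static void foo_replace(struct foo *old, int new_key)
{
	struct foo *new_fp = kmalloc(sizeof(*new_fp), GFP_KERNEL);

	new_fp->key = new_key;			/* error handling omitted */

	spin_lock(&foo_list_lock);		/* writer vs. writer only */
	list_replace_rcu(&old->list, &new_fp->list);
	spin_unlock(&foo_list_lock);

	synchronize_rcu();			/* wait out readers still using 'old' */
	kfree(old);
}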