http://paulmck.livejournal.com/7314.html

RCU的作者,paul在他的blog中有提到这个问题,也明确提到需要在module exit的地方使用rcu_barrier来等待保证call_rcu的回调函数callback能够执行完成,然后再正式卸载模块,方式快速卸载之后call_back回调发现空指针的问题,从而导致kernel panic的问题。

RCU and unloadable modules

  • Jun. 8th, 2009 at 1:38 PM

The rcu_barrier() function was described some time back in an article on Linux Weekly News. This rcu_barrier() function solves the problem where a given module invokes call_rcu() using a function in that module, but the module is removed before the corresponding grace period elapses, or at least before the callback can be invoked. This results in an attempt to call a function whose code has been removed from the Linux kernel. Oops!!!

Since the above article was written, rcu_barrier_bh() and rcu_barrier_sched() have been accepted into the Linux kernel, for use with call_rcu_bh() and call_rcu_sched(), respectively. These functions have seen relatively little use, which is no surprise, given that they are quite specialized. However, Jesper Dangaard recently discovered that they need to be used a bit more heavily. This lead to the question of exactly when they needed to be used, to which I responded as follows:

Unless there is some other mechanism to ensure that all the RCU callbacks have been invoked before the module exit, there needs to be code in the module-exit function that does the following:

  1. Prevents any new RCU callbacks from being posted. In other words, make sure that no future call_rcu()invocations happen from this module unless those call_rcu() invocations touch only functions and data that outlive this module.
  2. Invokes rcu_barrier().
  3. Of course, if the module uses call_rcu_sched() instead of call_rcu(), then it should invoke rcu_barrier_sched() instead of rcu_barrier(). Similarly, if it uses call_rcu_bh() instead of call_rcu(), then it should invoke rcu_barrier_bh() instead of rcu_barrier(). If the module uses more than one of call_rcu()call_rcu_sched(), and call_rcu_bh(), then it must invoke more than one of rcu_barrier()rcu_barrier_sched(), and rcu_barrier_bh().

What other mechanism could be used? I cannot think of one that it safe. For example, a module that tried to count the number of RCU callbacks in flight would be vulnerable to races as follows:

  1. CPU 0: RCU callback decrements the counter.
  2. CPU 1: module-exit function notices that the counter is zero, so removes the module.
  3. CPU 0: attempts to execute the code returning from the RCU callback, and dies horribly due to that code no longer being in memory.

If there was an easy solution (or even a hard solution) to this problem, then I do not believe that Nikita Danilov would have asked Dipankar Sarma for rcu_barrier(). Therefore, I do not expect anyone to be able to come up with an alternative to rcu_barrier() and friends. Always happy to learn something by being proven wrong, of course!!!

So unless someone can show me some other safe mechanism, every unloadable module that uses call_rcu()call_rcu_sched(), or call_rcu_bh() must use rcu_barrier()rcu_barrier_sched(), and/or rcu_barrier_bh() in its module-exit function.

So if you have a module that uses one of the call_rcu() functions, please use the corresponding rcu_barrier()function in the module-exit code!

Update: Peter Zijlstra rightly points out that the issue is not whether your module invokes call_rcu(), but rather whether the corresponding RCU callback invokes a function that is in a module. So, if there is a call_rcu()call_rcu_sched(), or call_rcu_bh() anywhere in the kernel whose RCU callback either directly or indirectly invokes a function in your module, then your module's exit function needs to invoke rcu_barrier()rcu_barrier_sched(), and/or rcu_barrier_bh(). Thanks to Peter for pointing this out!

关于call_rcu在内核模块退出时可能引起kernel panic的问题的更多相关文章

  1. Android退出时关闭所有Activity的方法

    Android退出时,有的Activity可能没有被关闭.为了在Android退出时关闭所有的Activity,设计了以下的类: //关闭Activity的类 public class CloseAc ...

  2. Qt 程序退出时断言错误——_BLOCK_TYPE_IS_VALID(pHead->nBlockUse),由setAttribute(Qt::WA_DeleteOnClose)引起

    最近在学习QT,自己仿写了一个简单的QT绘图程序,但是在退出时总是报错,断言错误: 报错主要问题在_BLOCK_TYPE_IS_VALID(pHead->nBlockUse),是在关闭窗口时报的 ...

  3. Android设置Activity启动和退出时的动画

    业务开发时遇到的一个小特技,要求实现Activity启动时自下向上弹出,退出时自上向下退出. 此处不关注启动和退出时其他Activity的动画效果,实现方法有两种: 1.代码方式,通过Activity ...

  4. HP平台由于变量声明冲突导致程序退出时的core

    最近遇到一个莫名的问题,在HP-UX B.11.31 U ia64平台下,程序PetriService在接收到产品化退出或Ctrl-C时,程序在main函数返回后析构全局的CTQueue<SMs ...

  5. Android 编程下 Activity 的创建和应用退出时的销毁

    为了确保对应用中 Activity 的创建和销毁状态进行控制,所以就需要一个全局的变量来记录和销毁这些 Activity.这里的大概思路是写一个类继承 Application,并使获取该 Applic ...

  6. 解决log4cxx退出时的异常

    解决log4cxx退出时的异常(金庆的专栏)如果使用log4cxx的FileWatchdog线程来监视日志配置文件进行动态配置,就可能碰到程序退出时产生的异常.程序退出时清理工作耗时很长时,该异常很容 ...

  7. 神奇的bug,退出时自动更新时间

    遇到一个神奇的bug,用户退出时,上次登录时间会变成退出时的时间. 于是开始跟踪,发现Laravel在退出时,会做一次脏检查,这时会更新rember_token,这时就会有update操作如下. 而粗 ...

  8. java实现创建临时文件然后在程序退出时自动删除文件(转)

    这篇文章主要介绍了java实现创建临时文件然后在程序退出时自动删除文件,从个人项目中提取出来的,小伙伴们可以直接拿走使用. 通过java的File类创建临时文件,然后在程序退出时自动删除临时文件.下面 ...

  9. os.waitpid()无法获取sys.exit()退出时的status code

    [目的] 父进程使用os.waitpid()等待子进程退出,并检测子进程的exit code,以决定是否重启子进程. (常见的应用场景是:子进程接收外部命令,收到"stop"时退出 ...

随机推荐

  1. python3 得到a.txt中有的而b.txt中没有的汉字

    已知两个文本文档,求a.txt中有的而b.txt中没有的汉字 #读取list1中的汉字 f1=open('/Users/tanchao/Documents/pythonwork/tensorflow/ ...

  2. tensorflow 训练cifar10报错

    1.AttributeError: 'module' object has noattribute 'random_crop' 解决方案: 将distorted_image= tf.image.ran ...

  3. 手机端head部分

    <!doctype html> <html lang="zh"> <head> <meta charset="utf-8&quo ...

  4. Html----开头

     Html开头 *<meta http-equiv='content-type' content='text/html;charset=utf-8'>*定义字符编码,这是必须有的 后另存为 ...

  5. CentOS 6.3开机启动服务及自动联网

    虚拟机设置选择NAT模式,默认情况下,CentOS不是自动连接上网的,要点击右上角有个电脑图标,选择system eth0进行连接, 可以修改开机启动配置只需修改:/etc/sysconfig/net ...

  6. app.route()

    [app.route()] 可使用 app.route() 创建路由路径的链式路由句柄.由于路径在一个地方指定,这样做有助于创建模块化的路由,而且减少了代码冗余和拼写错误.请参考 Router() 文 ...

  7. Opengl库函数列表

    http://www.cnblogs.com/GameDeveloper/archive/2012/01/07/2315867.html

  8. Python程序打包—pyinstaller

    简介:PyInstaller是一个十分有用的第三方库,通过对源文件打包,Python程序可以在没有安装 Python的环境中运行,也可以作为一个独立文件方便传递和管理. PyInstaller的官方网 ...

  9. centos 7.2 安装域名服务器(bind9.9 集群--主从架构),私有域名服务器+缓存

    1.安装组件 yum install bind bind-utils -y 2.启动域名服务 service named start chkconfig named on ss -unlt |grep ...

  10. wasserstein 距离

    https://blog.csdn.net/nockinonheavensdoor/article/details/82055147 注明:直观理解而已,正儿八经的严谨证明看最下面的参考. Earth ...