关于call_rcu在内核模块退出时可能引起kernel panic的问题
http://paulmck.livejournal.com/7314.html
RCU的作者,paul在他的blog中有提到这个问题,也明确提到需要在module exit的地方使用rcu_barrier来等待保证call_rcu的回调函数callback能够执行完成,然后再正式卸载模块,方式快速卸载之后call_back回调发现空指针的问题,从而导致kernel panic的问题。
RCU and unloadable modules
- Jun. 8th, 2009 at 1:38 PM
The rcu_barrier() function was described some time back in an article on Linux Weekly News. This rcu_barrier() function solves the problem where a given module invokes call_rcu() using a function in that module, but the module is removed before the corresponding grace period elapses, or at least before the callback can be invoked. This results in an attempt to call a function whose code has been removed from the Linux kernel. Oops!!!
Since the above article was written, rcu_barrier_bh() and rcu_barrier_sched() have been accepted into the Linux kernel, for use with call_rcu_bh() and call_rcu_sched(), respectively. These functions have seen relatively little use, which is no surprise, given that they are quite specialized. However, Jesper Dangaard recently discovered that they need to be used a bit more heavily. This lead to the question of exactly when they needed to be used, to which I responded as follows:
Unless there is some other mechanism to ensure that all the RCU callbacks have been invoked before the module exit, there needs to be code in the module-exit function that does the following:
- Prevents any new RCU callbacks from being posted. In other words, make sure that no future
call_rcu()invocations happen from this module unless thosecall_rcu()invocations touch only functions and data that outlive this module.- Invokes
rcu_barrier().- Of course, if the module uses
call_rcu_sched()instead ofcall_rcu(), then it should invokercu_barrier_sched()instead ofrcu_barrier(). Similarly, if it usescall_rcu_bh()instead ofcall_rcu(), then it should invokercu_barrier_bh()instead ofrcu_barrier(). If the module uses more than one ofcall_rcu(),call_rcu_sched(), andcall_rcu_bh(), then it must invoke more than one ofrcu_barrier(),rcu_barrier_sched(), andrcu_barrier_bh().What other mechanism could be used? I cannot think of one that it safe. For example, a module that tried to count the number of RCU callbacks in flight would be vulnerable to races as follows:
- CPU 0: RCU callback decrements the counter.
- CPU 1: module-exit function notices that the counter is zero, so removes the module.
- CPU 0: attempts to execute the code returning from the RCU callback, and dies horribly due to that code no longer being in memory.
If there was an easy solution (or even a hard solution) to this problem, then I do not believe that Nikita Danilov would have asked Dipankar Sarma for
rcu_barrier(). Therefore, I do not expect anyone to be able to come up with an alternative torcu_barrier()and friends. Always happy to learn something by being proven wrong, of course!!!So unless someone can show me some other safe mechanism, every unloadable module that uses
call_rcu(),call_rcu_sched(), orcall_rcu_bh()must usercu_barrier(),rcu_barrier_sched(), and/orrcu_barrier_bh()in its module-exit function.
So if you have a module that uses one of the call_rcu() functions, please use the corresponding rcu_barrier()function in the module-exit code!
Update: Peter Zijlstra rightly points out that the issue is not whether your module invokes call_rcu(), but rather whether the corresponding RCU callback invokes a function that is in a module. So, if there is a call_rcu(), call_rcu_sched(), or call_rcu_bh() anywhere in the kernel whose RCU callback either directly or indirectly invokes a function in your module, then your module's exit function needs to invoke rcu_barrier(), rcu_barrier_sched(), and/or rcu_barrier_bh(). Thanks to Peter for pointing this out!
关于call_rcu在内核模块退出时可能引起kernel panic的问题的更多相关文章
- Android退出时关闭所有Activity的方法
Android退出时,有的Activity可能没有被关闭.为了在Android退出时关闭所有的Activity,设计了以下的类: //关闭Activity的类 public class CloseAc ...
- Qt 程序退出时断言错误——_BLOCK_TYPE_IS_VALID(pHead->nBlockUse),由setAttribute(Qt::WA_DeleteOnClose)引起
最近在学习QT,自己仿写了一个简单的QT绘图程序,但是在退出时总是报错,断言错误: 报错主要问题在_BLOCK_TYPE_IS_VALID(pHead->nBlockUse),是在关闭窗口时报的 ...
- Android设置Activity启动和退出时的动画
业务开发时遇到的一个小特技,要求实现Activity启动时自下向上弹出,退出时自上向下退出. 此处不关注启动和退出时其他Activity的动画效果,实现方法有两种: 1.代码方式,通过Activity ...
- HP平台由于变量声明冲突导致程序退出时的core
最近遇到一个莫名的问题,在HP-UX B.11.31 U ia64平台下,程序PetriService在接收到产品化退出或Ctrl-C时,程序在main函数返回后析构全局的CTQueue<SMs ...
- Android 编程下 Activity 的创建和应用退出时的销毁
为了确保对应用中 Activity 的创建和销毁状态进行控制,所以就需要一个全局的变量来记录和销毁这些 Activity.这里的大概思路是写一个类继承 Application,并使获取该 Applic ...
- 解决log4cxx退出时的异常
解决log4cxx退出时的异常(金庆的专栏)如果使用log4cxx的FileWatchdog线程来监视日志配置文件进行动态配置,就可能碰到程序退出时产生的异常.程序退出时清理工作耗时很长时,该异常很容 ...
- 神奇的bug,退出时自动更新时间
遇到一个神奇的bug,用户退出时,上次登录时间会变成退出时的时间. 于是开始跟踪,发现Laravel在退出时,会做一次脏检查,这时会更新rember_token,这时就会有update操作如下. 而粗 ...
- java实现创建临时文件然后在程序退出时自动删除文件(转)
这篇文章主要介绍了java实现创建临时文件然后在程序退出时自动删除文件,从个人项目中提取出来的,小伙伴们可以直接拿走使用. 通过java的File类创建临时文件,然后在程序退出时自动删除临时文件.下面 ...
- os.waitpid()无法获取sys.exit()退出时的status code
[目的] 父进程使用os.waitpid()等待子进程退出,并检测子进程的exit code,以决定是否重启子进程. (常见的应用场景是:子进程接收外部命令,收到"stop"时退出 ...
随机推荐
- 使用tf.print()打印tensor内容
使用tf.Print()打印tensor内容,这是tensorflow中调试bug的一个手段,例子如下所示: import tensorflow as tf a = tf.Variable(tf.ra ...
- CSS----盒子模型与浮动
盒模型(框模型) 页面上任何一个元素我们都可以看成是一个盒子,盒子会占用一定的空间和位置他们之间相互制约,就形成了网页的布局 w3c的盒模型的构成:content border padding ma ...
- depth: working copy\infinity\immediates\files\empty
depth: working copy\infinity\immediates\files\empty 有时间,需要整理下,svn 合并深度这四项:具体的意思.
- with check(转)
假如我要为一个表中添加一个外键约束.语法如下 alter table dbo.employee with check add constraint [FK_employeeno] foreign ...
- Centos新增group和user
新增group groupadd hadoop 新增用户 useradd -d /usr/hadoop -g hadoop -m hadoop 设置密码 passwd hadoop
- spring boot IDEA 开发微服务
本文是参考:https://blog.csdn.net/u011001084/article/details/79040701 的基础上自己实际操作编写. 在我们开始创建微服务之前,需要安装Cons ...
- JMeter学习(一)工具简单介绍(转载)
转载自 http://www.cnblogs.com/yangxia-test 一.JMeter 介绍 Apache JMeter是100%纯JAVA桌面应用程序,被设计为用于测试客户端/服务端结构的 ...
- hibernate中调用query.list()而出现的黄色警告线
使用hibernate的时候会用到hql语句查询数据库, 那就一定会用到query.list();这个方法, 那就一定会出现一个长长的黄色的警告线, 不管你想尽什么办法, 总是存在, 虽然说这个黄色的 ...
- feign的hystrix不起作用.
在springCloud中使用feign内嵌的断路器hystrix时.feign中的hystrix不起作用.这可能是由于springCloud的版本原因造成的.需要在application.prope ...
- PTA 7-50 畅通工程之局部最小花费问题(最小生成树Kruskal)
某地区经过对城镇交通状况的调查,得到现有城镇间快速道路的统计数据,并提出“畅通工程”的目标:使整个地区任何两个城镇间都可以实现快速交通(但不一定有直接的快速道路相连,只要互相间接通过快速路可达即可). ...