Understanding a Kernel Oops!
Depending on the type of error detected by the kernel, panics in the Linux kernel are classified as hard panics (Aiee!) and soft panics (Oops!). This article explains how a Linux kernel "Oops" works, walks through creating a simple one, and then shows how to debug it. It is mainly intended for beginners getting into Linux kernel development who need to debug the kernel. Knowledge of the Linux kernel and of C programming is assumed.
An “Oops” is what the kernel throws at us when it detects a fault, or an exception, in kernel code. It’s somewhat like a segfault in user space. An Oops dumps its message on the console; the message contains the processor status and the contents of the CPU registers at the time the fault occurred. The offending process that triggered the Oops gets killed without releasing locks or cleaning up structures. The system may not even resume normal operation afterwards; this is called an unstable state. Once an Oops has occurred, the system cannot be trusted any further.
Let’s try to generate an Oops message with sample code, and try to understand the dump.
Setting up the machine to capture an Oops
The running kernel should be compiled with CONFIG_DEBUG_INFO, and syslogd should be running. To generate and understand an Oops message, let’s write a sample kernel module, oops.c:
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>

static void create_oops() {
        *(int *)0 = 0;
}

/* Runs at insmod time; the NULL write in create_oops() triggers the Oops. */
static int __init my_oops_init(void) {
        printk("oops from the module\n");
        create_oops();
        return (0);
}

/* Runs at rmmod time. */
static void __exit my_oops_exit(void) {
        printk("Goodbye world\n");
}

module_init(my_oops_init);
module_exit(my_oops_exit);
The associated Makefile for this module is as follows:
obj-m := oops.o
KDIR := /lib/modules/$(shell uname -r)/build
PWD := $(shell pwd)
SYM=$(PWD)

all:
	$(MAKE) -C $(KDIR) M=$(PWD) modules
Once the module is built and loaded with insmod, it generates the following Oops:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
PGD 7a719067 PUD 7b2b3067 PMD 0
Oops: 0002 [#1] SMP
last sysfs file: /sys/devices/virtual/misc/kvm/uevent
CPU 1
Pid: 2248, comm: insmod Tainted: P 2.6.33.3-85.fc13.x86_64
RIP: 0010:[<ffffffffa03e1012>] [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP: 0018:ffff88007ad4bf08 EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
FS: 00007fb79dadf700(0000) GS:ffff880001e80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 000000007a0f1000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process insmod (pid: 2248, threadinfo ffff88007ad4a000, task ffff88007a222ea0)
Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000
Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00
RIP [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RSP <ffff88007ad4bf08>
CR2: 0000000000000000
Understanding the Oops dump
Let’s have a closer look at the above dump, to understand some of the important bits of information.
BUG: unable to handle kernel NULL pointer dereference at (null)
The first line indicates a pointer with a NULL value.
IP: [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
IP is the instruction pointer.
Oops: 0002 [#1] SMP
This is the error code value in hex. Each bit has a significance of its own:
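bit 0 == 0 means no page found, 1 means a protection fault
bit 1 == 0 means read, 1 means write
bit 2 == 0 means kernel, 1 means user-mode
Here the value 0002 has only bit 1 set, so the fault was a write, made from kernel mode, to an address for which no page was found.
[#1]: this value is the number of times the Oops occurred. Multiple Oops can be triggered as a cascading effect of the first one.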
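The meaning of the error code bits described above can be sketched in a few lines of user-space C. This is only an illustration, not kernel code: the names oops_error, decode_oops_error and print_oops_error are made up for this example, and only the three bits listed above are decoded.

```c
#include <stdio.h>

/* Hypothetical helper type: a decoded view of the three bits listed above. */
struct oops_error {
    int protection_fault; /* bit 0: 0 = no page found, 1 = protection fault */
    int write;            /* bit 1: 0 = read, 1 = write */
    int user_mode;        /* bit 2: 0 = kernel, 1 = user-mode */
};

/* Split the hex value printed after "Oops:" into its individual bits. */
struct oops_error decode_oops_error(unsigned long code)
{
    struct oops_error e;

    e.protection_fault = (int)((code >> 0) & 1UL);
    e.write            = (int)((code >> 1) & 1UL);
    e.user_mode        = (int)((code >> 2) & 1UL);
    return e;
}

/* Print a one-line summary of the decoded error code. */
void print_oops_error(unsigned long code)
{
    struct oops_error e = decode_oops_error(code);

    printf("%04lx: %s, %s, %s\n", code,
           e.protection_fault ? "protection fault" : "no page found",
           e.write ? "write" : "read",
           e.user_mode ? "user-mode" : "kernel");
}
```

Calling print_oops_error(0x0002) with the value from our dump prints "0002: no page found, write, kernel", which matches a write through a NULL pointer from inside the module.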
CPU 1
This denotes on which CPU the error occurred.
Pid: 2248, comm: insmod Tainted: P 2.6.33.3-85.fc13.x86_64
The Tainted flag points to P here. Each flag has its own meaning. A few other flags, and their meanings, picked up from kernel/panic.c:
P: Proprietary module has been loaded.
F: Module has been forcibly loaded.
S: SMP with a CPU not designed for SMP.
R: User forced a module unload.
M: System experienced a machine check exception.
B: System has hit bad_page.
U: Userspace-defined naughtiness.
A: ACPI table overridden.
W: Taint on warning.
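When reading many dumps, the flag list above is easy to turn into a small lookup table. The sketch below is a user-space illustration only; taint_entry, taint_table, taint_meaning and explain_taint are names invented for this example, and the table covers just the flags listed in this article, not every flag a given kernel version may define.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table entry: one taint letter and its meaning. */
struct taint_entry {
    char flag;
    const char *meaning;
};

/* Only the flags quoted in this article; real kernels may define more. */
static const struct taint_entry taint_table[] = {
    { 'P', "Proprietary module has been loaded." },
    { 'F', "Module has been forcibly loaded." },
    { 'S', "SMP with a CPU not designed for SMP." },
    { 'R', "User forced a module unload." },
    { 'M', "System experienced a machine check exception." },
    { 'B', "System has hit bad_page." },
    { 'U', "Userspace-defined naughtiness." },
    { 'A', "ACPI table overridden." },
    { 'W', "Taint on warning." },
};

/* Return the meaning of one taint letter, or NULL if it is not in the table. */
const char *taint_meaning(char flag)
{
    size_t i;

    for (i = 0; i < sizeof(taint_table) / sizeof(taint_table[0]); i++) {
        if (taint_table[i].flag == flag)
            return taint_table[i].meaning;
    }
    return NULL;
}

/* Print the meaning of every known letter in a taint string such as "P". */
void explain_taint(const char *flags)
{
    for (; *flags != '\0'; flags++) {
        const char *meaning = taint_meaning(*flags);

        if (meaning != NULL)
            printf("%c: %s\n", *flags, meaning);
    }
}
```

For the dump in this article, explain_taint("P") would report that a proprietary module has been loaded.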
RIP: 0010:[<ffffffffa03e1012>] [<ffffffffa03e1012>] my_oops_init+0x12/0x21 [oops]
RIP is the CPU register containing the address of the instruction that is being executed. 0010 comes from the code segment register. my_oops_init+0x12/0x21 is the <symbol> + the offset/length: the fault happened 0x12 bytes into my_oops_init, a function that is 0x21 bytes long.
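The symbol+offset/length notation is regular enough to be pulled apart mechanically. The following user-space sketch shows one way to do that with sscanf; the struct oops_location and the function parse_oops_location are hypothetical names for this example, and the 63-character limit on the symbol name is an assumption made to keep the buffer fixed-size.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the three parts of "symbol+offset/length". */
struct oops_location {
    char symbol[64];       /* function name, e.g. my_oops_init */
    unsigned long offset;  /* bytes from the start of the function */
    unsigned long length;  /* total size of the function in bytes */
};

/* Returns 0 on success, -1 if the text does not have the expected shape. */
int parse_oops_location(const char *text, struct oops_location *out)
{
    memset(out, 0, sizeof(*out));

    /* %63[^+] reads the symbol up to '+'; %lx accepts an optional 0x prefix. */
    if (sscanf(text, "%63[^+]+%lx/%lx",
               out->symbol, &out->offset, &out->length) != 3)
        return -1;
    return 0;
}
```

Given the string "my_oops_init+0x12/0x21", this fills in the symbol my_oops_init, the offset 0x12 and the length 0x21.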
RSP: 0018:ffff88007ad4bf08 EFLAGS: 00010292
RAX: 0000000000000018 RBX: ffffffffa03e1000 RCX: 00000000000013b7
RDX: 0000000000000000 RSI: 0000000000000046 RDI: 0000000000000246
RBP: ffff88007ad4bf08 R08: ffff88007af1cba0 R09: 0000000000000004
R10: 0000000000000000 R11: ffff88007ad4bd68 R12: 0000000000000000
R13: 00000000016b0030 R14: 0000000000019db9 R15: 00000000016b0010
This is a dump of the contents of some of the CPU registers.
Stack:
ffff88007ad4bf38 ffffffff8100205f ffffffffa03de060 ffffffffa03de060
0000000000000000 00000000016b0030 ffff88007ad4bf78 ffffffff8107aac9
ffff88007ad4bf78 00007fff69f3e814 0000000000019db9 0000000000020000
The above is a dump of the raw contents of the kernel stack at the time of the fault.
Call Trace:
[<ffffffff8100205f>] do_one_initcall+0x59/0x154
[<ffffffff8107aac9>] sys_init_module+0xd1/0x230
[<ffffffff81009b02>] system_call_fastpath+0x16/0x1b
The above is the call trace: the list of functions that were being called just before the Oops occurred, most recent first.
Code: <c7> 04 25 00 00 00 00 00 00 00 00 31 c0 c9 c3 00 00 00 00 00 00 00
The Code line is a hex-dump of the section of machine code that was being run at the time the Oops occurred.
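In the Code line, the byte printed in angle brackets is the one the instruction pointer was pointing at. As a rough illustration, the user-space sketch below scans such a line and reports the position and value of the marked byte; find_marked_byte is a name invented for this example, and the parsing assumes bytes separated by single spaces as in the dump above.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find the byte written as <xx> in a "Code:" line.
 * Returns its zero-based position among the dumped bytes and stores its
 * value in *value, or returns -1 if no marked byte is found.
 */
int find_marked_byte(const char *code, unsigned int *value)
{
    const char *p = code;
    int index = 0;

    /* Skip the "Code:" label if the caller passed the whole line. */
    if (strncmp(p, "Code:", 5) == 0)
        p += 5;

    while (*p != '\0') {
        while (*p == ' ')
            p++;
        if (*p == '\0')
            break;
        if (*p == '<')
            return sscanf(p, "<%2x>", value) == 1 ? index : -1;
        /* Unmarked byte: skip over it and count it. */
        while (*p != '\0' && *p != ' ')
            p++;
        index++;
    }
    return -1;
}
```

For the dump in this article the marked byte is c7 at position 0, which is the first byte of the movl instruction that writes to address 0.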
Debugging an Oops dump
The first step is to load the offending module into the GDB debugger, as follows:
[root@DELL-RnD-India oops]# gdb oops.ko
GNU gdb (GDB) Fedora (7.1-18.fc13)
Reading symbols from /code/oops/oops.ko...done.
Next, add the symbol file to the debugger. The add-symbol-file command’s first argument is oops.o and the second argument is the address of the text section of the module. You can obtain this address from /sys/module/oops/sections/.init.text (where oops is the module name):
(gdb) add-symbol-file oops.o 0xffffffffa03e1000
add symbol table from file "oops.o" at
.text_addr = 0xffffffffa03e1000
(y or n) y
Reading symbols from /code/oops/oops.o...done.
From the RIP instruction line, we can get the name of the offending function, and disassemble it.
(gdb) disassemble my_oops_init
Dump of assembler code for function my_oops_init:
0x0000000000000038 <+0>: push %rbp
0x0000000000000039 <+1>: mov $0x0,%rdi
0x0000000000000040 <+8>: xor %eax,%eax
0x0000000000000042 <+10>: mov %rsp,%rbp
0x0000000000000045 <+13>: callq 0x4a <my_oops_init+18>
0x000000000000004a <+18>: movl $0x0,0x0
0x0000000000000055 <+29>: xor %eax,%eax
0x0000000000000057 <+31>: leaveq
0x0000000000000058 <+32>: retq
End of assembler dump.
Now, to pinpoint the actual line of offending code, we add the starting address and the offset. The offset is available in the same RIP instruction line. In our case, we are adding 0x0000000000000038 + 0x012 = 0x000000000000004a. This points to the movl instruction.
(gdb) list *0x000000000000004a
0x4a is in my_oops_init (/code/oops/oops.c:6).
1 #include <linux/kernel.h>
2 #include <linux/module.h>
3 #include <linux/init.h>
4
5 static void create_oops() {
6 *(int *)0 = 0;
7 }
This gives the code of the offending function.
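The address arithmetic used above is simple enough to script. The sketch below, in user-space C, adds the function's start address from the disassembly to the offset from the RIP line and formats the matching gdb list command; fault_address and format_gdb_list are hypothetical helper names for this example.

```c
#include <stdio.h>

/* Start address of the function (from the disassembly) plus the Oops offset. */
unsigned long fault_address(unsigned long func_start, unsigned long offset)
{
    return func_start + offset;
}

/*
 * Write the gdb command that lists the source line at the faulting address.
 * Returns the number of characters snprintf would have written.
 */
int format_gdb_list(char *buf, size_t size,
                    unsigned long func_start, unsigned long offset)
{
    return snprintf(buf, size, "list *0x%lx",
                    fault_address(func_start, offset));
}
```

With a start address of 0x38 and an offset of 0x12, this produces the command list *0x4a, the same one entered by hand above.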
References
The kerneloops.org website can be used to pick up a lot of Oops messages to debug. The Linux kernel documentation directory has information about Oops in kernel/Documentation/oops-tracing.txt. This, and numerous other online resources, were used while creating this article.