Core dumps on Linux
Introduction to core dumps
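A core dump preserves the state of a process at the moment it crashes, including its memory, so the failure can be analyzed later. Sometimes, however, a process crashes without leaving a core file, because several factors decide whether one is written. A common switch is ulimit -c, which sets the maximum size of a core file. A limit of 0 disables core files entirely; with a non-zero limit, larger dumps are truncated to that size.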
If ulimit -c prints 0, core dumps are disabled.
[root@srdsdevapp69 ~]# ulimit -c
0
Setting ulimit -c unlimited removes the limit on core file size:
[root@srdsdevapp69 ~]# ulimit -c unlimited
[root@srdsdevapp69 ~]# ulimit -c
unlimited
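A limit set this way applies only to the current shell session and the processes started from it. Processes started from a newly opened terminal are not affected.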
ulimit -c is only one of many factors that decide whether a core file is written. The others are described in the man page:
$ man core
...
There are various circumstances in which a core dump file is not produced:
* The process does not have permission to write the core file. (By default the core file is called core, and is created in the current working directory. See below for details on naming.) Writing the core file will fail if the directory in which it is to be created is non-writable, or if a file with the same name exists and is not writable or is not a regular file (e.g., it is a directory or a symbolic link).
* A (writable, regular) file with the same name as would be used for the core dump already exists, but there is more than one hard link to that file.
* The file system where the core dump file would be created is full; or has run out of inodes; or is mounted read-only; or the user has reached their quota for the file system.
* The directory in which the core dump file is to be created does not exist.
* The RLIMIT_CORE (core file size) or RLIMIT_FSIZE (file size) resource limits for the process are set to zero; see getrlimit(2) and the documentation of the shell’s ulimit command (limit in csh(1)).
* The binary being executed by the process does not have read permission enabled.
* The process is executing a set-user-ID (set-group-ID) program that is owned by a user (group) other than the real user (group) ID of the process. (However, see the description of the prctl(2) PR_SET_DUMPABLE operation, and the description of the /proc/sys/fs/suid_dumpable file in proc(5).)
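The list leaves one case out. A process can catch the signals that would otherwise produce a core dump and handle them itself. MySQL does exactly this.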
abrtd
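On RHEL/CentOS, abrtd is enabled by default. It records the state of a failure (including generating the core dump) and reports the crash.
In that case the abrtd process is running: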
[root@srdsdevapp69 ~]# service abrtd status
abrtd (pid 8711) is running...
and core dumps are redirected to abrt-hook-ccpp:
[root@srdsdevapp69 ~]# cat /proc/sys/kernel/core_pattern
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e
Testing a core dump
Create the following program, which triggers a core dump, and run it.
testcoredump.c:
int main()
{
    /* integer division by zero raises SIGFPE, whose default action dumps core */
    return 1/0;
}
Compile and run:
$gcc testcoredump.c -o testcoredump
$./testcoredump
The system log shows that a core file was created along the way but then deleted again:
$tail -f /var/log/messages
...
Dec 8 09:54:44 srdsdevapp69 kernel: testcoredump[4028] trap divide error ip:400489 sp:7fff5a54b200 error:0 in testcoredump[400000+1000]
Dec 8 09:54:44 srdsdevapp69 abrtd: Directory 'ccpp-2016-12-08-09:54:44-4028' creation detected
Dec 8 09:54:44 srdsdevapp69 abrt[4029]: Saved core dump of pid 4028 (/root/testcoredump) to /var/spool/abrt/ccpp-2016-12-08-09:54:44-4028 (184320 bytes)
Dec 8 09:54:44 srdsdevapp69 abrtd: Executable '/root/testcoredump' doesn't belong to any package
Dec 8 09:54:44 srdsdevapp69 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2016-12-08-09:54:44-4028' exited with 1
Dec 8 09:54:44 srdsdevapp69 abrtd: Corrupted or bad directory /var/spool/abrt/ccpp-2016-12-08-09:54:44-4028, deleting
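By default abrtd keeps only core files from programs that belong to an installed package. Setting the following parameter makes it keep core files for all programs: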
$vi /etc/abrt/abrt-action-save-package-data.conf
...
ProcessUnpackaged = yes
Run the test program again, and this time the core file is kept:
Dec 8 10:04:30 srdsdevapp69 kernel: testcoredump[9189] trap divide error ip:400489 sp:7fff99973b30 error:0 in testcoredump[400000+1000]
Dec 8 10:04:30 srdsdevapp69 abrtd: Directory 'ccpp-2016-12-08-10:04:30-9189' creation detected
Dec 8 10:04:30 srdsdevapp69 abrt[9190]: Saved core dump of pid 9189 (/root/testcoredump) to /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189 (184320 bytes)
Dec 8 10:04:31 srdsdevapp69 kernel: Bridge firewalling registered
Dec 8 10:04:44 srdsdevapp69 abrtd: Sending an email...
Dec 8 10:04:44 srdsdevapp69 abrtd: Email was sent to: root@localhost
Dec 8 10:04:44 srdsdevapp69 abrtd: New problem directory /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189, processing
Dec 8 10:04:44 srdsdevapp69 abrtd: No actions are found for event 'notify'
abrtd recognizes duplicate problems and removes them. This keeps repeated crashes from filling the disk with core files:
Dec 8 10:18:35 srdsdevapp69 kernel: testcoredump[16598] trap divide error ip:400489 sp:7fff26cc9f50 error:0 in testcoredump[400000+1000]
Dec 8 10:18:35 srdsdevapp69 abrtd: Directory 'ccpp-2016-12-08-10:18:35-16598' creation detected
Dec 8 10:18:35 srdsdevapp69 abrt[16599]: Saved core dump of pid 16598 (/root/testcoredump) to /var/spool/abrt/ccpp-2016-12-08-10:18:35-16598 (184320 bytes)
Dec 8 10:18:45 srdsdevapp69 abrtd: Sending an email...
Dec 8 10:18:45 srdsdevapp69 abrtd: Email was sent to: root@localhost
Dec 8 10:18:45 srdsdevapp69 abrtd: Duplicate: UUID
Dec 8 10:18:45 srdsdevapp69 abrtd: DUP_OF_DIR: /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189
Dec 8 10:18:45 srdsdevapp69 abrtd: Problem directory is a duplicate of /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189
Dec 8 10:18:45 srdsdevapp69 abrtd: Deleting problem directory ccpp-2016-12-08-10:18:35-16598 (dup of ccpp-2016-12-08-10:04:30-9189)
Dec 8 10:18:45 srdsdevapp69 abrtd: No actions are found for event 'notify_dup'
abrtd also limits the total size of crash reports (mostly core files) through the MaxCrashReportsSize parameter. When the spool directory grows past this limit, abrtd deletes problem directories, so core files can be lost. The corresponding log is shown below:
Dec 8 14:10:32 srdsdevapp69 abrt[10548]: Saved core dump of pid 10527 (/usr/local/Percona-Server-5.6.29-rel76.2-Linux.x86_64.ssl101/bin/mysqld) to /var/spool/abrt/ccpp-2016-12-08-14:10:00-10527 (10513362944 bytes)
Dec 8 14:10:32 srdsdevapp69 abrtd: Directory 'ccpp-2016-12-08-14:10:00-10527' creation detected
Dec 8 14:10:32 srdsdevapp69 abrtd: Size of '/var/spool/abrt' >= 1000 MB, deleting 'ccpp-2016-12-08-14:05:43-8080'
Dec 8 14:10:32 srdsdevapp69 abrt[10548]: /var/spool/abrt is 25854515653 bytes (more than 1279MiB), deleting 'ccpp-2016-12-08-14:05:43-8080'
Dec 8 14:10:32 srdsdevapp69 abrt[10548]: Lock file '/var/spool/abrt/ccpp-2016-12-08-14:05:43-8080/.lock' is locked by process 7893
Dec 8 14:10:32 srdsdevapp69 abrt[10548]: '/var/spool/abrt/ccpp-2016-12-08-14:05:43-8080' does not exist
Dec 8 14:10:41 srdsdevapp69 abrtd: Sending an email...
Dec 8 14:10:41 srdsdevapp69 abrtd: Email was sent to: root@localhost
Dec 8 14:10:41 srdsdevapp69 abrtd: New problem directory /var/spool/abrt/ccpp-2016-12-08-14:10:00-10527, processing
Dec 8 14:10:41 srdsdevapp69 abrtd: No actions are found for event 'notify'
How abrtd works
abrtd is triggered by watching the /var/spool/abrt/ directory, so even a plain copy into that directory triggers it.
[root@srdsdevapp69 abrt]# cp -rf ccpp-2016-12-08-10:04:30-9189 ccpp-2016-12-08-10:04:30-91891
The resulting system log:
Dec 8 10:35:33 srdsdevapp69 abrtd: Directory 'ccpp-2016-12-08-10:04:30-91891' creation detected
Dec 8 10:35:33 srdsdevapp69 abrtd: Duplicate: UUID
Dec 8 10:35:33 srdsdevapp69 abrtd: DUP_OF_DIR: /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189
Dec 8 10:35:33 srdsdevapp69 abrtd: Problem directory is a duplicate of /var/spool/abrt/ccpp-2016-12-08-10:04:30-9189
Dec 8 10:35:33 srdsdevapp69 abrtd: Deleting problem directory ccpp-2016-12-08-10:04:30-91891 (dup of ccpp-2016-12-08-10:04:30-9189)
Dec 8 10:35:33 srdsdevapp69 abrtd: No actions are found for event 'notify_dup'
Changing core_pattern to a plain file path, so that the abrt-hook-ccpp helper is no longer used, effectively disables abrtd:
echo "/data/core-%e-%p-%t">/proc/sys/kernel/core_pattern
After that, a core dump leaves no abrtd entries in /var/log/messages:
Dec 8 10:30:24 srdsdevapp69 kernel: testcoredump[23050] trap divide error ip:400489 sp:7fff9f01dfb0 error:0 in testcoredump[400000+1000]
The core file is written directly to the location given by /proc/sys/kernel/core_pattern:
/data/core-testcoredump-23050-1481164224
Because /proc/sys/kernel/core_pattern no longer uses the abrt-hook-ccpp helper, checking the abrt-ccpp service also reports that the service is not running (exit status 3):
[root@srdsdevapp69 ~]# service abrt-ccpp status
[root@srdsdevapp69 ~]# echo $?
3
After /proc/sys/kernel/core_pattern is restored, the abrt-ccpp service reports a normal status again:
[root@srdsdevapp69 ~]# echo "|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e">/proc/sys/kernel/core_pattern
[root@srdsdevapp69 ~]# service abrt-ccpp status
[root@srdsdevapp69 ~]# echo $?
0
If abrtd is stopped while /proc/sys/kernel/core_pattern is still "|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e", the core file is written to the current working directory of the crashing process:
Dec 8 10:46:21 srdsdevapp69 kernel: testcoredump[31364] trap divide error ip:400489 sp:7fff15d6f450 error:0 in testcoredump[400000+1000]
Dec 8 10:46:21 srdsdevapp69 abrt[31365]: abrtd is not running. If it crashed, /proc/sys/kernel/core_pattern contains a stale value, consider resetting it to 'core'
Dec 8 10:46:21 srdsdevapp69 abrt[31365]: Saved core dump of pid 31364 to /root/core.31364 (184320 bytes)
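Enabling core dumps for MySQL
The MySQL server process mysqld catches the signals that could crash it. By default it prints the call stack and then exits abnormally without producing a core file: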
2016-12-08 11:14:51 14034 [Note] /usr/local/mysql/bin/mysqld: ready for connections.
Version: '5.6.29-76.2-debug-log' socket: '/mysqlrds/data/mysql.sock' port: 3306 Source distribution
03:18:43 UTC - mysqld got signal 8 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/

key_buffer_size=33554432
read_buffer_size=2097152
max_used_connections=2
max_threads=100001
thread_count=1
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 307242932 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x2427ca20
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fd53066bca8 thread_stack 0x40000
/usr/local/mysql/bin/mysqld(my_print_stacktrace+0x35)[0xaf23c9]
/usr/local/mysql/bin/mysqld(handle_fatal_signal+0x42e)[0x74d42a]
/lib64/libpthread.so.0[0x3805a0f7e0]
/usr/local/mysql/bin/mysqld(_Z19mysql_rename_tablesP3THDP10TABLE_LISTb+0x6c)[0x82fa64]
/usr/local/mysql/bin/mysqld(_Z21mysql_execute_commandP3THD+0x2aab)[0x8079e9]
/usr/local/mysql/bin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x588)[0x810ce3]
/usr/local/mysql/bin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0xd8b)[0x80228a]
/usr/local/mysql/bin/mysqld(_Z10do_commandP3THD+0x3bd)[0x801087]
/usr/local/mysql/bin/mysqld(_Z26threadpool_process_requestP3THD+0x71)[0x8ec721]
/usr/local/mysql/bin/mysqld[0x8ef363]
/usr/local/mysql/bin/mysqld[0x8ef5a0]
/usr/local/mysql/bin/mysqld(pfs_spawn_thread+0x159)[0xe14049]
/lib64/libpthread.so.0[0x3805a07aa1]
/lib64/libc.so.6(clone+0x6d)[0x32286e893d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (7fd508004d80): is an invalid pointer
Connection ID (thread ID): 1
Status: NOT_KILLED

You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
To make mysqld produce a core file, the --core-file option must be enabled:
mysqld --defaults-file=/home/mysql/etc/my.cnf --core-file &
The option can also be added to my.cnf (in the [mysqld] section):
core_file
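Core file size
Core files show a curious effect: the disk space they actually occupy can be much smaller than their file size.
For example, the core file below has a size of about 10 GB (10513362944 bytes), yet it occupies only about 1 GB on disk. stat reports Blocks in 512-byte units, and 1940984 × 512 B = 993,783,808 B ≈ 0.93 GiB.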
[root@srdsdevapp69 ccpp-2016-12-08-14:10:00-10527]# stat coredump
File: `coredump'
Size: 10513362944 Blocks: 1940984 IO Block: 4096 regular file
Device: fd03h/64771d Inode: 14990 Links: 1
Access: (0640/-rw-r-----) Uid: ( 173/ abrt) Gid: ( 512/ mysql)
Access: 2016-12-08 14:10:41.886280668 +0800
Modify: 2016-12-08 14:10:27.704523443 +0800
Change: 2016-12-08 14:10:27.704523443 +0800
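This is because blocks that are entirely zero are skipped when the core file is written, which leaves holes in the file. The same effect can be reproduced with dd's seek option. It happens both when /proc/sys/kernel/core_pattern points to the abrt-hook-ccpp helper and when it points directly to a file path. This is actually a useful optimization: it saves disk space and also makes writing the core file faster.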