An ASMB Bug (ORA-04030 kfmditer) Crashes a Database Instance
Symptom:
One instance of a customer's critical production RAC system crashed. The alert log shows:
Fri Jun 21 17:05:52 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc (incident=31397):
ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31397/jyj1_asmb_11391_i31397.trc
Fri Jun 21 17:05:55 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_rbal_11389.trc (incident=31389):
ORA-04030: out of process memory when trying to allocate bytes (,)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31389/jyj1_rbal_11389_i31389.trc
Fri Jun 21 17:06:14 2013
Instance terminated by ASMB, pid = 11391
Checking the ASMB trace file:
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc (incident=31397):
ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31397/jyj1_asmb_11391_i31397.trc
Fri Jun 21 17:05:52 2013
Trace dumping is performing id=[cdmp_20130621170552]
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc:
ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)
ASMB (ospid: 11391): terminating the instance due to error 4030
System state dump is made for local instance
System State dumped to trace file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_diag_11345.trc
Fri Jun 21 17:05:53 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_lms0_11363.trc (incident=31301):
ORA-04030: out of process memory when trying to allocate bytes (,)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31301/jyj1_lms0_11363_i31301.trc
Fri Jun 21 17:05:53 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_lmon_11359.trc (incident=31277):
ORA-04030: out of process memory when trying to allocate bytes (,)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31277/jyj1_lmon_11359_i31277.trc
Fri Jun 21 17:05:53 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_lms1_11367.trc (incident=31309):
ORA-04030: out of process memory when trying to allocate bytes (,)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31309/jyj1_lms1_11367_i31309.trc
Fri Jun 21 17:05:54 2013
ORA-1092 : opitsk aborting process
Fri Jun 21 17:05:54 2013
License high water mark = 327
Fri Jun 21 17:05:55 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_rbal_11389.trc (incident=31389):
ORA-04030: out of process memory when trying to allocate bytes (,)
Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31389/jyj1_rbal_11389_i31389.trc
Fri Jun 21 17:06:14 2013
Instance terminated by ASMB, pid
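The excerpts above pair each "Errors in file" line with an ORA- line. As a rough sketch (not an Oracle tool), the following Python lists which background processes reported ORA-04030 and in what order. The function name `ora4030_reporters` and the regular expression are illustrative assumptions based only on the trace file naming seen in this log (`<instance>_<process>_<ospid>.trc`).

```python
import re

# Assumed naming pattern, taken from the log above: .../jyj1_asmb_11391.trc
TRACE_RE = re.compile(r"Errors in file \S+_([a-z0-9]+)_(\d+)\.trc")


def ora4030_reporters(lines):
    """Return (process, ospid) pairs that reported ORA-04030, in first-seen order."""
    reporters = []
    pending = None  # process named by the most recent "Errors in file" line
    for line in lines:
        match = TRACE_RE.search(line)
        if match:
            pending = (match.group(1), int(match.group(2)))
        elif line.startswith("ORA-04030") and pending is not None:
            if pending not in reporters:
                reporters.append(pending)
            pending = None
    return reporters


sample = [
    "Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc (incident=31397):",
    "ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)",
    "Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_rbal_11389.trc (incident=31389):",
    "ORA-04030: out of process memory when trying to allocate bytes (,)",
]
print(ora4030_reporters(sample))
```

Here ASMB is the first process to hit the error, which is consistent with it being the one that terminates the instance.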
jyj1_asmb_11391_i31397.trc:
Dump file /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31397/jyj1_asmb_11391_i31397.trc
Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /opt/app/ora11gR1db
System name: Linux
Node name: KSJYJ_DB01
Release: 2.6.18-164.el5
Version: #1 SMP Thu Sep 3 04:15:13 EDT 2009
Machine: x86_64
Instance name: jyj1
Redo thread mounted by this instance: 1
Oracle process number: 24
Unix process pid: 11391, image: oracle@KSJYJ_DB01 (ASMB)
*** 2013-06-21 17:05:52.045
*** SESSION ID:(532.1) 2013-06-21 17:05:52.046
*** CLIENT ID:() 2013-06-21 17:05:52.046
*** SERVICE NAME:(SYS$BACKGROUND) 2013-06-21 17:05:52.046
*** MODULE NAME:() 2013-06-21 17:05:52.046
*** ACTION NAME:() 2013-06-21 17:05:52.046
Dump continued from file: /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc
ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)
========= Dump for incident 31397 (ORA 4030) ========
*** 2013-06-21 17:05:52.046
----- SQL Statement (None) -----
Current SQL information unavailable - no cursor.
skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
<- ksfdmp <- dbgexPhaseII <- dbgexProcessError <- dbgeExecuteForError <- dbgePostErrorKGE
<- 1774 <- dbkePostKGE_kgsf <- kgesev <- kgesec3 <- kghnospc
<- kghalf <- kfmdIterInit <- kfkIterInit <- kfnbIostatiterOp <- 110
<- kfnbRun <- ksbrdp <- opirip <- opidrv <- sou2o
Process state
-----------------------
SO: 0x940dd1b98, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x940dd1b98, name=process, file=ksu.h LINE:10286, pg=0
(process) Oracle pid:24, ser:1, calls cur/top: 0x920f28eb8/0x920f28eb8
flags: (0x6) SYSTEM
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 34
last post received-location: ksr2.h LINE:594 ID:ksrpublish
last process to post me: 950dfd540 47 2
last post sent: 0 0 64
last post sent-location: kso2.h LINE:316 ID:ksoreq_reply
last process posted by me: 930e5c948 1 0
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: 0x950e4c060
O/S info: user: oracle, term: UNKNOWN, ospid: 11391
OSD pid info: Unix process pid: 11391, image: oracle@KSJYJ_DB01 (ASMB)
Dump of memory from 0x00000009D0DC0A10 to 0x00000009D0DC0C18
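Analysis:
Judging from the error (ORA-04030), I suspected an Oracle bug, because I had previously run into a similar memory-leak bug in the ASMB process.
So I searched Metalink for the keywords: asmb 04030
The very first hit matched the customer's problem.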
ASMB process grows raising ora-4030 intermittently (Doc ID 735180.1)
ASMB process grows on memory, eventually leading to ora-4030 errors
which causes DB crash.
The reported error:
ORA-04030: out of process memory when trying to allocate 552 Bytes (callheap,kfmditer)
In the ASMB process heapdump we can see most of memory chunks are for 'kfmditer',
example:
BreakDown
~~~~~~~~~
Type        Count        Sum     Average
~~~~        ~~~~~        ~~~     ~~~~~~~
Free       285684  142841492      500.00
kfmditer   285685  157698132      552.00   <-- the ASMB heap dump likewise shows that most memory chunks are kfmditer
Total = 300539624 bytes 293495.73k 286.62MB
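To put numbers on the breakdown quoted above, here is a small sketch that computes each chunk type's share of the heap. The helper name `chunk_shares` is hypothetical, and the figures are copied from the note's example rather than from the customer's own heap dump. The two Sum values add up exactly to the reported Total.

```python
def chunk_shares(sums):
    """Map each chunk type to its percentage of the summed heap size."""
    total = sum(sums.values())
    return {name: 100.0 * size / total for name, size in sums.items()}


# Sum column from the BreakDown table in the quoted note.
sums = {"Free": 142841492, "kfmditer": 157698132}
total_bytes = sum(sums.values())
shares = chunk_shares(sums)

print(total_bytes)
for name, pct in shares.items():
    print(f"{name}: {pct:.1f}%")
```

In this example kfmditer is the only allocated chunk type listed. Everything else in the roughly 287 MB heap is free space, so all in-use memory is attributed to kfmditer.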
This bug shows up in the major releases from 11.1 onward, but it is fixed in the following patch sets:
This issue is fixed in
11.2.0.1 (Base Release)
11.1.0.7.1 (Patch Set Update)
10.2.0.5 (Server Patch Set)
11.1.0.7 Patch 11 on Windows Platforms
11.1.0.7 RAC Recommended Patch Bundle #1
11.1.0.6 Patch 11 on Windows Platforms
If you would rather not upgrade to a newer patch set, applying the one-off Patch 6851110 also resolves the problem.
You can check if Patch 6851110 is available for your patchset release and
O/S environment.
Solution:
Patch 6851110 was applied to the customer's database. After an extended period of monitoring, the problem has not recurred.
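The post does not say how the monitoring was done. One possible approach on Linux, sketched below, is to sample the ASMB process's resident memory from `/proc/<pid>/status` at intervals and check whether it keeps growing. The function names and the 1024 kB threshold are assumptions for illustration, and resident set size is only a proxy for the process heap that ORA-04030 refers to.

```python
import re
from pathlib import Path


def vmrss_kb(status_text):
    """Extract VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kb(pid):
    """Read the current resident set size of a process (Linux only)."""
    return vmrss_kb(Path(f"/proc/{pid}/status").read_text())


def grows_steadily(samples_kb, min_increase_kb=1024):
    """True if samples never decrease and total growth reaches the threshold."""
    if len(samples_kb) < 2:
        return False
    non_decreasing = all(b >= a for a, b in zip(samples_kb, samples_kb[1:]))
    return non_decreasing and samples_kb[-1] - samples_kb[0] >= min_increase_kb


# Hypothetical samples (kB) collected at intervals, e.g. via read_rss_kb(asmb_pid).
print(grows_steadily([50000, 52000, 56000]))
print(grows_steadily([50000, 50010, 50020]))
```

A process whose samples keep rising across days would be worth a heap dump. A flat series after patching is consistent with the leak being fixed.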