ASMB bug (ORA-04030 kfmditer) crashes the database
Symptoms:

One instance of a customer's critical production RAC system went down. The alert log showed:

Fri Jun 21 17:05:52 2013

Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc (incident=31397):

ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)

Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31397/jyj1_asmb_11391_i31397.trc

Fri Jun 21 17:05:55 2013

Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_rbal_11389.trc (incident=31389):

ORA-04030: out of process memory when trying to allocate bytes (,)

Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31389/jyj1_rbal_11389_i31389.trc

Fri Jun 21 17:06:14 2013

Instance terminated by ASMB, pid = 11391

Checking the ASMB trace file:

Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc (incident=31397):

ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)

Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31397/jyj1_asmb_11391_i31397.trc

Fri Jun 21 17:05:52 2013

Trace dumping is performing id=[cdmp_20130621170552]

Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc:

ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)

ASMB (ospid: 11391): terminating the instance due to error 4030

System state dump is made for local instance

System State dumped to trace file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_diag_11345.trc

Fri Jun 21 17:05:53 2013

Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_lms0_11363.trc (incident=31301):

ORA-04030: out of process memory when trying to allocate bytes (,)

Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31301/jyj1_lms0_11363_i31301.trc

Fri Jun 21 17:05:53 2013

Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_lmon_11359.trc (incident=31277):

ORA-04030: out of process memory when trying to allocate bytes (,)

Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31277/jyj1_lmon_11359_i31277.trc

Fri Jun 21 17:05:53 2013

Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_lms1_11367.trc (incident=31309):

ORA-04030: out of process memory when trying to allocate bytes (,)

Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31309/jyj1_lms1_11367_i31309.trc

Fri Jun 21 17:05:54 2013

ORA-1092 : opitsk aborting process

Fri Jun 21 17:05:54 2013

License high water mark = 327

Fri Jun 21 17:05:55 2013

Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_rbal_11389.trc (incident=31389):

ORA-04030: out of process memory when trying to allocate bytes (,)

Incident details in: /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31389/jyj1_rbal_11389_i31389.trc

Fri Jun 21 17:06:14 2013

Instance terminated by ASMB, pid
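With five background processes (ASMB, LMS0, LMON, LMS1, RBAL) all reporting ORA-04030 within a few seconds, the first thing to establish is which process failed first and which heap and component its message named. Below is a rough sketch for pulling that out of an alert log. The regular expressions are my own and only cover the message formats shown above; treat them as assumptions for other versions.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Alert-log timestamp line, e.g. "Fri Jun 21 17:05:52 2013"
TS_RE = re.compile(r"^\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}$")
# "<instance>_<process>_<ospid>.trc" at the end of an "Errors in file" path
FILE_RE = re.compile(r"Errors in file \S*/(\w+?)_([a-z0-9]+)_(\d+)\.trc")
# ORA-04030 text; size and (heap,component) may be empty, as in "allocate bytes (,)"
ORA_RE = re.compile(
    r"ORA-04030: out of process memory when trying to allocate (\d*) ?bytes \(([^,]*),([^)]*)\)"
)
# "ASMB (ospid: 11391): terminating the instance due to error 4030"
TERM_RE = re.compile(r"^(\w+) \(ospid: (\d+)\): terminating the instance due to error (\d+)")

@dataclass
class Oom:
    timestamp: str
    process: str
    ospid: int
    size: Optional[int]  # None when the message omits the byte count
    heap: str
    component: str

def scan_alert_log(lines) -> Tuple[List[Oom], Optional[Tuple[str, int, int]]]:
    """Return ORA-04030 reports (first one per process, in log order) and the
    (process, ospid, error) that terminated the instance, if that line is present."""
    events: List[Oom] = []
    seen = set()
    ts = ""
    proc = None  # (PROCESS, ospid) from the most recent "Errors in file" line
    terminator = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            ts = line
        elif (m := FILE_RE.search(line)):
            proc = (m.group(2).upper(), int(m.group(3)))
        elif (m := ORA_RE.search(line)) and proc is not None:
            if proc not in seen:  # the same error is often logged twice per process
                seen.add(proc)
                size = int(m.group(1)) if m.group(1) else None
                events.append(Oom(ts, proc[0], proc[1], size, m.group(2), m.group(3)))
            proc = None
        elif (m := TERM_RE.match(line)):
            terminator = (m.group(1), int(m.group(2)), int(m.group(3)))
    return events, terminator

# Lines copied from the alert log above.
SAMPLE = """\
Fri Jun 21 17:05:52 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc (incident=31397):
ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc:
ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)
ASMB (ospid: 11391): terminating the instance due to error 4030
Fri Jun 21 17:05:55 2013
Errors in file /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_rbal_11389.trc (incident=31389):
ORA-04030: out of process memory when trying to allocate bytes (,)
"""

if __name__ == "__main__":
    events, term = scan_alert_log(SAMPLE.splitlines())
    for e in events:
        print(e)
    print("terminated by:", term)
```

On the longer excerpt, ASMB is the first process to report ORA-04030, the only one whose message names a size, heap and component (592 bytes, callheap, kfmditer), and the process that terminated the instance with error 4030; the later reports from the other processes carry no size or component.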

jyj1_asmb_11391_i31397.trc:

Dump file /opt/app/diag/rdbms/jyj/jyj1/incident/incdir_31397/jyj1_asmb_11391_i31397.trc

Oracle Database 11g Enterprise Edition Release 11.1.0.6.0 - 64bit Production

With the Partitioning, Real Application Clusters, OLAP, Data Mining

and Real Application Testing options

ORACLE_HOME = /opt/app/ora11gR1db

System name: Linux

Node name: KSJYJ_DB01

Release: 2.6.18-164.el5

Version: #1 SMP Thu Sep 3 04:15:13 EDT 2009

Machine: x86_64

Instance name: jyj1

Redo thread mounted by this instance: 1

Oracle process number: 24

Unix process pid: 11391, image: oracle@KSJYJ_DB01 (ASMB)

*** 2013-06-21 17:05:52.045

*** SESSION ID:(532.1) 2013-06-21 17:05:52.046

*** CLIENT ID:() 2013-06-21 17:05:52.046

*** SERVICE NAME:(SYS$BACKGROUND) 2013-06-21 17:05:52.046

*** MODULE NAME:() 2013-06-21 17:05:52.046

*** ACTION NAME:() 2013-06-21 17:05:52.046

 

Dump continued from file: /opt/app/diag/rdbms/jyj/jyj1/trace/jyj1_asmb_11391.trc

ORA-04030: out of process memory when trying to allocate 592 bytes (callheap,kfmditer)

========= Dump for incident 31397 (ORA 4030) ========

*** 2013-06-21 17:05:52.046

----- SQL Statement (None) -----

Current SQL information unavailable - no cursor.

skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp

 <- ksfdmp <- dbgexPhaseII <- dbgexProcessError <- dbgeExecuteForError <- dbgePostErrorKGE

 <- 1774 <- dbkePostKGE_kgsf <- kgesev <- kgesec3 <- kghnospc

 <- kghalf <- kfmdIterInit <- kfkIterInit <- kfnbIostatiterOp <- 110

 <- kfnbRun <- ksbrdp <- opirip <- opidrv <- sou2o
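The short call stack is printed innermost frame first and joined with `<-`, so the function that requested the failing memory sits right after the allocation frame. A minimal sketch for reading it follows. It assumes `kghalf` is the heap-allocation routine here and that `kf`-prefixed frames belong to the ASM layer; both are my interpretation of the frame names, not something the trace states.

```python
from typing import List, Optional

# Short stack copied from the incident trace above (innermost frame first).
STACK = """\
skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp
 <- ksfdmp <- dbgexPhaseII <- dbgexProcessError <- dbgeExecuteForError <- dbgePostErrorKGE
 <- 1774 <- dbkePostKGE_kgsf <- kgesev <- kgesec3 <- kghnospc
 <- kghalf <- kfmdIterInit <- kfkIterInit <- kfnbIostatiterOp <- 110
 <- kfnbRun <- ksbrdp <- opirip <- opidrv <- sou2o"""

def parse_short_stack(text: str) -> List[str]:
    """Split an 'a <- b <- c' stack, possibly wrapped over several lines, into frames."""
    return [f.strip() for f in text.replace("\n", " ").split("<-") if f.strip()]

def caller_of(frames: List[str], fn: str) -> Optional[str]:
    """Frame listed right after `fn`, i.e. the function that called it (None if absent)."""
    if fn not in frames:
        return None
    i = frames.index(fn)
    return frames[i + 1] if i + 1 < len(frames) else None

def frames_with_prefix(frames: List[str], prefix: str) -> List[str]:
    """Frames whose names start with `prefix`; numeric entries such as '1774' are skipped."""
    return [f for f in frames if f.startswith(prefix)]

if __name__ == "__main__":
    frames = parse_short_stack(STACK)
    print("allocation requested by:", caller_of(frames, "kghalf"))
    print("kf* frames:", frames_with_prefix(frames, "kf"))
```

For this stack, the caller of kghalf is kfmdIterInit, reached via kfkIterInit and kfnbIostatiterOp from kfnbRun, which lines up with the kfmditer component named in the ORA-04030 message.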

Process state

-----------------------

SO: 0x940dd1b98, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3

 proc=0x940dd1b98, name=process, file=ksu.h LINE:10286, pg=0

 (process) Oracle pid:24, ser:1, calls cur/top: 0x920f28eb8/0x920f28eb8

 flags: (0x6) SYSTEM

 int error: 0, call error: 0, sess error: 0, txn error 0

 (post info) last post received: 0 0 34

 last post received-location: ksr2.h LINE:594 ID:ksrpublish

 last process to post me: 950dfd540 47 2

 last post sent: 0 0 64

 last post sent-location: kso2.h LINE:316 ID:ksoreq_reply

 last process posted by me: 930e5c948 1 0

 (latch info) wait_event=0 bits=0

 Process Group: DEFAULT, pseudo proc: 0x950e4c060

 O/S info: user: oracle, term: UNKNOWN, ospid: 11391

 OSD pid info: Unix process pid: 11391, image: oracle@KSJYJ_DB01 (ASMB)

Dump of memory from 0x00000009D0DC0A10 to 0x00000009D0DC0C18

Analysis:

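Judging from the error (ORA-04030), I suspected an Oracle bug, since I had previously run into a similar memory-leak bug in the ASMB process.

So I searched Metalink with the keywords: asmb 04030

The very first hit matched the customer's problem: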

ASMB process grows raising ora-4030 intermittently (Doc ID 735180.1)

ASMB process grows on memory, eventually leading to ora-4030 errors which causes DB crash.

The reported error:

ORA-04030: out of process memory when trying to allocate 552 Bytes (callheap,kfmditer)

 

In the ASMB process heapdump we can see most of memory chunks are for 'kfmditer',

example:

BreakDown
~~~~~~~~~
Type      Count   Sum        Average
~~~~      ~~~~~   ~~~        ~~~~~~~
Free      285684  142841492  500.00
kfmditer  285685  157698132  552.00   <-- the customer's ASMB heapdump likewise showed that the vast majority of chunks were kfmditer
Total = 300539624 bytes 293495.73k 286.62MB
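The note's argument is plain arithmetic over that breakdown. A small sketch that re-derives it from the rows quoted above; the parser assumes the whitespace-separated `Type Count Sum Average` layout shown and simply ignores header, underline and `Total` lines.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Rows copied from the heapdump breakdown in the note.
BREAKDOWN = """\
 Type     Count   Sum        Average
 ~~~~     ~~~~~   ~~~        ~~~~~~~
 Free     285684  142841492  500.00
 kfmditer 285685  157698132  552.00
Total = 300539624 bytes 293495.73k 286.62MB
"""

@dataclass
class ChunkType:
    name: str
    count: int
    total_bytes: int

    @property
    def average(self) -> float:
        return self.total_bytes / self.count

def parse_breakdown(text: str) -> List[ChunkType]:
    """Keep only rows with exactly four columns whose Count and Sum are integers."""
    rows = []
    for line in text.splitlines():
        parts = line.split("<--")[0].split()  # drop any trailing annotation
        if len(parts) == 4 and parts[1].isdigit() and parts[2].isdigit():
            rows.append(ChunkType(parts[0], int(parts[1]), int(parts[2])))
    return rows

def summarize(rows: List[ChunkType]) -> Tuple[int, Dict[str, float]]:
    """Grand total in bytes and each chunk type's share of it."""
    total = sum(r.total_bytes for r in rows)
    return total, {r.name: r.total_bytes / total for r in rows}

if __name__ == "__main__":
    rows = parse_breakdown(BREAKDOWN)
    total, share = summarize(rows)
    print(f"total {total} bytes = {total / 1024:.2f}k = {total / 1024 / 1024:.2f}MB")
    for r in rows:
        print(f"{r.name:9} avg {r.average:.2f} bytes, {share[r.name]:.1%} of total")
```

For these rows it gives a 300,539,624-byte total (about 286.62 MB), with kfmditer chunks taking roughly 52.5% of it at 552 bytes each, and nearly the same number of roughly 500-byte free chunks making up the rest.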

 

This bug shows up in 11.1 releases, but it is fixed in the following patch sets:

This issue is fixed in

11.2.0.1 (Base Release)

11.1.0.7.1 (Patch Set Update)

10.2.0.5 (Server Patch Set)

11.1.0.7 Patch 11 on Windows Platforms

11.1.0.7 RAC Recommended Patch Bundle #1

11.1.0.6 Patch 11 on Windows Platforms

If you would rather not upgrade to a new patch set, you can instead apply the one-off Patch 6851110, which also resolves the problem.

You can check if Patch 6851110 is available for your patchset release and O/S environment.
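After installing a one-off patch it is worth confirming that it is actually registered in the Oracle Home inventory, which `opatch lsinventory` reports. A minimal sketch that runs it and scans the text; it assumes each installed interim patch appears on a line beginning with `Patch` followed by its number, which you should verify against the output of your own OPatch version.

```python
import os
import re
import subprocess

def lsinventory_text(oracle_home: str) -> str:
    """Run `opatch lsinventory` from the given Oracle Home and return its stdout."""
    opatch = os.path.join(oracle_home, "OPatch", "opatch")
    result = subprocess.run([opatch, "lsinventory"], capture_output=True, text=True, check=True)
    return result.stdout

def patch_applied(inventory_text: str, patch_id: str) -> bool:
    """True if a line beginning with 'Patch' contains patch_id as a whole number.
    ASSUMPTION: installed interim patches are listed on lines starting with 'Patch'."""
    pattern = re.compile(r"^\s*Patch\b.*\b" + re.escape(patch_id) + r"\b", re.MULTILINE)
    return pattern.search(inventory_text) is not None
```

For the home in this incident that would be `patch_applied(lsinventory_text("/opt/app/ora11gR1db"), "6851110")`, run as the software owner. Matching the number as a whole word avoids false hits on longer patch numbers that merely contain it.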

Solution:

We applied patch 6851110 to the customer's database, and over a period of continued monitoring the problem has not recurred.
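The post does not say how the monitoring was done. One simple approach on a Linux host like this one is to sample the ASMB process's resident memory from /proc and check whether it only ever climbs. RSS also counts SGA pages the process has touched, so it is only a rough proxy; V$PROCESS (PGA_USED_MEM, PGA_ALLOC_MEM) gives the database's own view of PGA usage. The sampling interval and growth threshold below are arbitrary placeholders.

```python
import time
from typing import List, Optional

def vm_rss_kb(status_text: str) -> Optional[int]:
    """Extract VmRSS (resident set size, in kB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])  # e.g. "VmRSS:   20480 kB"
    return None

def sample_rss(pid: int, samples: int, interval_s: float) -> List[int]:
    """Read VmRSS of `pid` `samples` times, sleeping `interval_s` seconds between reads (Linux only)."""
    out: List[int] = []
    for i in range(samples):
        with open(f"/proc/{pid}/status") as f:
            rss = vm_rss_kb(f.read())
        if rss is not None:
            out.append(rss)
        if i + 1 < samples:
            time.sleep(interval_s)
    return out

def keeps_growing(samples: List[int], min_growth_kb: int) -> bool:
    """Heuristic leak signal: never shrinks, and total growth exceeds min_growth_kb."""
    if len(samples) < 2:
        return False
    monotonic = all(b >= a for a, b in zip(samples, samples[1:]))
    return monotonic and samples[-1] - samples[0] > min_growth_kb
```

For example, `keeps_growing(sample_rss(pid, 12, 300), 50_000)` takes twelve readings five minutes apart, where `pid` is whatever OS PID ASMB has after the restart (11391 in the trace above belonged to the crashed instance).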
