http://www.dadbm.com/database-hang-row-cache-lock-concurrency-troubleshooting/

Issue background
This post will help you analyze an Oracle database instance slowdown caused by heavy row cache lock (enqueue) wait events. It is based on a real case of a database hang that I worked on recently. I must admit this type of situation does not appear often, but it is very dangerous since it can considerably slow down a database instance or even freeze it for a short period of time. In most cases, SQL against the ASH views and systemstate dumps can help to nail down the problem, unless it is an Oracle bug.

Usually it occurs suddenly and disappears quickly. See the example ASH graph below, where the brown peak represents this type of concurrency wait: row cache lock wait events.

ASH graph - Row Cache Lock concurrency

What is a Row Cache Enqueue Lock?
The Row Cache or Data Dictionary Cache is a memory area in the shared pool that holds data dictionary information. The row cache holds data as rows instead of buffers. A row cache enqueue lock is a lock on the data dictionary rows. It is used primarily to serialize changes to the data dictionary, and sessions wait on it when they need a lock on a data dictionary cache entry. The enqueue will be on a specific data dictionary object. This is called the enqueue type and can be found in the v$rowcache view. Waits on this event usually indicate some form of DDL occurring, or possibly recursive operations such as storage management, sequence numbers incrementing frequently, etc.
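
As a rough illustration of what v$rowcache exposes, the query below lists each dictionary cache with its activity counters. It is only a sketch: grouping by parameter and ordering by misses are illustrative choices, and high numbers here do not by themselves prove contention.

```sql
-- one row per dictionary cache name, caches with the most misses first
select parameter,
       sum(gets)          total_gets,
       sum(getmisses)     total_getmisses,
       sum(modifications) total_modifications
from v$rowcache
group by parameter
order by sum(getmisses) desc
/
```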

Diagnosing the cause of the contention

Check database alert.log
If the Row cache enqueue cannot be obtained within a certain predetermined time period, a systemstate dump will be generated in the user_dump_dest or background_dump_dest depending on whether a user or background process created the trace file. The alert log is usually updated accordingly with the warning and the location of the trace file. The message in the alert.log and the trace file generated will tend to contain the message:

Wed Sep 21 13:39:19 2011
>>> WAITED TOO LONG FOR A ROW CACHE ENQUEUE LOCK! pid=37
System State dumped to trace file /oracle/diag/rdbms/..../.trc

However in my case I could not find that alert.log message.

Current session waits
Check whether any sessions are currently waiting on this event with the following SQL:

select sid,username,sql_id,event current_event,p1text,p1,p2text,p2,p3text,p3
from v$session
where event='row cache lock'
/
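
For the row cache lock event, P1 is the cache id, so it can be translated into a cache name straight away. The variant below is a sketch that assumes v$rowcache has one PARENT row per cache# on your version. blocking_session may or may not be populated for this event, so treat an empty value as "unknown" rather than "no blocker".

```sql
-- current row cache lock waiters with the cache name resolved from P1
select s.sid, s.username, s.sql_id,
       s.p1 cache_id, rc.parameter cache_name,
       s.p2 mode_held, s.p3 mode_requested,
       s.blocking_session, s.seconds_in_wait
from v$session s
left join (select distinct cache#, parameter
           from v$rowcache
           where type = 'PARENT') rc
  on rc.cache# = s.p1
where s.event = 'row cache lock'
order by s.seconds_in_wait desc
/
```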

Active Session History (ASH)
Check the Active Session History (ASH) view to identify the exact period when the Row Cache Lock wait events occurred:

select *
from dba_hist_active_sess_history
where sample_time between to_date('26-MAR-14 12:49:00','DD-MON-YY HH24:MI:SS')
and to_date('26-MAR-14 12:54:00','DD-MON-YY HH24:MI:SS')
and event = 'row cache lock'
order by sample_id
/
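
To narrow the window down, it can help to count the samples per minute instead of reading every row. This is a minimal sketch that reuses the same example time range. Keep in mind that dba_hist_active_sess_history keeps only a subset of the in-memory ASH samples, so the counts show the shape of the peak, not exact session counts.

```sql
-- row cache lock samples per minute within the suspected window
select trunc(cast(sample_time as date), 'MI') sample_minute,
       count(*)                               samples,
       count(distinct session_id)             sessions
from dba_hist_active_sess_history
where sample_time between to_date('26-MAR-14 12:49:00','DD-MON-YY HH24:MI:SS')
and to_date('26-MAR-14 12:54:00','DD-MON-YY HH24:MI:SS')
and event = 'row cache lock'
group by trunc(cast(sample_time as date), 'MI')
order by 1
/
```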

AWR, ADDM and ASH Reports
Run AWR and ASH reports for the time when the problem was reported, as well as for the period leading up to it, as these can sometimes help build a picture of when the problem actually started.

ASH Report – Row Cache Lock top wait event
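
If you do not use OEM, the reports can be produced from SQL*Plus with the standard scripts shipped in the Oracle home. This is a sketch of the usual invocation. Each script prompts for the snapshot range or time window, and the AWR/ASH/ADDM features require the Diagnostics Pack license.

```sql
-- "?" is expanded by SQL*Plus to ORACLE_HOME
@?/rdbms/admin/awrrpt.sql
@?/rdbms/admin/ashrpt.sql
@?/rdbms/admin/addmrpt.sql
```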

SGA Shrink/Resize Operations
When the SGA is dynamically resized, various latches need to be held to prevent changes from being made while the operation completes. If the resize takes a while or occurs frequently, you can see Row Cache Lock waits occurring. The key indicator for this is high waits for ‘SGA: allocation forcing component growth’ or similar events at the top of the wait list in AWR. You can also use the following SQL for the same check:

select component, oper_type, initial_size, final_size,
to_char (start_time, 'dd/mm/yy hh24:mi') start_date,
to_char (end_time, 'dd/mm/yy hh24:mi') end_date
from v$memory_resize_ops
where status = 'COMPLETE'
order by start_time desc, component
/
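
Since the frequency of resizes matters as much as their duration, a count per hour can make a pattern easier to spot. The sketch below assumes automatic memory management. On instances that use only sga_target, v$sga_resize_ops is the view to look at instead.

```sql
-- completed resize operations per component and hour
select component, oper_type,
       trunc(start_time, 'HH24') resize_hour,
       count(*)                  operations
from v$memory_resize_ops
where status = 'COMPLETE'
group by component, oper_type, trunc(start_time, 'HH24')
order by resize_hour desc, operations desc
/
```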

Issues by Row Cache Lock Enqueue Type
For each enqueue type, only a limited number of operations require that enqueue. The enqueue type may therefore give an indication of the type of operation that is causing the issue. Some common reasons are outlined below, along with SQL that helps to find the enqueue type:

select *
from v$rowcache
where cache# IN (select P1
from dba_hist_active_sess_history
where sample_time between to_date('26-MAR-14 12:49:00','DD-MON-YY HH24:MI:SS')
and to_date('26-MAR-14 12:54:00','DD-MON-YY HH24:MI:SS')
and event = 'row cache lock' )
/
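
When more than one cache shows up, ranking them by the number of ASH samples shows where most of the waiting happened. This is a sketch over the same example window, again assuming one PARENT row per cache# in v$rowcache.

```sql
-- which dictionary cache the sampled sessions were waiting on, most frequent first
select rc.parameter cache_name, h.p1 cache_id, count(*) samples
from dba_hist_active_sess_history h
join (select distinct cache#, parameter
      from v$rowcache
      where type = 'PARENT') rc
  on rc.cache# = h.p1
where h.sample_time between to_date('26-MAR-14 12:49:00','DD-MON-YY HH24:MI:SS')
and to_date('26-MAR-14 12:54:00','DD-MON-YY HH24:MI:SS')
and h.event = 'row cache lock'
group by rc.parameter, h.p1
order by samples desc
/
```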

DC_SEQUENCES
Caused by using sequences in simultaneous insert operations.
=> Consider caching sequences using the cache option. Especially important on RAC instances!
Bug 6027068 – Contention on ORA_TQ_BASE sequence – fixed in 10.2.0.5 and 11.2.0.1
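
A quick way to find candidates is to list application sequences with a small or no cache. The threshold of 20 (the Oracle default cache size) and the excluded owners are illustrative assumptions, and the sequence name in the commented ALTER is hypothetical.

```sql
-- sequences with NOCACHE (cache_size = 0) or a small cache
select sequence_owner, sequence_name, cache_size, order_flag, last_number
from dba_sequences
where cache_size < 20
and sequence_owner not in ('SYS','SYSTEM')
order by cache_size, sequence_owner, sequence_name
/

-- example fix for a hot sequence (hypothetical name, pick a size that suits the insert rate):
-- alter sequence app_owner.order_seq cache 1000;
```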

DC_OBJECTS
Look for any object compilation activity, which can require an exclusive lock and so block other activity. Tune by examining invalid objects and dependencies with the following SQL:

select * from dba_objects order by last_ddl_time desc;
select * from dba_objects where status = 'INVALID';
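
On a database with many objects, the two queries above return a lot of rows, so it may be more practical to restrict the search to recent DDL. The one-hour look-back below is an arbitrary assumption. Adjust it to cover the time of the incident.

```sql
-- objects touched by DDL (including recompilation) during the last hour
select owner, object_name, object_type, status, last_ddl_time
from dba_objects
where last_ddl_time >= sysdate - 1/24
order by last_ddl_time desc
/
```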

Can be a bug like the following ones:
– Bug 11070004 – High row cache objects latch contention w/ oracle text queries
– Bug 11693365 – Concurrent Drop table and Select on Reference constraint table hangs(deadlock) – fixed in 12.1

DC_SEGMENTS
This is most likely due to segment allocation. Identify what the session holding the enqueue is doing and use errorstacks to diagnose.
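
One common way to take an errorstack of another process is oradebug from SQL*Plus. The sketch below assumes you have already identified the suspected holder. The SID 123 and the OS process id 12345 are hypothetical placeholders, and like any dump this should be used with care on production.

```sql
conn / as sysdba
-- look up the OS process id of the suspected holder (123 is a hypothetical SID)
select p.spid
from v$session s
join v$process p on p.addr = s.paddr
where s.sid = 123
/
-- attach to that process (12345 is a hypothetical spid) and dump its errorstack
oradebug setospid 12345
oradebug unlimit
oradebug dump errorstack 3
oradebug tracefile_name
```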

DC_USERS
– This may occur if a session issues a GRANT to a user while that user is in the process of logging on to the database.
– Excessive calls to dc_users can be a symptom of “set role XXXX”
– You can check for the presence of massive login attempts, including failed ones, by analyzing listener.log (use OEM 12c -> All Metrics, the database AUDIT trail if available, or your own tools).
– Bug 7715339 – Logon failures causes “row cache lock” waits – Allow disable of logon delay
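
If traditional auditing of sessions is enabled (an assumption, since dba_audit_session is empty otherwise), failed logons during the incident can be counted directly in the database. Return code 1017 is an invalid username/password and 28000 is a locked account.

```sql
-- failed logon attempts per user and host within the example window
select username, userhost, returncode, count(*) attempts
from dba_audit_session
where timestamp between to_date('26-MAR-14 12:49:00','DD-MON-YY HH24:MI:SS')
and to_date('26-MAR-14 12:54:00','DD-MON-YY HH24:MI:SS')
and returncode <> 0
group by username, userhost, returncode
order by attempts desc
/
```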

DC_TABLESPACES
Probably the most likely cause is the allocation of new extents. If extent sizes are set low then the application may constantly be requesting new extents and causing contention. Do you have objects with small extent sizes that are rapidly growing? (You may be able to spot these by looking for objects with large numbers of extents). Check the trace for insert/update activity, check the objects inserted into for number of extents.
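
Objects with large numbers of extents can be spotted with a query like the one below. The threshold of 1000 extents is an arbitrary starting point, not a rule. Note that dba_segments can itself be slow on large databases.

```sql
-- segments with many extents, largest extent counts first
select owner, segment_name, segment_type, tablespace_name,
       extents, next_extent, round(bytes/1024/1024) size_mb
from dba_segments
where extents > 1000
order by extents desc
/
```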

DC_USED_EXTENTS and DC_FREE_EXTENTS
This row cache lock wait may similarly occur during space management operations where tablespaces are fragmented or have inadequate extent sizes. Tune by checking whether tablespaces are fragmented, extent sizes are too small, or tablespaces are managed manually.
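
How each tablespace manages its extents can be read from dba_tablespaces. This sketch assumes that "managed manually" refers to dictionary-managed tablespaces or user-defined extent sizes, which would show up as DICTIONARY in extent_management or USER in allocation_type.

```sql
-- extent and segment space management settings per tablespace
select tablespace_name, extent_management, allocation_type,
       segment_space_management, next_extent
from dba_tablespaces
order by tablespace_name
/
```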

DC_ROLLBACK_SEGMENTS
– This is due to rollback segment allocation. Just like dc_segments, identify what is holding the enqueue and also generate errorstacks.
Possible Bugs:
– Bug 7313166 Startup hang with self deadlock on dc_rollback_segments (Versions BELOW 11.2)
– Bug 7291739 Contention Under Auto-Tuned Undo Retention (Doc ID 742035.1)

DC_TABLE_SCNS
Bug 5756769 – Deadlock between Create MVIEW and DML – fixed in 10.2.0.5, 11.1.0.7 and 11.2.0.1

DC_AWR_CONTROL
This enqueue is related to control of the Automatic Workload Repository. As such any operation manipulating the repository may hold this so look for processes blocking these.

Possible blockers – ASH & Systemstate dump
Often, and in my case too, the wait for a Row Cache Lock is the culmination of a chain of events: the process holding the requested row cache enqueue is itself being blocked by other processes, so the lock being held is a symptom of another issue. If you see a lot of different sessions doing different things that are all blocked and waiting on Row Cache Lock, it is often a symptom, not the cause. What can the real cause be then? Some examples are below:
– LGWR process can cause massive Row Cache Lock contention while waiting for a long redo log switch
– Messy application system triggers on user LOGON
– Massive session kill
– …

How to find a blocker:

a) Use the Active Session History (ASH) view again with the following SQL:

select *
from dba_hist_active_sess_history
where sample_time between to_date('26-MAR-14 12:49:00','DD-MON-YY HH24:MI:SS')
and to_date('26-MAR-14 12:54:00','DD-MON-YY HH24:MI:SS')
and event = 'row cache lock' order by sample_id
/
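
Rather than scanning all columns, the blocking_session columns of the same view can be aggregated to see who was reported as the blocker most often. This is a sketch: ASH does not always record a blocker for this event, and the session id 123 in the second query is a hypothetical value to be replaced with whatever the first query returns.

```sql
-- most frequently reported blockers of row cache lock waiters
select blocking_session, blocking_session_serial#,
       blocking_session_status, count(*) samples
from dba_hist_active_sess_history
where sample_time between to_date('26-MAR-14 12:49:00','DD-MON-YY HH24:MI:SS')
and to_date('26-MAR-14 12:54:00','DD-MON-YY HH24:MI:SS')
and event = 'row cache lock'
group by blocking_session, blocking_session_serial#, blocking_session_status
order by samples desc
/

-- what the suspected blocker itself was doing (123 is a hypothetical session id)
select sample_time, session_id, session_state, event, sql_id, program, module
from dba_hist_active_sess_history
where sample_time between to_date('26-MAR-14 12:49:00','DD-MON-YY HH24:MI:SS')
and to_date('26-MAR-14 12:54:00','DD-MON-YY HH24:MI:SS')
and session_id = 123
order by sample_id
/
```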

b) Systemstate dumps can help to find which row cache is being requested and may help to find the blocking process. To generate a systemstate dump, run the following if the issue reoccurs:

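conn / as sysdba
-- lift the trace file size limit so the dump is not truncated
alter session set max_dump_file_size=unlimited;
-- take three dumps, ideally a short while apart, so that they can be compared
alter session set events 'immediate trace name SYSTEMSTATE level 266';
alter session set events 'immediate trace name SYSTEMSTATE level 266';
alter session set events 'immediate trace name SYSTEMSTATE level 266';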

Find and analyze the trace file generated in the UDUMP directory, or ask My Oracle Support for help.
The challenge here, as mentioned above, is to catch the issue, since it usually disappears quickly.
There are a few ways to accomplish that, though. I'll share them in upcoming posts.
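
To locate the trace file written by the systemstate dump, on 11g and later (where the UDUMP directory is replaced by the ADR trace directory) you can ask the session that took the dump for its own trace file name. A minimal sketch, to be run in that same session:

```sql
-- full path of the current session's trace file
select value trace_file
from v$diag_info
where name = 'Default Trace File'
/
```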

Notes:
– I saw Row Cache Lock issues more often on Oracle 11.1 -> Upgrade to at least 11.2.0.3
– Be careful taking systemstate dumps on a production system. They might cause additional system instability.

So that was Oracle database row cache lock concurrency troubleshooting using SQL, ASH view and Systemstate dump.

