Oracle hidden costs revealed, Part 2 – Using DTrace to find why writes in SYSTEM tablespace are slower than in others
http://blog.tanelpoder.com/2008/09/02/oracle-hidden-costs-revealed-part2-using-dtrace-to-find-why-writes-in-system-tablespace-are-slower-than-in-others/
This is really two posts in one: an investigation of a performance issue with writes in the SYSTEM tablespace, and an introduction to a little DTrace stack sampling script.
Have you noticed that DML on tables residing in the SYSTEM tablespace is slower than DML on tables in other tablespaces?
Here's an example. I'll create two similar tables, one in the USERS tablespace and the other in SYSTEM, and insert into the first one (Oracle 10.2.0.3 on Solaris x64):
SQL> create table t1(a int) tablespace USERS;

Table created.

SQL> create table t2(a int) tablespace SYSTEM;

Table created.

SQL> exec for i in 1..100000 loop insert into t1 values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:03.09
The insert into the table in the USERS tablespace took about 3 seconds.
Ok, let's commit and flush the dirty buffers so that they wouldn't get in the way of the next insert.
SQL> commit;

Commit complete.

Elapsed: 00:00:00.02

SQL> alter system checkpoint;   -- checkpointing to flush dirty buffers from previous inserts

System altered.

Elapsed: 00:00:01.34

SQL>
And now to the insert into the SYSTEM tablespace table:
SQL> exec for i in 1..100000 loop insert into t2 values(i); end loop;

PL/SQL procedure successfully completed.

Elapsed: 00:00:08.98

SQL>
What?! The same insert took 3 times longer, almost 9 seconds?
Fine! Let’s troubleshoot it!
First, as usual, I start with my Snapper script to get a performance snapshot of the session. I ran both inserts again and used Snapper from another session to monitor the insert activity (for brevity I didn't monitor all stat counters, but used the sinclude filter to show only the various "gets" from V$SESSTAT).
This is the insert into the USERS tablespace table:
SQL> @snapper out,gather=tsw,sinclude=gets 10 1 148

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )
---------------------------------------------------------------------------------------------------------------------------------------------
HEAD, SID, SNAPSHOT START , SECONDS, TYPE, STATISTIC , DELTA, DELTA/SEC, HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA, 148, 20080901 17:20:53, 10, STAT, db block gets , 103448, 10345, 103.45k, 10.34k
DATA, 148, 20080901 17:20:53, 10, STAT, db block gets from cache , 103448, 10345, 103.45k, 10.34k
DATA, 148, 20080901 17:20:53, 10, STAT, consistent gets , 495, 49, 495, 49.5
DATA, 148, 20080901 17:20:53, 10, STAT, consistent gets from cache , 495, 49, 495, 49.5
DATA, 148, 20080901 17:20:53, 10, STAT, consistent gets - examination , 219, 22, 219, 21.9
DATA, 148, 20080901 17:20:53, 10, STAT, no work - consistent read gets , 89, 9, 89, 8.9
DATA, 148, 20080901 17:20:53, 10, STAT, rollbacks only - consistent read gets , 18, 2, 18, 1.8
DATA, 148, 20080901 17:20:53, 10, STAT, cluster key scan block gets , 65, 6, 65, 6.5
DATA, 148, 20080901 17:20:53, 10, TIME, hard parse elapsed time , 4090, 409, 4.09ms, 409us
DATA, 148, 20080901 17:20:53, 10, TIME, PL/SQL compilation elapsed time , 3139, 314, 3.14ms, 313.9us
DATA, 148, 20080901 17:20:53, 10, TIME, parse time elapsed , 5131, 513, 5.13ms, 513.1us
DATA, 148, 20080901 17:20:53, 10, TIME, PL/SQL execution elapsed time , 174288, 17429, 174.29ms, 17.43ms
DATA, 148, 20080901 17:20:53, 10, TIME, DB CPU , 3108026, 310803, 3.11s, 310.8ms
DATA, 148, 20080901 17:20:53, 10, TIME, sql execute elapsed time , 3104607, 310461, 3.1s, 310.46ms
DATA, 148, 20080901 17:20:53, 10, TIME, hard parse (sharing criteria) elapsed ti, 4110, 411, 4.11ms, 411us
DATA, 148, 20080901 17:20:53, 10, TIME, DB time , 3108026, 310803, 3.11s, 310.8ms
DATA, 148, 20080901 17:20:53, 10, WAIT, log file switch completion , 71104, 7110, 71.1ms, 7.11ms
DATA, 148, 20080901 17:20:53, 10, WAIT, SQL*Net message to client , 3, 0, 3us, .3us
DATA, 148, 20080901 17:20:53, 10, WAIT, SQL*Net message from client , 6684030, 668403, 6.68s, 668.4ms
--  End of snap 1

PL/SQL procedure successfully completed.
You see, DB CPU equals DB time (in other words, all the time spent servicing my request in the database was spent on CPU, thus there was no waiting or blocking involved). And the insert still took roughly 3 seconds.
This is the insert into the SYSTEM tablespace table:
SQL> @snapper out,gather=tsw,sinclude=gets 10 1 148

-- Session Snapper v1.07 by Tanel Poder ( http://www.tanelpoder.com )
---------------------------------------------------------------------------------------------------------------------------------------------
HEAD, SID, SNAPSHOT START , SECONDS, TYPE, STATISTIC , DELTA, DELTA/SEC, HDELTA, HDELTA/SEC
---------------------------------------------------------------------------------------------------------------------------------------------
DATA, 148, 20080901 17:21:23, 11, STAT, db block gets , 102616, 9329, 102.62k, 9.33k
DATA, 148, 20080901 17:21:23, 11, STAT, db block gets from cache , 102616, 9329, 102.62k, 9.33k
DATA, 148, 20080901 17:21:23, 11, STAT, consistent gets , 384, 35, 384, 34.91
DATA, 148, 20080901 17:21:23, 11, STAT, consistent gets from cache , 384, 35, 384, 34.91
DATA, 148, 20080901 17:21:23, 11, STAT, consistent gets - examination , 218, 20, 218, 19.82
DATA, 148, 20080901 17:21:23, 11, STAT, no work - consistent read gets , 94, 9, 94, 8.55
DATA, 148, 20080901 17:21:23, 11, STAT, rollbacks only - consistent read gets , 17, 2, 17, 1.55
DATA, 148, 20080901 17:21:23, 11, STAT, cluster key scan block gets , 65, 6, 65, 5.91
DATA, 148, 20080901 17:21:23, 11, TIME, hard parse elapsed time , 2350, 214, 2.35ms, 213.64us
DATA, 148, 20080901 17:21:23, 11, TIME, PL/SQL compilation elapsed time , 1751, 159, 1.75ms, 159.18us
DATA, 148, 20080901 17:21:23, 11, TIME, parse time elapsed , 3359, 305, 3.36ms, 305.36us
DATA, 148, 20080901 17:21:23, 11, TIME, PL/SQL execution elapsed time , 172153, 15650, 172.15ms, 15.65ms
DATA, 148, 20080901 17:21:23, 11, TIME, DB CPU , 8751163, 795560, 8.75s, 795.56ms
DATA, 148, 20080901 17:21:23, 11, TIME, sql execute elapsed time , 8749139, 795376, 8.75s, 795.38ms
DATA, 148, 20080901 17:21:23, 11, TIME, hard parse (sharing criteria) elapsed ti, 2367, 215, 2.37ms, 215.18us
DATA, 148, 20080901 17:21:23, 11, TIME, DB time , 8751163, 795560, 8.75s, 795.56ms
DATA, 148, 20080901 17:21:23, 11, WAIT, SQL*Net message to client , 3, 0, 3us, .27us
DATA, 148, 20080901 17:21:23, 11, WAIT, SQL*Net message from client , -1442849, -131168, -1.44s, -131.17ms
--  End of snap 1

PL/SQL procedure successfully completed.
You see the DB time is still equal to DB CPU, thus all of the response time was spent on CPU. Moreover, the time model stats tell me that virtually all of the elapsed time was spent executing SQL statements (which makes sense, as I'm running an INSERT statement in a tight loop).
So, for some reason the INSERT statements take almost 3 times longer when inserting into the SYSTEM tablespace.
In cases where there is no waiting going on (so tools like the Oracle Wait Interface or WaitProf would not help here), only a difference in CPU time, I normally look into V$SESSTAT counter differences first. But as seen above, there is no major difference in counters such as logical IOs (db block gets, consistent gets) that would justify a 3x increase in CPU usage (there was also no major difference in the other counters, which I have excluded for brevity).
So the obvious next step is to sample the process stack during execution of both statements and see where the difference is. In my previous posts I've used pstack for getting a few stack samples out of a process, and a pstack loop could be used here as well.
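A minimal pstack-style sampling loop could look something like the sketch below. This is my own illustration, not a script from this post; the sampler command is parameterized so the same loop works with pstack on Solaris (or any other stack-dumping tool):

```shell
#!/bin/sh
# Hypothetical sketch of a stack sampling loop.
# $1 = command that prints one stack snapshot (e.g. "pstack 859"),
# $2 = number of samples to take.
sample_stacks() {
  i=0
  while [ "$i" -lt "$2" ]; do
    $1                 # dump one stack snapshot
    i=$((i + 1))
    # sleep 0.05       # small pause between samples on a real system
  done
}

# Usage on a real Solaris box might be:
#   sample_stacks "pstack 859" 100 > stacks.out
```

The collected samples can then be post-processed and aggregated offline.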
But for demo purposes (and convenience) I will use my DTrace stack sampling script today.
It's dead easy to sample user process and kernel stacks using DTrace on Solaris 10 (and Mac OS X and FreeBSD too). However, the DTrace stack aggregator also takes the current program counter offset inside functions into account when distinguishing between stacks (meaning that program counter location main() + 0x10 is treated as different from main() + 0x12). But I don't want such granularity; I want to see where the execution was at the function level, not the instruction level.
So I wrote my script to strip the PC function offsets from the DTrace output and to order the stacks so that the most frequently sampled stack "branch" is displayed last, at the bottom of the output. So whatever stack and function you see printed last is probably the "troublemaker". I still recommend scrolling up and reviewing a few other stacks too.
So, I ran the insert into the USERS tablespace table again and ran my dstackprof script on the server process with SPID 859.
This is from the "well-behaving" process, so normally you would just compare this stack profile to the "badly behaving" process's stack profile and see if you can spot any major differences between them. I will explain a few of the functions as we go.
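The offset-stripping idea can be sketched in a couple of lines of shell. This is my assumption of the approach, not the actual dstackprof code: frames that differ only by instruction offset collapse into the same function-level frame, and duplicate samples can then be counted:

```shell
# Strip "+0x..." program counter offsets so that samples differing only
# by instruction offset collapse into the same function-level frame.
strip_offsets() {
  sed 's/+0x[0-9a-f]*//g'
}

# Demo: two samples taken at main+0x10 and main+0x12 become one
# function-level entry with a count of 2.
printf 'main+0x10\nmain+0x12\n' | strip_offsets | sort | uniq -c
```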
# ./dstackprof.sh 859
DStackProf v1.02 by Tanel Poder ( http://www.tanelpoder.com )
Sampling pid 859 for 5 seconds with stack depth of 100 frames...

147 samples with stack below
__________________
kcbchg1_main
kcbchg1 <- buffer change function in buffer cache layer
ktuchg <- undo layer generate change
ktbchg2 <- block (header) change
kdtchg <- data layer generate change
kdtwrp
kdtInsRow <- data layer row insert function
insrowFastPath
insdrvFastPath
inscovexe
insExecStmtExecIniEngine
insexe <- OPI insert execution entry point
opiexe <- OPI execute (executing my recursive INSERT INTO statement)
opipls
opiodr
rpidrus <- all the RPI functions belong to recursive program interface (which allows making recursive calls in db)
skgmstack
rpidru <- these RPI functions (and few OPI ones above) are what PL/SQL context switches physically are
rpiswu2
rpidrv
psddr0 <- most of the functions starting with "p" are PL/SQL execution ones
psdnal
pevm_EXECC
pfrinstr_EXECC
pfrrun_no_tool
pfrrun
plsql_run
peicnt <- PL/SQL execution engine entry point
kkxexe <- cursor to PL/SQL execution interface
opiexe <- OPI call execution dispatcher
kpoal8
opiodr
ttcpip
opitsk
opiino
opiodr
opidrv
sou2o
opimai_real
main
0xe54ffc

193 samples with stack below
__________________
libc.so.1`times <-- during these 193 samples the execution was in opiexe (note that
opiexe -- it's a recursive OPI call execution (as there's the RPI layer below in the
opipls -- stack and there's another opiexe below. The lower opiexe is for executing
opiodr -- my PL/SQL anonymous block and upper is for the recursive INSERT statement
rpidrus
skgmstack
rpidru
rpiswu2
rpidrv
psddr0
psdnal
pevm_EXECC
pfrinstr_EXECC
pfrrun_no_tool
pfrrun
plsql_run
peicnt
kkxexe
opiexe
kpoal8
opiodr
ttcpip
opitsk
opiino
opiodr
opidrv
sou2o
opimai_real
main
0xe54ffc

3147 Total samples captured
Anyway, the two stacks I have copied here don't show anything obvious. Also note that a total of 3147 samples were taken while the process was on CPU, but the execution point was in the "top" stack during only 193 samples, which is around 6% of all samples.
As the above stacks come from a well-behaving process, let's skip their deeper analysis and just compare them to the badly behaving process. So, I ran the SYSTEM tablespace insert again (you still remember we were troubleshooting why inserts into the SYSTEM tablespace were slower, right? ;)
The top stack in the profile was the following:
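The roughly 6% figure comes straight from the sample counts above:

```shell
# 193 of the 3147 on-CPU samples were in the top stack
awk 'BEGIN { printf "%.1f%%\n", 193 / 3147 * 100 }'
```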
# ./dstackprof.sh 859

DStackProf v1.02 by Tanel Poder ( http://www.tanelpoder.com )
Sampling pid 859 for 5 seconds with stack depth of 100 frames...

[...some output snipped...]
917 samples with stack below
__________________
libc.so.1`qsort
kd4_entries_overlap
kdb4chk
kd4chk
kcbchk
kco_blkchk <-- what the heck is this? the name suggests that it's block checking of some kind
kcoapl
kcbapl
kcrfw_redo_gen <-- also note this function, looks like the block change function (kcbchg) has called
kcbchg1_main <-- a function to generate redo on its behalf
kcbchg1
ktuchg
ktbchg2
kdtchg
kdtwrp
kdtInsRow
insrowFastPath
insdrvFastPath
inscovexe
insExecStmtExecIniEngine
insexe
opiexe
opipls
opiodr
rpidrus
skgmstack
rpidru
rpiswu2
rpidrv
psddr0
psdnal
pevm_EXECC
pfrinstr_EXECC
pfrrun_no_tool
pfrrun
plsql_run
peicnt
kkxexe
opiexe
kpoal8
opiodr
ttcpip
opitsk
opiino
opiodr
opidrv
sou2o
opimai_real
main
0xe54ffc

6202 Total samples captured
As most of the top stacks from the badly behaving process were spending their time somewhere in the kco_blkchk() function, it rang a bell – database block checking!
However, this was weird, as the db_block_checking parameter was set to FALSE in my instance:
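One quick way to spot such a difference without eyeballing every stack is to compare per-function sample counts between the two profiles. A rough sketch (my own illustration; it assumes the stack output has already been flattened to one function name per line):

```shell
# Count how often each function name appears in a flattened stack dump
# and list the most frequent ones first.
func_counts() {
  sort | uniq -c | sort -rn
}

# Demo with made-up frames: a function like kco_blkchk dominating the
# "bad" profile would immediately stand out compared to the "good" one.
printf 'opiexe\nkco_blkchk\nkco_blkchk\n' | func_counts
```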
SQL> show parameter db_block_checking

NAME                                 TYPE        VALUE
------------------------------------ ----------- -----
db_block_checking                    string      FALSE

SQL>
After a little digging, I found this parameter:
NAME VALUE DESCRIPTION
----------------------------- --------- -------------------------------------------------------------
_db_always_check_system_ts TRUE Always perform block check and checksum for System tablespace
This parameter makes Oracle (since version 8.1.6) do block structure checking in the SYSTEM tablespace even if db_block_checking itself is set to FALSE. Considering the criticality of the data dictionary objects, this is a good idea.
So now you might say: so what if there's a performance issue, you shouldn't keep your application tables in the SYSTEM tablespace anyway!
Well, I have three words for you: AUD$, FGA_LOG$, DEF$_AQCALL :)
If you rely heavily on auditing functionality, and especially if you use Advanced Replication, having these crucial tables (and their associated LOB segments) in the SYSTEM tablespace can mean quite a hit to your CPUs and system throughput. In that case you would probably want to explore the option of moving these tables out of the SYSTEM tablespace.
Warning: before moving anything, consult Metalink or Oracle Support, as only a few strictly defined ways of moving these objects are supported. See notes 731908.1 for Audit and 1037317.6 for Advanced Replication.
So, the key point of this article is that I could have started guessing, or setting various events or undocumented parameters to probe Oracle here and there, to figure out what's going on. But I used a simple systematic approach instead. When the Oracle Wait Interface and V$SESSTAT counters didn't help, I looked into the process stack next. And this gave me the vital clue (kco_blkchk) that there's block checking going on. And even if I hadn't figured out the cause myself, I would have had some solid evidence for support.
Note that you don't have to use DTrace for this kind of stack profiling (even though it is very convenient). On OSes other than Solaris 10 I would use whatever safe stack dump mechanism is available (pstack, for example) and post-process the results. I probably wouldn't take thousands of samples though, but rather 100 or even fewer.
Also, if you want to get a complete overview of the process execution profile, you would need to do much more complex aggregation, as you'd need to aggregate the "child" branches of stack samples into their "parents" (currently the samples of a backtrace "a->b->c" are not accounted to "a->b", for example). But for my case (just checking whether there's any major difference between execution profiles) this approach was good enough.
By the way (if you haven't given up yet :), check out the differences in LatchProfX output. The top one is from the well-behaving process and the bottom one is from the bad one:
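The parent-folding aggregation described above could be sketched like this (an assumption of the approach, not code from dstackprof): for every sampled leaf stack, its count is also credited to each ancestor prefix of the stack.

```shell
# Fold leaf-stack sample counts into every ancestor prefix, so that
# samples of "a;b;c" are also accounted to "a;b" and "a".
# Input format (one stack per line): frame1;frame2;...;frameN count
fold_stacks() {
  awk '{
    n = split($1, frames, ";")
    path = ""
    for (i = 1; i <= n; i++) {
      path = (i == 1) ? frames[i] : path ";" frames[i]
      total[path] += $2
    }
  }
  END { for (p in total) print total[p], p }' | sort -rn
}

# Demo: 5 samples ended in a->b->c and 2 ended in a->b, so both
# "a" and "a;b" are credited with all 7 samples.
printf 'a;b;c 5\na;b 2\n' | fold_stacks
```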
SQL> @latchprofx sid,name,ksllwnam,ksllwlbl 128 % 1000000
-- LatchProfX 1.06 by Tanel Poder ( http://www.tanelpoder.com )
SID NAME KSLLWNAM KSLLWLBL Held Gets Held % Held ms Avg hold ms
---------- ----------------------------------- ---------------------------------------- -------------------- ---------- ---------- ------- ----------- -----------
128 redo copy kcrfw_redo_gen: nowait latch 49399 48937 4.94 674.790 .014
128 cache buffers chains kcbchg1: kslbegin: bufs not pinned 4574 2320 .46 62.481 .027
128 cache buffers chains kcbchg1: kslbegin: call CR func 2101 1445 .21 28.700 .020
128 cache buffers chains kcbgcur_2 1323 1060 .13 18.072 .017
128 cache buffers chains kcbnew: new latch again 269 243 .03 3.675 .015
128 simulator lru latch kcbs_simulate: simulate set setid 170 170 .02 2.322 .014
128 redo allocation kcrfw_redo_gen: redo allocation 1 strand # 68 13 .01 .929 .071
128 cache buffers chains kcbrls_2 29 29 .00 .396 .014
128 cache buffers chains kcbgtcr: fast path 25 25 .00 .342 .014
128 cache buffers chains kcbgcur: fast path (shr) 8 8 .00 .109 .014
128 cache buffers lru chain kcbzgws_1 7 7 .00 .096 .014
SQL> @latchprofx sid,name,ksllwnam,ksllwlbl 128 % 1000000
-- LatchProfX 1.06 by Tanel Poder ( http://www.tanelpoder.com )
SID NAME KSLLWNAM KSLLWLBL Held Gets Held % Held ms Avg hold ms
---------- ----------------------------------- ---------------------------------------- -------------------- ---------- ---------- ------- ----------- -----------
128 redo copy kcrfw_redo_gen: nowait latch 356465 79839 35.65 6088.422 .076
128 cache buffers chains kcbchg1: kslbegin: bufs not pinned 4704 2233 .47 80.344 .036
128 cache buffers chains kcbchg1: kslbegin: call CR func 3542 1997 .35 60.497 .030
128 cache buffers chains kcbgcur_2 1929 1448 .19 32.947 .023
128 redo allocation kcrfw_redo_gen: redo allocation 1 strand # 1018 34 .10 17.387 .511
128 shared pool simulator kglsim_unpin_simhp: fast path 713 713 .07 12.178 .017
128 simulator lru latch kcbs_simulate: simulate set setid 266 266 .03 4.543 .017
128 cache buffers chains kcbnew: new latch again 182 171 .02 3.109 .018
128 cache buffers chains kcbgtcr: fast path (cr pin) 29 29 .00 .495 .017
128 cache buffers chains kcbgtcr: fast path