General Solutions for High log file sync Wait Times

The log file sync wait occurs while redo is being written from the log buffer to the log file.

A detailed explanation of log file sync follows.

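When log writes happen:

1. On commit or rollback.

2. Every 3 seconds.

3. When the log buffer is 1/3 full or already holds 1 MB of redo.

More precisely: _LOG_IO_SIZE defaults to 1/3 of LOG_BUFFER, and a log write occurs once the redo in the log buffer reaches _LOG_IO_SIZE.

4. Before DBWR writes.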
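
As a rough illustration of the triggers listed above, the Python sketch below models when LGWR would start a write. It assumes the threshold is the smaller of 1/3 of LOG_BUFFER and 1 MB, as this article states. The function names are made up for the example, and the DBWR trigger is left out because it depends on state that is not modeled here.

```python
ONE_MB = 1024 * 1024


def lgwr_write_threshold(log_buffer_bytes: int) -> int:
    """Redo volume (bytes) assumed to trigger a background LGWR write.

    Assumption taken from the article: min(LOG_BUFFER / 3, 1 MB).
    """
    return min(log_buffer_bytes // 3, ONE_MB)


def should_write(redo_bytes_in_buffer: int,
                 log_buffer_bytes: int,
                 seconds_since_last_write: float,
                 commit_pending: bool) -> bool:
    """Return True if any of the modeled triggers fires."""
    return (
        commit_pending                                   # trigger 1: commit or rollback
        or seconds_since_last_write >= 3                 # trigger 2: 3-second timer
        or redo_bytes_in_buffer >= lgwr_write_threshold(log_buffer_bytes)  # trigger 3
    )
```

With a 64 MB log buffer the modeled threshold is capped at 1 MB, which is why a very large log_buffer does not by itself delay background writes under this rule.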

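The _log_io_size hidden parameter:

When the amount of redo written into LOG_BUFFER (in bytes) exceeds _LOG_IO_SIZE, LGWR is triggered to write the log. The default is said to be 1/3 of LOG_BUFFER or 1 MB.

However, the query below does not confirm this claim (it returns 0), and hidden parameters should be left unmodified wherever possible.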

col name for a25
col VALUE for a20
col DESCRIB for a50

-- Run as SYS: the x$ fixed tables are not visible to ordinary users.
SELECT x.ksppinm NAME, y.ksppstvl VALUE, x.ksppdesc describ
FROM SYS.x$ksppi x, SYS.x$ksppcv y
WHERE x.inst_id = USERENV ('Instance')
AND y.inst_id = USERENV ('Instance')
AND x.indx = y.indx
AND x.ksppinm = '_log_io_size';

NAME                      VALUE                DESCRIB
------------------------- -------------------- --------------------------------------------------
_log_io_size              0                    automatically initiate log write if this many redo
                                               blocks in buffer

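How log file sync occurs:

This is the event a session waits on after the user issues a commit or rollback, until that operation completes. The commit triggers a log sync, that is, the log buffer is written to the log file, and the user sees this wait event until the commit finishes.

Note that it refers specifically to waits on log-buffer-to-log-file writes caused by commits and rollbacks. It is often accompanied by log file parallel write: because this wait involves writing the log buffer, a slow log I/O subsystem inevitably produces log file parallel write waits as well. To judge whether slow log I/O is behind log file sync, compare the wait times of the two events. If they are close, the log I/O is slow or too much redo is being generated; log file sync is then a consequence of log file parallel write, and the remedies for log file parallel write apply.

If the log file sync wait time is high while the log file parallel write wait time is not, slow log I/O is not the cause; the application is committing too often.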

-----

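An easier way to understand it:

Recall the 'log file sync' wait event in a single-instance database. When a user session commits, it notifies the LGWR process to write the contents of the redo buffer to the redo log file. Once LGWR finishes the write, it posts the user session to say the write is complete, and only when the user session receives that post is the commit finished. Until the post from LGWR arrives, the user session stays in a wait state, and the wait event recorded is 'log file sync'.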

-----

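Causes of log file sync:

1. Frequent commits or rollbacks. Check whether the application runs too many short transactions; if so, batching can relieve the problem.

2. Slow OS I/O. The fix is to place the log files on raw devices, or on RAID 0 or RAID 1+0 rather than RAID 5.

3. An oversized log buffer (log_buffer).

An oversized log_buffer lets LGWR become lazy: the amount of redo in the log buffer does not reach _LOG_IO_SIZE, so more redo entries pile up in the log buffer.

LGWR then writes everything to the redo log file only when a transaction commits or when it wakes up on the 3-second timer.

With so much data to write, the redo write takes longer to finish.

In this situation the _LOG_IO_SIZE parameter can be lowered; its default is the smaller of 1/3 of LOG_BUFFER and 1 MB.

In other words, you can keep a large log buffer, but a smaller _LOG_IO_SIZE increases the number of background writes and so reduces the log file sync wait time.

4. High CPU load. See the description further below.

5. Poor RAC private interconnect performance, which slows down LMS synchronization of the commit SCN.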

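How to diagnose log file sync:

1. AWR: while log file sync is occurring, take a snapshot first and then generate an AWR report covering a 10 to 30 minute window.

For log file sync that has already happened, AWR can still be used; again keep the report window to 10 to 30 minutes.

2. LGWR trace file (from 10.2.0.4 onward): log writes taking longer than 500 ms are recorded.

If the trace file contains "Warning: log write time 1000ms, size 2KB", the I/O is very likely slow.

3. CPU usage tools. When the CPU is too busy, LGWR cannot get scheduled in time and log file sync appears.

vmstat: check whether r exceeds the number of CPU cores; if it does, the CPU is busy.

OSW (OSWatcher): same check as above.

4. Alert log: confirm that the log file switches about once every 15 to 20 minutes.

5. Script to Collect Log File Sync Diagnostic Information (lfsdiag.sql) [Document 1064487.1]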
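
The LGWR trace warning mentioned in item 2 can be picked out mechanically. The following Python sketch assumes the warning lines look exactly like the sample quoted above ("Warning: log write time 1000ms, size 2KB"); the function name and the idea of passing the trace as a list of lines are illustrative choices, not part of any Oracle tool.

```python
import re

# Pattern modeled on the sample warning line quoted in this article.
WARNING_RE = re.compile(r"Warning: log write time (\d+)ms, size (\d+)KB")


def slow_log_writes(trace_lines, min_ms=500):
    """Return (elapsed_ms, size_kb) for each warning at or above min_ms."""
    hits = []
    for line in trace_lines:
        match = WARNING_RE.search(line)
        if match and int(match.group(1)) >= min_ms:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


sample = [
    "*** 2015-01-21 08:19:04.077",
    "Warning: log write time 1000ms, size 2KB",
    "Warning: log write time 620ms, size 14KB",
]
slow = slow_log_writes(sample, min_ms=800)
```

Long write times for very small writes, as in the first sample line, point at the storage rather than at redo volume.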

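Solutions:

1. If frequent commits really are the cause of log file sync, reduce the number of commits.

2. If I/O really is the cause, place the log files on raw devices or on RAID 1+0 rather than RAID 5 (and remember: do not put redo log files on SSD!).

3. Make sure there is enough CPU. When CPU is short, the user session cannot get scheduled promptly after LGWR posts it, and so cannot resume work.

4. Check whether some tables can use nologging, which reduces the amount of redo generated.

5. Check that the redo log files are large enough for a log switch to happen only once every 15 to 20 minutes.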
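
To make the first solution concrete, here is a minimal Python sketch of commit batching. It uses the standard-library sqlite3 module purely so that the example runs anywhere; the table t, the batch size, and the function name are assumptions for illustration, and an Oracle application would apply the same pattern through its own driver.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size):
    """Insert rows, committing once per batch_size rows instead of per row.

    Returns the number of commits issued.
    """
    commits = 0
    pending = 0
    cur = conn.cursor()
    for row in rows:
        cur.execute("INSERT INTO t (id, payload) VALUES (?, ?)", row)
        pending += 1
        if pending == batch_size:
            conn.commit()
            commits += 1
            pending = 0
    if pending:  # commit the final partial batch
        conn.commit()
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, payload TEXT)")
rows = [(i, f"row-{i}") for i in range(10)]
commits = insert_in_batches(conn, rows, batch_size=4)
```

Ten rows with a batch size of 4 need three commits rather than ten, so the session waits for a log sync three times instead of ten. The trade-off is that a failure loses the uncommitted part of the current batch.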

A deeper analysis of log file sync:

If the analysis above does not resolve the log file sync wait event, continue with the analysis below.

The log file sync wait may be broken down into the following components:

1. Wakeup LGWR if idle

2. LGWR gathers the redo to be written and issues the I/O

3. Time for the log write I/O to complete

4. LGWR I/O post processing

5. LGWR posting the foreground/user session that the write has completed

6. Foreground/user session wakeup

Steps 2 and 3 are accumulated in the "redo write time" statistic (found in the Instance Activity Stats section of AWR and in the statistics section of Statspack).

Step 3 is the "log file parallel write" wait event. (Document:34583.1 "log file parallel write" Reference Note)

In addition, with a Data Guard standby in maximum protection mode (SYNC transport), this step also includes the network write and the time RFS takes to write the redo into the standby redo log on the standby database.

Steps 5 and 6 may become very significant as the system load increases, especially CPU load (check the r column of vmstat). This is because even after the foreground has been posted it may take some time for the OS to schedule it to run. May require monitoring from O/S level.

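Some reference thresholds:

A log file sync wait time below 20 ms is considered normal.

A log file parallel write wait time below 20 ms is considered normal.

When the log file parallel write and log file sync wait times are very close, the problem is I/O, because most of the time is being spent writing the log to disk.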

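Related scripts:

-- Total and average wait times (TIME_WAITED and AVERAGE_WAIT are reported in hundredths of a second)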

select EVENT,TOTAL_WAITS,TOTAL_TIMEOUTS,TIME_WAITED,AVERAGE_WAIT
from   v$system_event
where  event in ('log file sync','log file parallel write');
select value from v$parameter where name = 'log_buffer';
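
The numbers returned by the v$system_event query can be turned into a first-pass verdict. The Python sketch below assumes TIME_WAITED is in hundredths of a second, applies the 20 ms rule of thumb from this article, and treats the two events as "close" when log file parallel write accounts for at least 80% of log file sync. That 80% ratio and the function names are assumptions chosen for illustration, not Oracle guidance.

```python
def avg_wait_ms(total_waits: int, time_waited_cs: float) -> float:
    """Average wait in milliseconds from TOTAL_WAITS and TIME_WAITED (centiseconds)."""
    if total_waits == 0:
        return 0.0
    return time_waited_cs * 10.0 / total_waits


def diagnose(sync_ms: float, write_ms: float,
             normal_ms: float = 20.0, close_ratio: float = 0.8) -> str:
    """Classify a log file sync problem from the two average waits."""
    if sync_ms < normal_ms:
        return "normal"
    if write_ms >= close_ratio * sync_ms:
        # Most of the commit wait is the redo write itself.
        return "slow redo I/O"
    return "not I/O bound: check commit rate and CPU"


sync_ms = avg_wait_ms(total_waits=1000, time_waited_cs=2500)   # 25.0 ms
write_ms = avg_wait_ms(total_waits=1000, time_waited_cs=2300)  # 23.0 ms
verdict = diagnose(sync_ms, write_ms)
```

Because v$system_event is cumulative since instance startup, such a check is more meaningful on the difference between two samples than on the raw totals.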

--------------- New feature: two methods for log file sync --------------

Adaptive Log File Sync

Adaptive Log File sync was introduced in 11.2. The parameter controlling this feature, _use_adaptive_log_file_sync, is set to false by default in 11.2.0.1 and 11.2.0.2.

Starting with 11.2.0.3 the default is true, which means the adaptive log sync mechanism is enabled.

The parameter can also be turned on in 11.2.0.1 and 11.2.0.2:

ALTER SYSTEM SET "_use_adaptive_log_file_sync"= <TRUE|FALSE> scope=<both|spfile|memory>;

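With the parameter enabled, log sync switches between two methods.

It determines how the foreground/user session learns from LGWR that the commit has completed (that is, that the redo has been written to the log file).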

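Post/wait, traditional method for posting completion of writes to redo log

This is the traditional, passive method and the default before 11.2.0.3: the user session waits for LGWR to post it once the redo has been written to the log file.

Advantage: with post/wait the user session finds out almost immediately that the redo has been flushed to disk.

Polling, a new method where the foreground process checks if the LGWR has completed the write.

This is the new, active method: the foreground session itself checks whether LGWR has finished the write. It responds more slowly than post/wait but saves CPU.

Advantage: under post/wait, once a commit completes LGWR has to post the completion to many user sessions, which consumes a lot of CPU. With polling the sessions actively check whether LGWR has finished writing the redo, so the CPU that LGWR would spend on posting is freed.

Polling works better when the system is heavily loaded (CPU busy).

Post/wait works better when the system is lightly loaded (CPU idle), since it gives better response times than polling.

Oracle decides which method to use based on internal statistics. Switching between post/wait and polling incurs overhead, so switches should not happen too frequently.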

The LGWR trace file records each switch; look for the keyword "Log file sync switching to ...":

Switch to polling:

*** 2015-01-21 08:19:04.077
kcrfw_update_adaptive_sync_mode: post->poll long#=2 sync#=5 sync=62 poll=1056 rw=454 ack=0 min_sleep=1056
*** 2015-01-21 08:19:04.077
Log file sync switching to polling
Current scheduling delay is 1 usec
Current approximate redo synch write rate is 1 per sec
kcrfw_update_adaptive_sync_mode: poll->post current_sched_delay=0 switch_sched_delay=1 current_sync_count_delta=1 switch_sync_count_delta=5

Switch to post/wait:

*** 2015-01-21 08:46:09.428
Log file sync switching to post/wait
Current approximate redo synch write rate is 0 per sec
*** 2015-01-21 08:47:46.473
kcrfw_update_adaptive_sync_mode: post->poll long#=2 sync#=11 sync=228 poll=1442 rw=721 ack=0 min_sleep=1056
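
To see how often the mode flips, the switch lines can be extracted from the LGWR trace. This Python sketch assumes the trace follows the layout of the excerpts above, with a "*** timestamp" line preceding each "Log file sync switching to ..." line; the function name is hypothetical.

```python
import re

TIMESTAMP_RE = re.compile(r"^\*\*\* (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
SWITCH_RE = re.compile(r"^Log file sync switching to (polling|post/wait)")


def find_switches(trace_lines):
    """Return (timestamp, new_mode) for every mode switch found in the trace."""
    switches = []
    last_timestamp = None
    for raw in trace_lines:
        line = raw.strip()
        ts = TIMESTAMP_RE.match(line)
        if ts:
            last_timestamp = ts.group(1)  # remember the most recent timestamp
            continue
        sw = SWITCH_RE.match(line)
        if sw:
            switches.append((last_timestamp, sw.group(1)))
    return switches


trace = [
    "*** 2015-01-21 08:19:04.077",
    "Log file sync switching to polling",
    "Current scheduling delay is 1 usec",
    "*** 2015-01-21 08:46:09.428",
    "Log file sync switching to post/wait",
    "Current approximate redo synch write rate is 0 per sec",
]
switches = find_switches(trace)
```

Many switches within a short period suggest the instance is oscillating between the two modes, which is the overhead the adaptive mechanism tries to avoid.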

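Related scripts:

Check whether log file sync is currently using post/wait or polling (a 'redo synch polls' value that keeps increasing means polling is in use)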

SQL> select name,value from v$sysstat where name in ('redo sync poll writes','redo synch polls');
NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo synch polls                                                  325355850

Number of times per hour that log file sync used the polling method

col begin_interval_time format a25
col instance_number format 99 heading INST
col stat_name format a25
select snap.BEGIN_INTERVAL_TIME,hist.instance_number , hist.stat_name,hist.redo_synch_polls
from ( select snap_id,instance_number,stat_name,value -lag(value,1,null) over ( order by snap_id,instance_number,stat_name) redo_synch_polls
from dba_hist_sysstat
where stat_name='redo synch polls'
and dbid=(select dbid from v$database)
and instance_number = nvl('&instance_number',1)) hist,
dba_hist_snapshot snap
where redo_synch_polls >0
and hist.snap_id=snap.snap_id
and hist.instance_number=snap.instance_number
order by 1,2
/
BEGIN_INTERVAL_TIME       INST STAT_NAME                 REDO_SYNCH_POLLS
------------------------- ---- ------------------------- ----------------
06-JAN-15 07.00.02.884 AM    2 redo synch polls                       734
06-JAN-15 08.00.08.425 AM    2 redo synch polls                     23767
06-JAN-15 09.00.13.770 AM    2 redo synch polls                     39827
06-JAN-15 10.00.19.233 AM    2 redo synch polls                     48479
06-JAN-15 11.00.24.431 AM    2 redo synch polls                     41541
06-JAN-15 12.00.29.670 PM    2 redo synch polls                     47566
06-JAN-15 01.00.35.029 PM    2 redo synch polls                     32169
06-JAN-15 02.00.04.159 PM    2 redo synch polls                     37405
06-JAN-15 02.59.04.536 PM    2 redo synch polls                     41469
06-JAN-15 04.00.08.556 PM    2 redo synch polls                     38683
06-JAN-15 05.00.12.523 PM    2 redo synch polls                     51618
06-JAN-15 06.00.16.584 PM    2 redo synch polls                     52511
06-JAN-15 07.00.03.352 PM    2 redo synch polls                     42229
06-JAN-15 08.00.08.663 PM    2 redo synch polls                     35229
06-JAN-15 09.00.13.882 PM    2 redo synch polls                     18499
