How to determine what causes a particular wait type
[Edit 2016: Check out my new resource – a comprehensive library of all wait types and latch classes – see here.]
Wait statistics, as you know, are one of my favorite things to do with SQL Server, along with corruption, the transaction log, and Kimberly (but not necessarily in that order :-)
One of the things that really frustrates me about wait statistics is that there is hardly any documentation about what the various wait types actually mean. For example, the Microsoft documentation for the WRITE_COMPLETION wait is ‘Occurs when a write operation is in progress.’ That’s not very illuminating. What kind of writes? Under what circumstances?
There is a relatively easy way to figure out when particular wait types are occurring: use Extended Events to capture the SQL Server code call stacks at the point the wait occurs. This information and methodology are also required for quite a few of the people I’m working with to analyze their 24 hours of wait statistics (see this post).
Here’s what you need to do:
- Download the symbol files for the SQL Server instance you’re interested in
- Create an Extended Event session to track the wait type you’re interested in
- Run your workload
- Examine the call stacks that you’ve collected
And in this post I’ll show you how to do these steps.
Symbol files and call stacks
Whenever an executable is compiled, you can optionally have the compiler generate symbols that can help with debugging. The symbols effectively correlate offsets in the compiled code with the human-readable names of code functions, class methods, and variables. This allows us to look at what’s called a call stack and figure out what SQL Server is doing.
As an example of a call stack, consider the following C++ pseudo-code:
bool Thing::ValidateThing (ThingPtr thing)
{
    // blah
    if (thing->m_Eggs != GREEN && thing->m_side != HAM)
    {
        __ThrowError (THING_NOT_VALID);
    }
    return TRUE;
}

void CountManager::AddNextThingToTotal (ThingPtr thing)
{
    // blah
    if (TRUE == thing->ValidateThing (thing))
    {
        m_ThingCount++;
    }
}

int CountManager::CountListOfThings (ThingListPtr things)
{
    // blah
    AddNextThingToTotal (thing);
    // blah
    return m_ThingCount;
}
If we wanted to see all the call stacks that end up with an error being thrown, one such call stack might look like:
__ThrowError
Thing::ValidateThing+0x26
CountManager::AddNextThingToTotal+0x441
CountManager::CountListOfThings+0x104
It lets us see at what point in the executable something happens, and we can make sense of the call stack if the words in there make sense in our context. You might be concerned that you don’t know the internals of SQL Server, but most of the time the names of the classes and methods have enough information for you to be able to work out what’s happening.
We need the SQL Server symbol files for this to work. You can get them freely from Microsoft and I have a blog post with instructions to do it: How to download a sqlserver.pdb symbol file. If you have trouble with this, let me know as it can be tricky.
Make sure your call stacks look correct – see the example in the ‘how to’ post.
Extended Event Session
The Extended Event session to use is pretty simple. It uses the histogram target (called the ‘asynchronous bucketizer’ in earlier versions) to capture unique call stacks when the wait type we’re interested in occurs.
We do have to make sure that we use the correct wait type value in the session though, as some of the wait type values changed in Extended Events from version to version. The code to get the correct value to use is as follows, using the WRITE_COMPLETION wait type as the example:
-- Figure out the wait type that we need
SELECT
    [map_key]
FROM sys.dm_xe_map_values
WHERE [name] = N'wait_types'
    AND [map_value] = N'WRITE_COMPLETION';
GO
One problem you may have is that some wait types aren’t listed by the name that shows up in sys.dm_os_wait_stats. Jonathan has a handy blog post that does the mapping for these – see here if you run the query above and don’t get a result.
On SQL Server 2012 I get the following result:
map_key
-----------
628
You MUST make sure to check the map value for your build, as it changes from release to release, including some service packs.
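For the wait types mentioned above whose Extended Events name differs from the name in sys.dm_os_wait_stats, a pattern search over the same map can help narrow down candidates. This is only a sketch: the search pattern N'%WRITE%' is an illustrative assumption, so substitute a fragment of the wait type name you are looking for.

```sql
-- List every Extended Events wait type whose name contains a fragment
-- of the name reported by sys.dm_os_wait_stats.
-- The pattern below is a placeholder; change it for your wait type.
SELECT
    [map_key],
    [map_value]
FROM sys.dm_xe_map_values
WHERE [name] = N'wait_types'
    AND [map_value] LIKE N'%WRITE%'
ORDER BY [map_value];
GO
```

The [map_key] column in the output is the value to use in the session predicate for whichever [map_value] matches the wait you are chasing.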
We then plug that value into an Extended Event session that will give us all the call stacks and how many times each was hit:
-- Drop the session if it exists.
IF EXISTS (
    SELECT * FROM sys.server_event_sessions
        WHERE [name] = N'InvestigateWaits')
    DROP EVENT SESSION [InvestigateWaits] ON SERVER
GO

-- Create the event session
-- Note that before SQL 2012, the wait_type to use may be
-- a different value.
-- On SQL 2012 the target name is 'histogram' but the old
-- name still works.
CREATE EVENT SESSION [InvestigateWaits] ON SERVER
ADD EVENT [sqlos].[wait_info]
(
    ACTION ([package0].[callstack])
    WHERE [wait_type] = 628 -- WRITE_COMPLETION only
    AND [opcode] = 1 -- Just the end wait events
    --AND [duration] > X-milliseconds
)
ADD TARGET [package0].[asynchronous_bucketizer]
(
    SET filtering_event_name = N'sqlos.wait_info',
    source_type = 1, -- source_type = 1 is an action
    source = N'package0.callstack' -- bucketize on the callstack
)
WITH
(
    MAX_MEMORY = 50 MB,
    MAX_DISPATCH_LATENCY = 5 SECONDS
)
GO

-- Start the session
ALTER EVENT SESSION [InvestigateWaits] ON SERVER
STATE = START;
GO

-- TF to allow call stack resolution
DBCC TRACEON (3656, -1);
GO
If the trace flag isn’t enabled, the call stacks will not be resolved by SQL Server. The trace flag also pins dbghelp.dll in memory – don’t worry about it causing a perf issue.
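If the call stacks come back as raw addresses rather than names, it is worth confirming that the trace flag really is on. A minimal check, assuming the flag was enabled globally as in the session script:

```sql
-- Report the status of trace flag 3656 at the global level.
-- A Status of 1 with Global = 1 means the flag is enabled instance-wide.
DBCC TRACESTATUS (3656, -1);
GO
```

When the investigation is finished, the flag can be turned off again with DBCC TRACEOFF (3656, -1).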
Note that there’s also a new Extended Event called wait_completed that was added in SQL Server 2014 – I’m using wait_info as it’s available in all versions with Extended Events.
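For reference, a session built on wait_completed might look like the sketch below. Treat it as an assumption-laden variant rather than a tested replacement: the session name [InvestigateWaitsCompleted] is hypothetical, the 628 value is only the SQL Server 2012 example and must be looked up again on your build, and there is no [opcode] predicate because, as far as I know, wait_completed only fires when a wait ends.

```sql
-- Sketch for SQL Server 2014 and later only.
-- The session name is hypothetical and 628 is a placeholder:
-- look up the wait_types map_key for your own build first.
CREATE EVENT SESSION [InvestigateWaitsCompleted] ON SERVER
ADD EVENT [sqlos].[wait_completed]
(
    ACTION ([package0].[callstack])
    WHERE [wait_type] = 628 -- placeholder map_key
)
ADD TARGET [package0].[histogram]
(
    SET filtering_event_name = N'sqlos.wait_completed',
    source_type = 1, -- source_type = 1 is an action
    source = N'package0.callstack' -- bucketize on the callstack
)
WITH
(
    MAX_MEMORY = 50 MB,
    MAX_DISPATCH_LATENCY = 5 SECONDS
)
GO
```

The rest of this post continues with the wait_info session.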
If you want to grab the call stacks and the wait duration for every wait that occurs (e.g. so you can identify the cause of long-duration waits), change the session to:
CREATE EVENT SESSION [InvestigateWaits] ON SERVER
ADD EVENT [sqlos].[wait_info]
(
    ACTION ([package0].[callstack])
    WHERE [wait_type] = 628 -- WRITE_COMPLETION only
    AND [opcode] = 1 -- Just the end wait events
    --AND [duration] > X-milliseconds
)
ADD TARGET [package0].[ring_buffer]
WITH
(
    MAX_MEMORY = 50 MB,
    MAX_DISPATCH_LATENCY = 5 SECONDS
)
GO
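Once the workload has run against the ring_buffer version of the session, each event can be shredded into a row with its duration and call stack. The query below is a sketch that assumes the usual ring buffer XML layout (a RingBufferTarget root with one event element per captured event); the column aliases are my own.

```sql
-- Return one row per captured wait, longest waits first.
SELECT
    [xed].[event_data].value (
        '(data[@name="duration"]/value)[1]', 'bigint') AS [duration_ms],
    [xed].[event_data].value (
        '(action[@name="callstack"]/value)[1]', 'varchar(max)') AS [callstack]
FROM (
    SELECT CAST ([xst].[target_data] AS XML) AS [target_data]
    FROM sys.dm_xe_session_targets [xst]
    INNER JOIN sys.dm_xe_sessions [xs]
        ON [xst].[event_session_address] = [xs].[address]
    WHERE [xs].[name] = N'InvestigateWaits'
        AND [xst].[target_name] = N'ring_buffer'
) AS [t]
CROSS APPLY [t].[target_data].nodes ('RingBufferTarget/event')
    AS [xed] ([event_data])
ORDER BY [duration_ms] DESC;
GO
```

Bear in mind that the ring buffer only holds what fits in its memory, so on a busy system older events will have been overwritten.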
Workload
Now that the Extended Event session exists, you need to run your workload. This may be just your regular workload, or a few example commands that you think may be involved.
As an example, for the WRITE_COMPLETION wait type, I’ll do something simple like creating a database.
Be careful: depending on the wait type you’re investigating (e.g. a locking wait, CXPACKET wait, or PAGEIOLATCH_XX wait), the Extended Event session may cause a small performance issue, so be prepared to stop the session. If you stop the session, though, the information in the session histogram target disappears, so grab the data from it (see the Analysis section below) before stopping the session using the following code.
-- Stop the event session
ALTER EVENT SESSION [InvestigateWaits] ON SERVER
STATE = STOP;
GO
But the longer you can run the session for, the more likely you’ll get the causes of the wait type you’re interested in.
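Because the target contents are lost when the session stops, one option is to copy them somewhere first. The sketch below saves the target data into a temporary table and only then stops the session; the table name #InvestigateWaitsData and the [captured_at] column are hypothetical choices of mine.

```sql
-- Remove any earlier copy left over in this connection.
IF OBJECT_ID (N'tempdb..#InvestigateWaitsData') IS NOT NULL
    DROP TABLE #InvestigateWaitsData;
GO

-- Save the current target contents before they disappear.
SELECT
    [xst].[target_name],
    [xst].[execution_count],
    CAST ([xst].[target_data] AS XML) AS [target_data],
    SYSDATETIME () AS [captured_at]
INTO #InvestigateWaitsData
FROM sys.dm_xe_session_targets [xst]
INNER JOIN sys.dm_xe_sessions [xs]
    ON [xst].[event_session_address] = [xs].[address]
WHERE [xs].[name] = N'InvestigateWaits';
GO

-- Now it is safe to stop the event session.
ALTER EVENT SESSION [InvestigateWaits] ON SERVER
STATE = STOP;
GO
```

A temporary table only lasts for the connection that created it, so use a permanent table if the data needs to outlive your query window.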
Analysis
To get the data out of the histogram target, use the following code:
-- Get the callstacks from the bucketizer target
SELECT
    [event_session_address],
    [target_name],
    [execution_count],
    CAST ([target_data] AS XML)
FROM sys.dm_xe_session_targets [xst]
INNER JOIN sys.dm_xe_sessions [xs]
    ON [xst].[event_session_address] = [xs].[address]
WHERE [xs].[name] = N'InvestigateWaits';
GO
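The query above returns the target as a single XML document. If you would rather have one row per unique call stack with its hit count, the XML can be shredded as in the sketch below. It assumes each bucket is a Slot element carrying a count attribute and a value child, and it matches Slot at any depth because the root element name differs between the old bucketizer and the histogram target.

```sql
-- One row per unique call stack, most frequently hit first.
SELECT
    [slot].[node].value ('@count', 'bigint') AS [hit_count],
    [slot].[node].value ('(value)[1]', 'varchar(max)') AS [callstack]
FROM (
    SELECT CAST ([xst].[target_data] AS XML) AS [target_data]
    FROM sys.dm_xe_session_targets [xst]
    INNER JOIN sys.dm_xe_sessions [xs]
        ON [xst].[event_session_address] = [xs].[address]
    WHERE [xs].[name] = N'InvestigateWaits'
) AS [t]
CROSS APPLY [t].[target_data].nodes ('//Slot') AS [slot] ([node])
ORDER BY [hit_count] DESC;
GO
```

Sorting by hit count puts the most common causes of the wait at the top, which is usually where you want to start reading.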
In the example I’ve created, where I’m looking at WRITE_COMPLETION waits occurring, I get back a bunch of call stacks. Here are the first few call stacks collected on SQL Server 2012 SP1:
XeSosPkg::wait_info::Publish+138 [ @ 0+0x0
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal::Wait+1f9 [ @ 0+0x0
FCB::SyncWrite+104 [ @ 0+0x0
DBMgr::CopyModel+fe [ @ 0+0x0
DBMgr::CreateAndFormatFiles+966 [ @ 0+0x0
CStmtCreateDB::CreateLocalDatabaseFragment+682 [ @ 0+0x0
DBDDLAgent::CreateDatabase+f7 [ @ 0+0x0
CStmtCreateDB::XretExecute+8fc [ @ 0+0x0
CMsqlExecContext::ExecuteStmts<1,1>+400 [ @ 0+0x0
CMsqlExecContext::FExecute+a33 [ @ 0+0x0
CSQLSource::Execute+866 [ @ 0+0x0
process_request+73c [ @ 0+0x0
process_commands+51c [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0

XeSosPkg::wait_info::Publish+138 [ @ 0+0x0
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal::Wait+1f9 [ @ 0+0x0
FCB::SyncWrite+104 [ @ 0+0x0
FCB::PageWriteInternal+55 [ @ 0+0x0
InitGAMIntervalPages+4cb [ @ 0+0x0
InitDBAllocPages+d0 [ @ 0+0x0
FileMgr::CreateNewFile+137 [ @ 0+0x0
AsynchronousDiskAction::ExecuteDeferredAction+8f [ @ 0+0x0
AsynchronousDiskWorker::ThreadRoutine+15c [ @ 0+0x0
SubprocEntrypoint+a21 [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0

XeSosPkg::wait_info::Publish+138 [ @ 0+0x0
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal::Wait+1f9 [ @ 0+0x0
FCB::SyncWrite+104 [ @ 0+0x0
FCB::PageWriteInternal+55 [ @ 0+0x0
DirectlyMarkPFSPage+154 [ @ 0+0x0
InitGAMIntervalPages+52a [ @ 0+0x0
InitDBAllocPages+d0 [ @ 0+0x0
FileMgr::CreateNewFile+137 [ @ 0+0x0
AsynchronousDiskAction::ExecuteDeferredAction+8f [ @ 0+0x0
AsynchronousDiskWorker::ThreadRoutine+15c [ @ 0+0x0
SubprocEntrypoint+a21 [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0

XeSosPkg::wait_info::Publish+138 [ @ 0+0x0
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal::Wait+1f9 [ @ 0+0x0
FCB::SyncWrite+104 [ @ 0+0x0
SQLServerLogMgr::FormatVirtualLogFile+175 [ @ 0+0x0
SQLServerLogMgr::FormatLogFile+c3 [ @ 0+0x0
FileMgr::CreateNewFile+106 [ @ 0+0x0
AsynchronousDiskAction::ExecuteDeferredAction+8f [ @ 0+0x0
AsynchronousDiskWorker::ThreadRoutine+15c [ @ 0+0x0
SubprocEntrypoint+a21 [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0

XeSosPkg::wait_info::Publish+138 [ @ 0+0x0
SOS_Scheduler::UpdateWaitTimeStats+30c [ @ 0+0x0
SOS_Task::PostWait+90 [ @ 0+0x0
EventInternal::Wait+1f9 [ @ 0+0x0
FCB::SyncWrite+104 [ @ 0+0x0
FCB::PageWriteInternal+55 [ @ 0+0x0
GlobalFileHeader::CreateInitialPage+395 [ @ 0+0x0
GlobalFileHeader::WriteInitialPage+50 [ @ 0+0x0
FCB::InitHeaderPage+25c [ @ 0+0x0
FileMgr::CreateNewFile+144 [ @ 0+0x0
AsynchronousDiskAction::ExecuteDeferredAction+8f [ @ 0+0x0
AsynchronousDiskWorker::ThreadRoutine+15c [ @ 0+0x0
SubprocEntrypoint+a21 [ @ 0+0x0
SOS_Task::Param::Execute+21e [ @ 0+0x0
SOS_Scheduler::RunTask+a8 [ @ 0+0x0
SOS_Scheduler::ProcessTasks+29a [ @ 0+0x0
SchedulerManager::WorkerEntryPoint+261 [ @ 0+0x0
SystemThread::RunWorker+8f [ @ 0+0x0
SystemThreadDispatcher::ProcessWorker+3c8 [ @ 0+0x0
SchedulerManager::ThreadEntryPoint+236 [ @ 0+0x0
BaseThreadInitThunk+d [ @ 0+0x0
RtlUserThreadStart+21 [ @ 0+0x0
...
In this example, we can see from the collected call stacks that a WRITE_COMPLETION wait occurs when the following operations occur (and there are many more, of course):
- Copying the pages from the model database into our new database (call stack 1)
- Creating and formatting the GAM, SGAM, DIFF_MAP, and ML_MAP allocation bitmaps (call stack 2)
- Creating and formatting the PFS allocation byte-maps (call stack 3)
- Creating and formatting the transaction log VLF headers (call stack 4)
- Creating and formatting the data and log file header pages (call stack 5)
How cool is that? :-)
Summary
Now you have a method to investigate any wait type that you’re seeing in your workload. I’ll be posting a bunch of information about wait types and when they occur through the year.
If you have call stacks that you’d like help interpreting, feel free to send me an email and I’ll respond within a week or so.
Enjoy!