.NET Memory Allocation Profiling with Visual Studio 2012
.NET Memory Allocation Profiling with Visual Studio 2012
This post was written by Stephen Toub, a frequent contributor to the Parallel Programming in .NET blog. He shows us how Visual Studio 2012 and an attention to detail can help you discover unnecessary allocations in your app that can prevent it from achieving higher performance.
Visual Studio 2012 has a wealth of valuable functionality, so much so that I periodically hear developers that already use Visual Studio asking for a feature the IDE already has and that they’ve just never discovered. Other times, I hear developers asking about a specific feature, thinking it’s meant for one purpose, not realizing it’s really intended for another.
Both of these cases apply to Visual Studio’s .NET memory allocation profiler. Many developers that could benefit from it don’t know it exists, and other developers have an incorrect expectation for its purpose. This is unfortunate, as the feature can provide a lot of value for particular scenarios; many developers will benefit from understanding first that it exists, and second the intended scenarios for its use.
Why memory profiling?
When it comes to .NET and memory analysis, there are two primary reasons one would want to use a diagnostics tool:
- To discover memory leaks. Leaks on a garbage-collecting runtime like the CLR manifest differently than do leaks in a non-garbage-collected environment, such as in code written in C/C++. A leak in the latter typically occurs due to the developer not manually freeing some memory that was previously allocated. In a garbage collected environment, however, manually freeing memory isn’t required, as that’s the duty of the garbage collector (GC). However, the GC can only release memory that is provably no longer being used, meaning as long as there are no rooted references to the memory. Leaks in .NET code manifest then when some memory that should have been collected is incorrectly still rooted, e.g. a reference to the object occurs in an event handler registered with a static event. A good memory analysis tool might help you to find such leaks, such as by allowing you to take snapshots of the process at two different points and then comparing those snapshots to see which objects stuck around for the second point, and more importantly, why.
- To discover unnecessary allocations. In .NET, allocation is often quite cheap. This cost is deceptive, however, as there are more costs later when the GC needs to clean up. The more memory that gets allocated, the more frequently the GC will need to run, and typically the more objects that survive collections, the more work the GC needs to do when it runs to determine which objects are no longer reachable. Thus, the more allocations a program does, the higher the GC costs will be. These GC costs are often negligible to the program’s performance profile, but for certain kinds of apps, especially those on servers that require high-throughput operation, these costs can add up quickly and make a noticeable impact to the performance of the app. As such, a good memory analysis tool might help you to understand all of the allocation being done by the program, in order to help spot allocations you can potentially avoid.
The .NET memory profiler included in Visual Studio 2012 (Professional and higher versions) was designed primarily to address the latter case of helping to discover unnecessary allocations, and it’s quite useful towards that goal, as the rest of this post will explore. The tool is not tuned for the former case of finding and fixing memory leaks, though this is an area the Visual Studio diagnostics team is looking to address in depth for the future (you can see such an experience for JavaScript that was added to Visual Studio as part of VS2012.1). While the tool today does have an advanced option to track when objects are collected, it doesn’t help you to understand why objects weren’t collected or why they were held onto longer than was expected.
There are also other useful tools in this space. The downloadable PerfView tool doesn’t provide as user-friendly an interface as does the .NET memory profiler in Visual Studio 2012, but it is a very powerful tool that supports both tasks of finding memory leaks and discovering unnecessary allocations. It also supports profiling Windows Store apps, which the .NET memory allocation profiler in Visual Studio 2012 does not currently support as of the writing of this post.
Example to be optimized
To better understand the memory profiler’s role and how it can help, let’s walk through an example. We’ll start with the core method that we’ll be looking to optimize (in a real-world case, you’d likely be analyzing your whole application and narrowing in on the particular offending areas, but for the purpose of this example, we’ll keep this constrained):
public static async Task<T> WithCancellation1<T>(this Task<T> task, CancellationToken cancellationToken)
{
var tcs = new TaskCompletionSource<bool>();
using (cancellationToken.Register(() => tcs.TrySetResult(true)))
if (task != await Task.WhenAny(task, tcs.Task))
throw new OperationCanceledException(cancellationToken);
return await task;
}
The purpose of this small method is to enable code to await a task in a cancelable manner, meaning that regardless of whether the task has completed, the developer wants to be able to stop waiting for it. Instead of writing code like:
T result = await someTask;
the developer would write:
T result = await someTask.WithCancellation1(token);
and if cancellation is requested on the relevant CancellationToken before the task completes, an OperationCanceledException will be thrown. This is achieved in WithCancellation1 by wrapping the original task in an async method. The method creates a second task that will complete when cancellation is requested (by Registering a call to TrySetResult with the CancellationToken), and then uses Task.WhenAny to wait for either the original task or the cancellation task to complete. As soon as either does, the async method completes, either by throwing a cancellation exception if the cancellation task completed first, or by propagating the outcome of the original task by awaiting it. (For more details, see the blog post “How do I cancel non-cancelable async operations?”)
To understand the allocations involved in this method, we’ll use a small harness method:
using System;
using System.Threading;
using System.Threading.Tasks; class Harness
{
static void Main()
{
Console.ReadLine(); // wait until profiler attaches
TestAsync().Wait();
}
static async Task TestAsync()
{
var token = CancellationToken.None;
for(int i=0; i<100000; i++)
await Task.FromResult(42).WithCancellation1(token);
}
} static class Extensions
{
public static async Task<T> WithCancellation1<T>(
this Task<T> task, CancellationToken cancellationToken)
{
var tcs = new TaskCompletionSource<bool>();
using (cancellationToken.Register(() => tcs.TrySetResult(true)))
if (task != await Task.WhenAny(task, tcs.Task))
throw new OperationCanceledException(cancellationToken);
return await task;
}
}
The TestAsync method will iterate 100,000 times. Each time, it creates a new task, invokes the WithCancellation1 on it, and awaits the result of that WithCancellation1 call. This await will complete synchronously, as the task created by Task.FromResult is returned in an already completed state, and the WithCancellation1 method itself doesn’t introduce any additional asynchrony such that the task it returns will complete synchronously as well.
Running the .NET memory allocation profiler
To start the memory allocation profiler, in Visual Studio go to the Analyze menu and select “Launch Performance Wizard…”. This will open a dialog like the following:
Choose “.NET memory allocation (sampling)”, click Next twice, followed by Finish (if this is the first time you’ve used the profiler since you logged into Windows, you’ll need to accept the elevation prompt so the profiler can start). At that point, the application will be launched and the profiler will start monitoring it for allocations (the harness code above also requires that you press ‘Enter’, in order to ensure the profiler has attached by the time the program starts the real test). When the app completes, or when you manually choose to stop profiling, the profiler will load symbols and will start analyzing the trace. That’s a good time to go and get yourself a cup of coffee, or lunch, as depending on how many allocations occurred, the tool can take a while to do this analysis.
When the analysis completes, we’re presented with a summary of the allocations that occurred, including highlighting the functions that allocated the most memory, the types that resulted in the most memory allocated, and the types with the most instances allocated:
From there, we can drill in further, by looking at the allocations summary (choose “Allocation” from the “Current View” dropdown):
Here, we get to see a row for each type that was allocated, with the columns showing information about how many allocations were tracked, how much space was associated with those allocations, and what percentage of allocations mapped back to that type. We can also expand an entry to see the stack of method calls that resulted in these allocations:
By selecting the “Functions” view, we can get a different pivot on this data, highlighting which functions allocated the most objects and bytes:
Interpreting and acting on the profiling results
With this capability, we can analyze our example’s results. First, we can see that there’s a substantial number of allocations here, which might be surprising. After all, in our example we were using WithCancellation1 with a task that was already completed, which means there should have been very little work to do (with the task already done, there is nothing to cancel), and yet from the above trace we can see that each iteration of our example is resulting in:
- Three allocations of Task`1 (we ran the harness 100K times and can see there were ~300K allocations)
- Two allocations of Task[]
- One allocation each of TaskCompletionSource`1, Action, a compiler-generated type called <>c_DisplayClass2`1, and some type called CompleteOnInvokePromise
That’s nine allocations for a case where we might expect only one (the task allocation we explicitly asked for in the harness by calling Task.FromResult), with our WithCancellation1 method incurring eight allocations.
For helper operations on tasks, it’s actually fairly common to deal with already completed tasks, as often times operations implemented to be asynchronous will actually complete synchronously (e.g. one read operation on a network stream may buffer into memory enough additional data to fulfill a subsequent read operation). As such, optimizing for the already completed case can be really beneficial for performance. Let’s try. Here’s a second attempt at WithCancellation, one that optimizes for several “already completed” cases:
public static Task<T> WithCancellation2<T>(this Task<T> task,
CancellationToken cancellationToken)
{
if (task.IsCompleted || !cancellationToken.CanBeCanceled)
return task;
else if (cancellationToken.IsCancellationRequested)
return new Task<T>(() => default(T), cancellationToken);
else
return task.WithCancellation1(cancellationToken);
}
This implementation checks:
- First, whether the task is already completed or whether the supplied CancellationToken can’t be canceled; in both of those cases, there’s no additional work needed, as cancellation can’t be applied, and as such we can just return the original task immediately rather than spending any time or memory creating a new one.
- Then whether cancellation has already been requested; if it has, we can allocate a single already-canceled task to be returned, rather than spending the eight allocations we previously paid to invoke our original implementation.
- Finally, if none of these fast paths apply, we fall through to calling the original implementation.
Re-profiling our micro-benchmark while using WithCancellation2 instead of WithCancellation1 provides a much improved outlook (you’ll likely notice that the analysis completes much more quickly than it did before, already a sign that we’ve significantly decreased memory allocation). Now we have just have the primary allocation we expected, the one from Task.FromResult called from our TestAsync method in the harness:
So, we’ve now successfully optimized the case where the task is already completed, where cancellation can’t be requested, or where cancellation has already been requested. What about the case where we do actually need to invoke the more complicated logic? Are there any improvements that can be made there?
Let’s change our benchmark to use a task that’s not already completed by the time we invoke WithCancellation2, and also to use a token that can have cancellation requested. This will ensure we make it to the “slow” path:
static async Task TestAsync()
{
var token = new CancellationTokenSource().Token;
for (int i = 0; i < 100000; i++)
{
var tcs = new TaskCompletionSource<int>();
var t = tcs.Task.WithCancellation2(token);
tcs.SetResult(42);
await t;
}
}
Profiling again provides more insight:
On this slow path, there are now 14 total allocations per iteration, including the 2 from our TestAsync harness (the TaskCompletionSource<int> we explicitly create, and the Task<int> it creates). At this point, we can use all of the information provided by the profiling results to understand where the remaining 12 allocations are coming from and to then address them as is relevant and possible. For example, let’s look at two allocations specifically: the <>c__DisplayClass2`1 instance and one of the two Action instances. These two allocations will likely be logical to anyone familiar with how the C# compiler handles closures. Why do we have a closure? Because of this line:
using(cancellationToken.Register(() => tcs.TrySetResult(true)))
The call to Register is closing over the ‘tcs’ variable. But this isn’t strictly necessary: the Register method has another overload which instead of taking an Action takes an Action<object> and the object state to be passed to it. If we instead rewrite this line to use that state-based overload, along with a manually cached delegate, we can avoid the closure and those two allocations:
private static readonly Action<object> s_cancellationRegistration =
s => ((TaskCompletionSource<bool>)s).TrySetResult(true);
…
using(cancellationToken.Register(s_cancellationRegistration, tcs))
Rerunning the profiler confirms those two allocations are no longer occurring:
Start profiling today!
This cycle of profiling, finding and eliminating hotspots, and then going around again is a common approach towards improving the performance of your code, whether using a CPU profiler or a memory profiler. So, if you find yourself in a scenario where you determine that minimizing allocations is important for the performance of your code, give the .NET memory allocation profiler in Visual Studio 2012 a try. Feel free to download the sample project used in this blog post.
For more on profiling, see the blog of the Visual Studio Diagnostics team, and ask them questions in the Visual Studio Diagnostics forum.
Stephen Toub
.NET Memory Allocation Profiling with Visual Studio 2012的更多相关文章
- 如何在Visual Studio 2012中发布Web应用程序时自动混淆Javascript
同Java..NET实现的应用程序类似,Javascript编写的应用程序也面临一个同样的问题:源代码的保护.尽管对大多数Javascript应用公开源代码不算是很严重的问题,但是对于某些开发者来说, ...
- 在Visual Studio 2012中使用VMSDK开发领域特定语言(二)
本文为<在Visual Studio 2012中使用VMSDK开发领域特定语言>专题文章的第二部分,在这部分内容中,将以实际应用为例,介绍开发DSL的主要步骤,包括设计.定制.调试.发布以 ...
- 在Visual Studio 2012中使用VMSDK开发领域特定语言(一)
前言 本专题主要介绍在Visual Studio 2012中使用Visualization & Modeling SDK进行领域特定语言(DSL)的开发,包括两个部分的内容.在第一部分中,将对 ...
- Visual Studio 2012 trial version
Update: vs2012.5.iso http://download.microsoft.com/download/9/F/1/9F1DEA0F-97CC-4CC4-9B4D-0DB45B8261 ...
- 在Visual Studio 2012 Blue theme下使用Dark theme的文本编辑器颜色设置
Visual Studio 2012 默认提供了3种color theme: blue,light,和dark.其中dark的文本编辑器颜色设定很爽,可是整个菜单项加上一些小的窗口如Find Resu ...
- 分享10条Visual Studio 2012的开发使用技巧
使用Visual Studio 2012有一段时间了,并不是追赶潮流,而是被逼迫无可奈何.客户要求的ASP.NET MVC 4的项目,要用.NET 4.5来运行.经过一段时间的摸索,得到一点经验和体会 ...
- Visual Studio 2012 Update 4 RC 启动调试失败解决方案
以下解决办法适用于任何Visual Studio开发环境,及Windows NT 6.1以上系统. 系统:Windows 8.1 Enterprise x64 RTM 开发环境:Visual Stud ...
- SQL Server Data Tools – Business Intelligence for Visual Studio 2012安装时提示“The CPU architecture....”的解决方法
SQL Server Data Tools – Business Intelligence for Visual Studio 2012,一个很强大的工具,下载地址:http://www.micros ...
- Visual Studio 2012+jQuery-1.7.1
今天用Visual Studio 2012开发一个网站项目,在集成jqplot图表控件并进行调试的时候(使用的是MVC4框架),加载网页绘制图表的时候总是报错(提示$.jqplot.barRender ...
随机推荐
- Linux服务器下Nginx与Apache共存
解决思路: 将nginx作为代理服务器和web服务器使用,nginx监听80端口,Apache监听除80以外的端口,我这暂时使用8080端口. nginx.conf 位置:/etc/nginx/ngi ...
- [ZJOI2010]网络扩容
OJ题号: BZOJ1834.洛谷2604 思路: 对于第一问,直接跑一遍最大流即可. 对于第二问,将每条边分成两种情况,即将每条边拆成两个: 不需扩容,即残量大于零时,相当于这条边费用为$0$: 需 ...
- C# 调用windows api 操作鼠标、键盘、窗体合集...更新中
鼠标操作window窗体合集...更新中 1.根据句柄查找窗体 引自http://www.2cto.com/kf/201410/343342.html 使用SPY++工具获取窗体 首先打开spy+ ...
- pt-query-digest简介使用
简介 pt-query-digest 是用于分析mysql慢查询的一个工具,与mysqldumpshow工具相比,py-query_digest 工具的分析结果更具体,更完善. 有时因为 ...
- [原创]zabbix工具介绍,安装及使用
[原创]zabbix工具介绍,安装及使用 http://waringid.blog.51cto.com/65148/955939/
- 在windows 10下使用docker
准备工作 Windows 10下的Docker是依赖于Hyper-v的,首先我们需要启用它:控制面板 -> 程序 -> 启用或关闭Windows功能 -> 选中Hyper-V 安装D ...
- python 廖雪峰的官方网站
https://www.liaoxuefeng.com/wiki/0014316089557264a6b348958f449949df42a6d3a2e542c000/0014316119884678 ...
- selenium之关于 chromedriver的安装和使用
转自:https://blog.csdn.net/d77808675/article/details/79016271 最近在学习爬虫,用到了selenium 环境:Windows,python3 但 ...
- 你真的会用Gson吗?Gson使用指南(2)
注:此系列基于Gson 2.4. 上一篇文章 你真的会用Gson吗?Gson使用指南(1) 我们了解了Gson的基础用法,这次我们继续深入了解Gson的使用方法. 本次的主要内容: Gson的流式反序 ...
- Shiro基础知识03----shiro授权(编程式授权),Permission详解,授权流程(zz)
授权,也叫访问控制,即在应用中控制谁能访问哪些资源(如访问页面/编辑数据/页面操作等). 在权限认证中,最核心的是:主体/用户(Subject).权限(Permission).角色(Role).资源 ...