HttpClient exception:ExceptionType:System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation ca
error msg:
System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException: Operation canceled --- End of inner exception stack trace --- at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error) at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token) at System.Net.Http.HttpConnection.ReadAsync(Memory`1 destination) at System.Net.Http.HttpConnection.RawConnectionStream.ReadAsync(Memory`1 buffer, CancellationToken cancellationToken) at System.Net.Security.SslStreamInternal.<FillBufferAsync>g__InternalFillBufferAsync|38_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial) at System.Net.Security.SslStreamInternal.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer) at System.Net.Http.HttpConnection.FillAsync() at System.Net.Http.HttpConnection.ReadNextResponseHeaderLineAsync(Boolean foldedHeadersAllowed) at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken) --- End of inner exception stack trace ---
1.
Most / all of these requests were using POST
operations (or other methods that have method bodies).
We suspect, but aren't sure, that this might be related to connection pooling in some manner.
2.
That error message is for SocketError.OperationAborted, which likely means the socket was closed while the send or receive was in progress.
3.
Is there anything here that we can look at?
To reproduce this we need to schedule > 5 runs of 4,000+ tests that takes an hour plus.
In such scenario, we typically see 1 - 3 such failures using SocketsHttpHandler
.
Note that we never use SocketsHttpHandler
directly, so I don't think that we improperly using it.
We got through HttpClient
, a shared instance that is re-used among threads, as recommended.
The code is actually doing something like:
try
{
return await httpHandler.SendAsync(...);
}
catch(HttpRequestException e)
{
}
So we can catch the exception and then check state / logging / something.
Anything you can recommend that we'll look at to get to the root cause?
4.
Is it possible you're hitting a timeout? It shouldn't manifest this way, but it would give us a place to start looking. For example, if you change your HttpClient's Timeout to Timeout.InfiniteTimeSpan, does this still happen?
Or if it's potentially related to connection pooling, what happens if you change the SocketHttpHandler's PooledConnectionLifetime and PooledConnectionIdleTimeout to infinite... does it still repro?
5.
And... I have a repro. On my machine, this reproduce the scenario instantly.
You were correct about the timeout causing this.
using System;
using System.Net.Http;
using System.Threading.Tasks; namespace ConsoleApp15
{
class Program
{
static async Task Main(string[] args)
{ // http://slowwly.robertomurray.co.uk/delay/3000/url/http://www.google.co.uk var client = new HttpClient(new SocketsHttpHandler
{
})
{
Timeout = TimeSpan.FromMilliseconds(250)
}; var tasks = new[]
{
Run(client),
Run(client),
Run(client),
Run(client),
Run(client),
Run(client),
Run(client),
Run(client),
Run(client),
}; await Task.WhenAll(tasks); Console.WriteLine("Done"); } public static async Task Run(HttpClient client)
{
for (int i = 0; i < 25; i++)
{
try
{
await client.GetStringAsync("http://slowwly.robertomurray.co.uk/delay/3000/url/http://www.google.co.uk");
Console.WriteLine("Should timeout");
}
catch (Exception e)
{
}
await client.GetStringAsync("http://slowwly.robertomurray.co.uk/delay/100/url/http://www.google.co.uk");
} }
}
}
6.
That's hitting a timeout and throwing an OperationCanceledException... that's expected.
What's not expected is getting an IOException (not wrapped in an OperationCanceledException) out of SendAsync and when nothing on the request is being canceled / timing out.
7.
I captured System.Net logs and Wireshark traces of another failure. Here's where I see the failure in Wireshark:
1990370 214.439692 127.0.0.1 → 127.0.0.1 TCP 84 51686 → 51687 [FIN, ACK] Seq=1 Ack=10334254 Win=525568 Len=0
1990371 214.439736 127.0.0.1 → 127.0.0.1 TCP 84 51687 → 51686 [ACK] Seq=10334254 Ack=2 Win=525568 Len=0
1990408 274.618073 127.0.0.1 → 127.0.0.1 TCP 84 51687 → 51686 [RST, ACK] Seq=10334254 Ack=2 Win=0 Len=0
The RST looks unusual to me, especially since it's occurring after the connection should have been closed. The timestamp on the RST makes it look like is was sent 60 seconds after the connection was closed, but I'm still investigating whether or not that's accurate or just an issue with the logs I collected.
I'm going to try capture another network trace to see if the issue occurs after a FIN again, since that could be valuable info.
8.
We are seeing the exact same thing when running our applications on Linux. A temporary "workaround" for us has been to set the timeout to 100 seconds(it's not an acceptable work around).
When we have the timeout to something more reasonable, like 10 seconds or something, we get the error every 500 request or something like that.
The httpendpoint that we are calling are under our control and when the error happens, we can't even see that the request has reached the service. So it seems like the request doesn't leave the box.
We are using dotnet core 2.1.
I've started a stopwatch for each requests and then Im catching TaskCanceledException and logs the following(this is when we have the timeout set to 15 seconds):
Task was cancelled after 15000ms System.Threading.Tasks.TaskCanceledException:
The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation canceled. ---> System.Net.Sockets.SocketException: Operation canceled
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
at System.Net.Security.SslStreamInternal.g__InternalFillBufferAsync|38_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)
9.
Thanks for the additional info. It's possible the behavior you're seeing is related, but it sounds more like an actual timeout. In this case there are two differences from what I've observed. (1) in all other cases I've seen the connection is actually sending the data, as observed via wireshark and (2) in other cases the failure occurs well before a timeout should have occurred.
There's an open issue right now (#20296) tracking the fact that we throw TaskCanceledException
on an actual request timeout. I think that's probably the actual root cause here.
If you think that your issue isn't actually being caused by timeouts though, feel free to open a new issue with all of the info you shared here.
10.
Interesting development. We have encountered this issue after adding to our test suite a test that spawns a process with powershell script (very simple one - writing 1 line to a file). ~30 seconds after this tests there is a failure in 1-3 random subsequent tests.
Because of that we thought that this might be related to processes inheriting handles from the parent one and it appears that we are right.
We have made a PR that unskipps those tests and switches the implementation of Process to the one that sets 'inherit handles' to 'false'. After doing so everything seems to be fine, it looks like the issue is with handle inheritance and the behavior of those handles after that child process ends.
The PR in question is here: ravendb/ravendb#8028
Our idea is that, at least as a workaround, it would be good to be able to set the
retVal = Interop.Kernel32.CreateProcess(
null, // we don't need this since all the info is in commandLine
commandLine, // pointer to the command line string
ref unused_SecAttrs, // address to process security attributes, we don't need to inherit the handle
ref unused_SecAttrs, // address to thread security attributes.
true, // handle inheritance flag
creationFlags, // creation flags
(IntPtr)environmentBlockPtr, // pointer to new environment block
workingDirectory, // pointer to current directory name
ref startupInfo, // pointer to STARTUPINFO
ref processInfo // pointer to PROCESS_INFORMATION
);
11.
That's very interesting -- thanks for sharing that info. I briefly looked into the possibility that this was related to socket handles a few weeks ago after observing similar behavior with the h2spec process spawned by the Kestrel tests. It's probably worth looking into in more detail if you're seeing the same behavior.
12.
Note that as of 3.0, socket handles are no longer inherited: #32903
13.
that would probably solve the issue, but:
- Can this be applied to 2.1?
or
- Can you add
private bool _inheritHandles = true
field toProcess.cs
that we would be able to set via reflection tofalse
at least?
14.
If that actually "fixes" it, I have trouble believing it's anything other than masking a symptom, and we should actually find the root cause and fix that.
Unless we believe the root cause really is that the socket handle is getting inherited and somehow that child process then closing is causing problems for the socket? In which case we should be able to repro that pretty easily, and if that's actually the problem, then I agree it would be a candidate for back porting.
15.
Fixing the root cause would be best, but till then could you consider applying point 2. so we can workaround this?
16.
Applying it where... as a servicing fix to 2.1? That's unlikely.
17.
Adding that private field in Process.cs
(probably Process.Windows.cs
) and using it instead of
would be a problem?
18.
The bar for servicing fixes is very high. You're suggesting adding unexposed, unvalidated, functionality-accessible-only-via-reflection-and-that-we-encourage-developers-to-use-via-reflection as a servicing fix. Hence "unlikely". If you need it immediately, you can of course make the change in your own local fork.
Let's get to the bottom of the problem and then figure out the right fix.
19.
I agree with @stephentoub.
I feel we are getting closer. @rmkerr invested already weeks of his time trying to get to the bottom of this (incl. extensive discussion with winsock team in Windows). Let's see if this new information can help root cause it and address the root cause.
If it doesn't help too much, we can think about either porting the change #32903 to 2.1, or doing something to enable people work around the suspectible root cause.
20.
I've been trying to build a repro that launches a process and then has several sockets sending and receiving data, but I have not been able to force the issue to occur. @ppekrol, would it be possible to set up a project with just the affected test and the code you're using to spawn the process?
If not I'll put some more time into the repro when I'm back in the office next week.
21.
Hi @rmkerr
Apologies for the delay, I really doubt that this would be doable. We already spend significant amount of hours to isolate the issue, create a smaller subset of tests, but for an unknown reason, it only appears for us on a larger set of tests e.g. our full suite. I do not have an idea what else could be done to isolate this. We tried moving that from SlowTests project (takes around 20 min to complete) to FastsTests (around 90 seconds), but it does not reproduce then.
22.
No problem @ppekrol. The info you shared is still valuable. I'm back in the office today, and will spend some time trying to cut down the size of the repro.
23.
I'm able to reproduce the behavior described by @ppekrol with a modified version of the kestrel test suite.
To start, I disabled all of the tests except for the H2Spec tests (which are launches as a separate process) andConnectionNotClosedWhenClientSatisfiesMinimumDataRateGivenLargeResponseHeaders
, which was the test that I observed failing the most frequently. In order to reproduce the issue more consistently, I set up the ConnectionNotClosed...
test to run five times.
I was able to further minimize the repro by replacing the H2Spec tests with the following code:
[Fact]
public void LaunchProcess()
{
for (int i = 0; i < 100; ++i)
{
//Thread.Sleep(100);
Process.Start("Powershell.exe", @"""C:\Tools\HelloWorld.ps1""").WaitForExit();
}
}
The targeted script just prints "hello world" to the console.
When running the test suites after this modification, I was able to reproduce the failure on 6 out of 10 runs. After replacing Process.Start
with Thread.Sleep
(as commented out above), the tests passed on 10 out of 10 runs.
Obviously this is pretty rough testing, and there isn't a lot of data. But I do think the behavior is interesting.
HttpClient exception:ExceptionType:System.Threading.Tasks.TaskCanceledException: The operation was canceled. ---> System.IO.IOException: Unable to read data from the transport connection: Operation ca的更多相关文章
- .Net多线程编程—System.Threading.Tasks.Parallel
System.Threading.Tasks.Parallel类提供了Parallel.Invoke,Parallel.For,Parallel.ForEach这三个静态方法. 1 Parallel. ...
- 转载 Net多线程编程—System.Threading.Tasks.Parallel
.Net多线程编程—System.Threading.Tasks.Parallel System.Threading.Tasks.Parallel类提供了Parallel.Invoke,Paral ...
- System.Threading.Tasks.Task 引起的 IIS 应用池崩溃
接口服务运行一段时间后,IIS应用池就会突然挂掉,事件查看日志,会有事件日志Event ID为5011的错误 为应用程序池“PokeIn”提供服务的进程在与 Windows Process Activ ...
- System.Threading.Tasks.Task引起的IIS应用程序池崩溃
问题现象 IIS应用程序池崩溃(Crash)的特征如下: 1. 从客户端看,浏览器一直处于连接状态,Web服务器无响应. 2. 从服务器端看(Windows Server 2008 + IIS 7.0 ...
- 一、并行编程 - 数据并行 System.Threading.Tasks.Parallel 类
一.并行概念 1.并行编程 在.NET 4中的并行编程是依赖Task Parallel Library(后面简称为TPL) 实现的.在TPL中,最基本的执行单元是task(中文可以理解为"任 ...
- System.Threading.Tasks.Task 任务引起的IIS应用程序池崩溃
转载:http://www.cnblogs.com/aaa6818162/p/4421305.html 问题现象 IIS应用程序池崩溃(Crash)的特征如下: 1. 从客户端看,浏览器一直处于连接状 ...
- using System.Threading.Tasks;
using System.Threading.Tasks; .Net并行库介绍——Task1
- vs2013c#测试using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; namespace ConsoleApplication1_CXY { class Program { stati
首先安装Unit Test Generator.方法为:工具->扩展和更新->联机->搜索“图标为装有蓝色液体的小试管.Unit Test Generator”, 编写代码,生成一个 ...
- System.Threading.Tasks并发和异步代码使用
main.cs System.Threading.Tasks.Parallel.For(0, 10, i => { TestLock test ...
随机推荐
- vue解决前后端跨域问题
1/在config中index.js中 找到proxyTable在里面添加如下代码 proxyTable: { '/api': { target: 'https://api.douban.com/v2 ...
- mysql本地安装
1.下载地址: https://downloads.mysql.com/archives/community/ 2.安装: 解压目录:D:\mysql\mysql-5.6.36-winx64 2.1. ...
- Uva 12009 平方数尾数与自身同样 dfs 构造
版权声明:本文为博主原创文章,未经博主同意不得转载. https://blog.csdn.net/qq574857122/article/details/25166611 题目链接:点击打开链接 题意 ...
- 连接H3C交换机的Console口连不上
一.故障:使用Console线连接交换机的Console口连不上. 处理步骤: 1.先把Console线连接好,一端连接交换机的Console口,另一端连接电脑的USB口. 2.安装Console线驱 ...
- 学习MySQL过程中的随笔二
MySQL深入学习: 视图 使用视图的目的:多次使用同一张临时表(或者已经存在的表) 视图只是一个虚拟表,其本质为[根据SQL语句获取动态的数据集,并为其命名],用户只需使用别名即可获得实时的 ...
- MATLAB程序控制语句
Matlab中的程序控制语句包括条件控制(Conditional Control).循环(Loop Control).异常处理(Error Control)和程序返回(Program Terminat ...
- typeof 踩坑总结
typeof '123' == 'string' // true typeof ('123' == 'string' ) // "boolean" typeof 123 == N ...
- 在eclipse中的交叉编译
1.硬件是Arm 9的板子,运行的系统是Ubuntu 12.05 2.电脑虚拟机上安装的系统是Ubuntu 10.04,程序的开发都是在虚拟机上完成,开发IDE是eclipse 3.eclipse 默 ...
- django中static的坑
在django搭建网络平台的时候免不了要使用到static来保存静态文件, 在static文件夹里包含两个文件:css和js文件,如果使用不当就会出现很多问题 第一个坑:配置文件settings.py ...
- Linux运维跳槽40道面试精华题
Linux运维跳槽40道面试精华题 运维派 3天前 1.什么是运维?什么是游戏运维? 1)运维是指大型组织已经建立好的网络软硬件的维护,就是要保证业务的上线与运作的正常,在他运转的过程中,对他进行维护 ...