[Repost] Troubleshooting notes: a NoHttpResponseException in production

https://www.freesion.com/article/41531004212/

Environment:

JDK 1.8 + Tomcat 8 + HttpClient 4.5.2

Symptom:

The project intermittently throws org.apache.http.NoHttpResponseException: The target server failed to respond.

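Root cause:

From what I found, this exception is a side effect of persistent (keep-alive) connections. When a connection has been idle for longer than the server's keep-alive timeout, the server closes it and starts the TCP four-way close. If the client then takes that same connection from its pool and sends a request on it, the request fails with NoHttpResponseException.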

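Solution:

Once the cause is known, the fix follows. There are two main approaches:

Option 1: increase the server-side keep-alive timeout. With Tomcat, for example, set the keepAliveTimeout attribute on the Connector element.

Option 2: shorten the client-side keep-alive time, so that the client closes an idle connection before the server does.

Option 1 only mitigates the problem; it does not solve it. The keep-alive timeout should not be set to -1 (never time out), because holding connections open indefinitely badly hurts server performance.

The rest of this post covers Option 2, using HttpClient 4.5.2 as the example.

The final code first: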

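import java.util.concurrent.TimeUnit;
import javax.net.ssl.SSLContext;
import org.apache.http.HeaderElement;
import org.apache.http.HeaderElementIterator;
import org.apache.http.HttpResponse;
import org.apache.http.config.Registry;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.conn.ConnectionKeepAliveStrategy;
import org.apache.http.conn.socket.ConnectionSocketFactory;
import org.apache.http.conn.socket.PlainConnectionSocketFactory;
import org.apache.http.conn.ssl.SSLConnectionSocketFactory;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.message.BasicHeaderElementIterator;
import org.apache.http.protocol.HTTP;
import org.apache.http.protocol.HttpContext;
import org.apache.http.util.Args;

// SslUtils is the author's own helper class (not shown in this post)
SSLContext sslcontext = SslUtils.createIgnoreVerifySSL();
// Register the socket factories that handle the http and https schemes
Registry<ConnectionSocketFactory> socketFactoryRegistry = RegistryBuilder.<ConnectionSocketFactory>create()
        .register("http", PlainConnectionSocketFactory.INSTANCE)
        .register("https", new SSLConnectionSocketFactory(sslcontext))
        .build();
ConnectionKeepAliveStrategy connectionKeepAliveStrategy = (final HttpResponse response, final HttpContext context) -> {
    Args.notNull(response, "HTTP response");
    final HeaderElementIterator it = new BasicHeaderElementIterator(
            response.headerIterator(HTTP.CONN_KEEP_ALIVE));
    while (it.hasNext()) {
        final HeaderElement he = it.nextElement();
        final String param = he.getName();
        final String value = he.getValue();
        if (value != null && param.equalsIgnoreCase("timeout")) {
            try {
                return Long.parseLong(value) * 1000;
            } catch (final NumberFormatException ignore) {
            }
        }
    }
    // Fallback keep-alive of 3 seconds: the client treats the connection as valid for at most 3 seconds.
    // If more than 3 seconds have passed when it is leased, the pool closes it; see
    // org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(): entry.isExpired(System.currentTimeMillis())
    return 3 * 1000;
};
PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager(socketFactoryRegistry);
// Pool limits: at most 10 connections per route, 100 in total
connManager.setDefaultMaxPerRoute(10);
connManager.setMaxTotal(100);
// After a connection is leased, re-check whether it has been idle too long; see
// org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking():
// entry.getUpdated() + this.validateAfterInactivity <= System.currentTimeMillis()
connManager.setValidateAfterInactivity(3000);
// Create the custom HttpClient instance
CloseableHttpClient client = HttpClients.custom()
        // Note: HttpClientBuilder's setMaxConnPerRoute and setMaxConnTotal do not override
        // the values on an explicitly supplied connManager
        .setConnectionManager(connManager)
        .setConnectionManagerShared(false)
        // evictIdleConnections periodically closes idle connections before they time out.
        // Note: the evictor thread first sleeps for one maxIdleTime period after it starts.
        .evictIdleConnections(3000, TimeUnit.MILLISECONDS)
        .setKeepAliveStrategy(connectionKeepAliveStrategy)
        // If the API is idempotent and retries are acceptable, comment out disableAutomaticRetries().
        // By default the client retries up to 3 times, taking a connection from the pool
        // rather than opening a new one directly.
        .disableAutomaticRetries()
        .build();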
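
To make the keep-alive strategy above easier to follow, here is a minimal sketch of the same decision using only the JDK. It assumes the raw Keep-Alive header value is already available as a string (for example "timeout=5, max=100"); the class name KeepAliveDuration and the method resolve are hypothetical and are not part of HttpClient.

```java
// Hypothetical helper, not part of HttpClient: mirrors the strategy's logic on a raw header value.
final class KeepAliveDuration {
    // Fallback used when the server sends no usable timeout (3 seconds, as in the post).
    static final long FALLBACK_MS = 3_000;

    // Returns the keep-alive duration in milliseconds for a header value such as
    // "timeout=5, max=100"; falls back to FALLBACK_MS when absent or unparseable.
    static long resolve(String keepAliveHeader) {
        if (keepAliveHeader != null) {
            for (String element : keepAliveHeader.split(",")) {
                String[] pair = element.trim().split("=", 2);
                if (pair.length == 2 && pair[0].trim().equalsIgnoreCase("timeout")) {
                    try {
                        // The header carries seconds; the pool works in milliseconds.
                        return Long.parseLong(pair[1].trim()) * 1000;
                    } catch (NumberFormatException ignore) {
                        // Not a number: keep scanning, then fall back.
                    }
                }
            }
        }
        return FALLBACK_MS;
    }
}
```

With this sketch, a header of "timeout=5, max=100" resolves to 5000 ms, while a missing header resolves to the 3000 ms fallback. The point of the fallback is that a connection never stays reusable indefinitely just because the server did not advertise a timeout.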

Main configuration parameters:

org.apache.http.impl.conn.PoolingHttpClientConnectionManager#setValidateAfterInactivity
org.apache.http.impl.client.HttpClientBuilder#setConnectionManagerShared
org.apache.http.impl.client.HttpClientBuilder#evictIdleConnections(long, java.util.concurrent.TimeUnit)
org.apache.http.impl.client.HttpClientBuilder#setKeepAliveStrategy
org.apache.http.impl.client.HttpClientBuilder#disableAutomaticRetries

org.apache.http.impl.conn.PoolingHttpClientConnectionManager#setValidateAfterInactivity

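After an idle connection is leased from the pool, and before it is used, the pool checks whether it has been idle longer than the given time (in milliseconds) and validates it if so. Note that if, like me, you create the manager with

PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager(socketFactoryRegistry);

then this value is set to 2000 ms by default. (P.S. This default is the reason I could never reproduce NoHttpResponseException in my local environment.)

Method path:

org.apache.http.impl.conn.PoolingHttpClientConnectionManager#PoolingHttpClientConnectionManager(org.apache.http.conn.HttpClientConnectionOperator, org.apache.http.conn.HttpConnectionFactory<org.apache.http.conn.routing.HttpRoute,org.apache.http.conn.ManagedHttpClientConnection>, long, java.util.concurrent.TimeUnit)

In principle, setting this value alone is enough: a connection that has been idle past the threshold is validated, and if validation fails it is closed and another one is taken from the pool (and checked in the same way). Whichever combination of settings you choose, set this one explicitly; otherwise connections are checked against the 2-second default. For those interested, here is the source: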

private E getPoolEntryBlocking(
final T route, final Object state,
final long timeout, final TimeUnit tunit,
final PoolEntryFuture<E> future)
throws IOException, InterruptedException, TimeoutException {
Date deadline = null;
if (timeout > 0) {
deadline = new Date
(System.currentTimeMillis() + tunit.toMillis(timeout));
}
this.lock.lock();
try {
final RouteSpecificPool<T, C, E> pool = getPool(route);
E entry = null;
while (entry == null) {
Asserts.check(!this.isShutDown, "Connection pool shut down");
for (;;) {
entry = pool.getFree(state);
if (entry == null) {
break;
}
if (entry.isExpired(System.currentTimeMillis())) {
entry.close();
} else if (this.validateAfterInactivity > 0) {
// Key check: is the last-updated time + validateAfterInactivity already <= now?
if (entry.getUpdated() + this.validateAfterInactivity <= System.currentTimeMillis()) {
if (!validate(entry)) {
entry.close();
}
}
}
if (entry.isClosed()) {
this.available.remove(entry);
pool.free(entry, false);
} else {
break;
}
}
if (entry != null) {
this.available.remove(entry);
this.leased.add(entry);
onReuse(entry);
return entry;
}
// New connection is needed
final int maxPerRoute = getMax(route);
// Shrink the pool prior to allocating a new connection
final int excess = Math.max(0, pool.getAllocatedCount() + 1 - maxPerRoute);
if (excess > 0) {
for (int i = 0; i < excess; i++) {
final E lastUsed = pool.getLastUsed();
if (lastUsed == null) {
break;
}
lastUsed.close();
this.available.remove(lastUsed);
pool.remove(lastUsed);
}
}
if (pool.getAllocatedCount() < maxPerRoute) {
final int totalUsed = this.leased.size();
final int freeCapacity = Math.max(this.maxTotal - totalUsed, 0);
if (freeCapacity > 0) {
final int totalAvailable = this.available.size();
if (totalAvailable > freeCapacity - 1) {
if (!this.available.isEmpty()) {
final E lastUsed = this.available.removeLast();
lastUsed.close();
final RouteSpecificPool<T, C, E> otherpool = getPool(lastUsed.getRoute());
otherpool.remove(lastUsed);
}
}
final C conn = this.connFactory.create(route);
entry = pool.add(conn);
this.leased.add(entry);
return entry;
}
}
boolean success = false;
try {
pool.queue(future);
this.pending.add(future);
success = future.await(deadline);
} finally {
// In case of 'success', we were woken up by the
// connection pool and should now have a connection
// waiting for us, or else we're shutting down.
// Just continue in the loop, both cases are checked.
pool.unqueue(future);
this.pending.remove(future);
}
// check for spurious wakeup vs. timeout
if (!success && (deadline != null) &&
(deadline.getTime() <= System.currentTimeMillis())) {
break;
}
}
throw new TimeoutException("Timeout waiting for connection");
} finally {
this.lock.unlock();
}
}

Method path:

org.apache.http.pool.AbstractConnPool#getPoolEntryBlocking
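
The part of getPoolEntryBlocking that matters here is the three-way decision made on each free entry. The following is a simplified model of that decision only, with timestamps passed in explicitly; the class LeaseCheck and its Action enum are hypothetical names for illustration and do not exist in HttpClient.

```java
// Simplified model of the lease-time checks; not the library class.
final class LeaseCheck {
    enum Action { CLOSE_EXPIRED, VALIDATE, REUSE }

    // now:     current time in ms
    // expiry:  absolute deadline derived from keep-alive / time-to-live
    // updated: time the connection was last released to the pool
    // validateAfterInactivity: idle threshold in ms (<= 0 disables validation)
    static Action onLease(long now, long expiry, long updated, long validateAfterInactivity) {
        if (now >= expiry) {
            // Same comparison as entry.isExpired(now): the entry is closed and skipped.
            return Action.CLOSE_EXPIRED;
        }
        if (validateAfterInactivity > 0 && updated + validateAfterInactivity <= now) {
            // Idle for at least the threshold: validate before reuse.
            return Action.VALIDATE;
        }
        // Recently used: handed out without any check.
        return Action.REUSE;
    }
}
```

In this model, a connection released 1 second ago with a 2000 ms threshold is reused as-is, while one released 3 seconds ago is validated first. That matches the observation in the post that the 2-second default hid the problem in a local environment.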

org.apache.http.impl.client.HttpClientBuilder#setConnectionManagerShared and org.apache.http.impl.client.HttpClientBuilder#evictIdleConnections(long, java.util.concurrent.TimeUnit)

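These settings start a background thread that periodically closes connections that have been idle longer than the configured time. If a stale connection is evicted before anyone leases it, all is well, but in edge cases NoHttpResponseException can still occur. For example, suppose the server's keep-alive timeout is 20 seconds and the evictor is configured with 15 seconds, so it wakes every 15 seconds and closes connections that have been idle for at least 15 seconds. Let x be the time a connection was last used and released, and y the time the evictor last ran, with x shortly after y. At the next run, y+15, the connection has been idle for less than 15 seconds, so it is not evicted, and the run after that is not until y+30. The server, however, closes the connection at x+20. Any request that leases this connection between x+20 and y+30 fails with NoHttpResponseException.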
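
The arithmetic behind this edge case can be sketched as follows. This is a simplified model, not library code: it assumes the evictor runs exactly at multiples of its interval (time 0 being one such run), that a run closes connections idle for at least one interval, and that timer drift is negligible. The class EvictionWindow and its methods are hypothetical names.

```java
// Simplified timing model of the idle-connection evictor; all times in milliseconds.
final class EvictionWindow {
    // Time of the first evictor run that closes a connection released at releasedAt.
    // Runs happen at interval, 2 * interval, ...; a run closes connections idle >= interval.
    static long evictedAt(long releasedAt, long interval) {
        // Smallest k with k * interval >= releasedAt + interval (integer ceiling).
        long runsNeeded = (releasedAt + interval + interval - 1) / interval;
        return runsNeeded * interval;
    }

    // Length of the window during which the server has already closed the connection
    // but the client pool still holds it; 0 means the evictor always wins.
    static long staleWindow(long releasedAt, long interval, long serverTimeout) {
        long serverClosesAt = releasedAt + serverTimeout;
        return Math.max(0, evictedAt(releasedAt, interval) - serverClosesAt);
    }
}
```

For a connection released 1 second after an evictor run, EvictionWindow.staleWindow(1_000, 15_000, 20_000) gives 9000: the pool can hand out a dead connection for up to 9 seconds. With the 3-second interval used in the final code, staleWindow(1_000, 3_000, 20_000) is 0. In this model a connection can stay pooled for just under two intervals, so the window only disappears when twice the interval is no larger than the server timeout.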

evictIdleConnections only takes effect when setConnectionManagerShared is false, which is the default. The key code is:

if (!this.connManagerShared) {
if (closeablesCopy == null) {
closeablesCopy = new ArrayList<Closeable>(1);
}
final HttpClientConnectionManager cm = connManagerCopy;
if (evictExpiredConnections || evictIdleConnections) {
final IdleConnectionEvictor connectionEvictor = new IdleConnectionEvictor(cm,
maxIdleTime > 0 ? maxIdleTime : 10, maxIdleTimeUnit != null ? maxIdleTimeUnit : TimeUnit.SECONDS);
closeablesCopy.add(new Closeable() {
@Override
public void close() throws IOException {
connectionEvictor.shutdown();
}
});
connectionEvictor.start();
}
closeablesCopy.add(new Closeable() {
@Override
public void close() throws IOException {
cm.shutdown();
}
});
}

Note: once started, the evictIdleConnections thread first sleeps for one maxIdleTime period. The source is:

public IdleConnectionEvictor(
final HttpClientConnectionManager connectionManager,
final ThreadFactory threadFactory,
final long sleepTime, final TimeUnit sleepTimeUnit,
final long maxIdleTime, final TimeUnit maxIdleTimeUnit) {
this.connectionManager = Args.notNull(connectionManager, "Connection manager");
this.threadFactory = threadFactory != null ? threadFactory : new DefaultThreadFactory();
this.sleepTimeMs = sleepTimeUnit != null ? sleepTimeUnit.toMillis(sleepTime) : sleepTime;
this.maxIdleTimeMs = maxIdleTimeUnit != null ? maxIdleTimeUnit.toMillis(maxIdleTime) : maxIdleTime;
this.thread = this.threadFactory.newThread(new Runnable() {
@Override
public void run() {
try {
while (!Thread.currentThread().isInterrupted()) {
// Sleeps for sleepTimeMs here; tracing the callers shows that sleepTimeMs comes from maxIdleTime
Thread.sleep(sleepTimeMs);
connectionManager.closeExpiredConnections();
if (maxIdleTimeMs > 0) {
connectionManager.closeIdleConnections(maxIdleTimeMs, TimeUnit.MILLISECONDS);
}
}
} catch (final Exception ex) {
exception = ex;
}
}
});
}

Method path:

org.apache.http.impl.client.IdleConnectionEvictor#IdleConnectionEvictor(org.apache.http.conn.HttpClientConnectionManager, java.util.concurrent.ThreadFactory, long, java.util.concurrent.TimeUnit, long, java.util.concurrent.TimeUnit)

org.apache.http.impl.client.HttpClientBuilder#setKeepAliveStrategy

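This method sets the keep-alive time for connections maintained by the client's connection pool. If a connection stays idle longer than that time, it is closed and a new one is obtained. The relevant source is below: when a pool entry is created, its expiry is set to the creation time plus timeToLive, and updateExpiry (called when the connection is released back to the pool) then moves the expiry to the release time plus the keep-alive duration, capped by the original deadline.

// Method path: org.apache.http.pool.PoolEntry#PoolEntry(java.lang.String, T, C, long, java.util.concurrent.TimeUnit)
// timeToLive here is the connection's maximum lifetime configured on the connection manager;
// the keep-alive time returned by the strategy is applied later through updateExpiry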
public PoolEntry(final String id, final T route, final C conn,
final long timeToLive, final TimeUnit tunit) {
super();
Args.notNull(route, "Route");
Args.notNull(conn, "Connection");
Args.notNull(tunit, "Time unit");
this.id = id;
this.route = route;
this.conn = conn;
this.created = System.currentTimeMillis();
if (timeToLive > 0) {
this.validityDeadline = this.created + tunit.toMillis(timeToLive);
} else {
this.validityDeadline = Long.MAX_VALUE;
}
this.expiry = this.validityDeadline;
}
// Method path: org.apache.http.pool.PoolEntry#updateExpiry
public synchronized void updateExpiry(final long time, final TimeUnit tunit) {
Args.notNull(tunit, "Time unit");
this.updated = System.currentTimeMillis();
final long newExpiry;
if (time > 0) {
newExpiry = this.updated + tunit.toMillis(time);
} else {
newExpiry = Long.MAX_VALUE;
}
this.expiry = Math.min(newExpiry, this.validityDeadline);
}
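
As a worked example of the expiry arithmetic in updateExpiry, the sketch below reproduces the calculation with explicit timestamps instead of System.currentTimeMillis(). The class ExpiryModel and its method names are hypothetical and exist only for illustration.

```java
// Illustrative model of PoolEntry's expiry arithmetic; all times in milliseconds.
final class ExpiryModel {
    // Expiry after a connection is released at releasedAt with the given keep-alive.
    // A non-positive keep-alive means "no keep-alive limit"; validityDeadline caps the result.
    static long expiryAfterRelease(long releasedAt, long keepAliveMs, long validityDeadline) {
        long newExpiry = keepAliveMs > 0 ? releasedAt + keepAliveMs : Long.MAX_VALUE;
        return Math.min(newExpiry, validityDeadline);
    }

    // Same comparison the pool uses when it leases an entry.
    static boolean isExpired(long now, long expiry) {
        return now >= expiry;
    }
}
```

A connection released at t = 10000 with the 3-second fallback keep-alive gets an expiry of 13000, so a lease attempt at 13000 or later closes it instead of reusing it. Keeping that value below the server's keep-alive timeout is what makes Option 2 work.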

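The settings above are not mutually exclusive, and they do not all have to be configured. I verified that configuring only setValidateAfterInactivity, or only setKeepAliveStrategy, is enough. evictIdleConnections on its own can still fail in the edge case described earlier.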

org.apache.http.impl.client.HttpClientBuilder#disableAutomaticRetries

As an aside, HttpClient retries up to 3 times by default. If your API is not idempotent, take care not to use retries.
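
One way to picture the idempotency rule is the guard below. It is a hypothetical policy written for illustration, not HttpClient's own retry handler: the class RetryGuard and the method mayRetry are invented names, and it judges idempotency by HTTP method only, whereas what really matters is whether the endpoint itself is safe to call twice.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical retry policy: retry only idempotent HTTP methods, a bounded number of times.
final class RetryGuard {
    // Methods defined as idempotent by the HTTP specification.
    private static final Set<String> IDEMPOTENT = new HashSet<>(
            Arrays.asList("GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"));

    // executionCount: how many attempts have already failed (1 after the first failure).
    static boolean mayRetry(String method, int executionCount, int maxRetries) {
        if (executionCount > maxRetries) {
            return false; // retry budget exhausted
        }
        return IDEMPOTENT.contains(method.toUpperCase(Locale.ROOT));
    }
}
```

Under this policy a failed GET is retried up to three times, while a failed POST is never resent, which avoids duplicate writes when the request may already have reached the server.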

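OK. To solve this one problem I ended up reading through the source code, so I am writing it down in this post for future reference.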