ThreadPoolExecutor使用错误导致死锁
背景
- 10月2号凌晨12:08收到报警,所有请求失败,处于完全不可用状态
- 应用服务器共四台resin,resin之前由四台nginx做负载均衡
服务器现象及故障恢复步骤
- 登入服务器,观察resin进程,初看无任何异常,且占用资源正常,有非业务逻辑相关(一些schedule task)的日志输出,但无业务逻辑相关的日志。
- 表明resin服务器没有在处理(新的)用户的请求
- 重启resin,并观察日志,发现resin开始处理业务,基本恢复
- 表明重启可以解决问题
- 继续依次重启剩余的三台resin,并在重启最后一台resin之前取jstack以供分析故障原因
- jstack可以较好的反映进程正在进行的所有逻辑,可以有效的帮助定位问题,并且耗时较少通常5s内便可完成
- 更全面的获取进程信息的方法是取heap dump, 但耗时相对长一些,且分析时没有jstack直观(当然,由dump也可以得到jstack)
- 若在进程有问题时直接重启而不取jstack很可能会丧失定位问题的唯一时机 —— 此时已经基本可认定原因在resin进程内部,因此更有必要取jstack
- 在重启第四台resin时才取jstack的原因是应该尽快恢复线上服务可用,因此重启前三台resin之前不应该做浪费时间的工作。但它们正常工作以后,线上的请求压力可以先由它们来负荷,因此可以对第四台进行一些快速并重要的操作后再重启
- 观察,服务基本恢复
- 看jstack文件,试图找到线索
- 突然又收到报警,发现四台resin服务又恢复原来的故障状态,所有请求均失败。于是再次重启前两台resin
- 同时到nginx处观察access log,发现所有请求均为502 bad gateway或timeout
- 表明是nginx与resin之间的请求转发有些问题,猜测像是请求没发送给resin或是resin直接拒绝执行
- 未确定具体原因,于是尝试直接依次重启四台nginx, 并再重启剩余的resin
- 服务再次恢复,观察一小时多后也不再出现问题
- 至此,认为故障已经恢复
疑问
- resin进程恢复后为何又会再次故障?
- 最后一次上线是9月29号下午,如果是业务实现的问题,为何这之间一直没出问题?
- 重启nginx究竟影响了什么?难道nginx与resin内部会存在什么神奇的逻辑关联?
定位分析
对服务异常时取的jstack进行分析(在机群上用我之前写的一个stack分析脚本stackAnalysis,在命令行直接执行stackAnalysis即可), 发现resin的业务线程部分的summary如下:
name=resin-port-*****-*****
threadsNum=257
hasUserThread=true
blockNum=0
execNum=0
-------------------------------------------- threads groups with different state, size=4, start --------------------------------------------
219 threads at (state = WAITING,
locks_waiting = [0x00000006ed230e60, 0x00000006f0b19db0, ..., 0x00000006ec643d40, 0x00000006ed09ca38, 0x00000006eb0e8600, 0x00000006f1cf2118]) :
"resin-port-*-*" daemon prio=* tid=******** nid=******** waiting on condition [********]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <********> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.xxxxxx.productfront.service.HomeService.getFilteredUserId(HomeService.java:933)
at com.xxxxxx.productfront.service.HomeService.requestLatestTrends(HomeService.java:513)
at com.xxxxxx.productfront.service.HomeService.getNewHomeDataFlow(HomeService.java:140)
at com.xxxxxx.productfront.web.restful.HomeApiController.newInfo(HomeApiController.java:204)
at com.xxxxxx.productfront.web.restful.HomeApiController$$FastClassByCGLIB$$5502b6f1.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:93)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.adapter.MethodBeforeAdviceInterceptor.invoke(MethodBeforeAdviceInterceptor.java:50)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622)
at com.xxxxxx.productfront.web.restful.HomeApiController$$EnhancerByCGLIB$$80fdc5c.newInfo(<generated>)
at sun.reflect.GeneratedMethodAccessor1384.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:159)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:97)
at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:109)
at com.xxxxxx.productfront.web.filter.FlagFilter.doFilter(FlagFilter.java:56)
at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:89)
at com.xxxxxx.productfront.web.filter.UserInfoFilter.doFilter(UserInfoFilter.java:37)
at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:89)
at com.caucho.server.webapp.WebAppListenerFilterChain.doFilter(WebAppListenerFilterChain.java:114)
at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:156)
at com.caucho.server.webapp.AccessLogFilterChain.doFilter(AccessLogFilterChain.java:95)
at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:289)
at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:838)
at com.caucho.network.listen.TcpSocketLink.dispatchRequest(TcpSocketLink.java:1341)
at com.caucho.network.listen.TcpSocketLink.handleRequest(TcpSocketLink.java:1297)
at com.caucho.network.listen.TcpSocketLink.handleRequestsImpl(TcpSocketLink.java:1281)
at com.caucho.network.listen.TcpSocketLink.handleRequests(TcpSocketLink.java:1189)
at com.caucho.network.listen.TcpSocketLink.handleAcceptTaskImpl(TcpSocketLink.java:985)
at com.caucho.network.listen.ConnectionTask.runThread(ConnectionTask.java:117)
at com.caucho.network.listen.ConnectionTask.run(ConnectionTask.java:93)
at com.caucho.network.listen.SocketLinkThreadLauncher.handleTasks(SocketLinkThreadLauncher.java:169)
at com.caucho.network.listen.TcpSocketAcceptThread.run(TcpSocketAcceptThread.java:61)
at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)
at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118) 24 threads at (state = WAITING,
locks_waiting = [0x00000006e56c9970, 0x00000006e56c6fd8, 0x00000006e28b14a8, 0x00000006e61dc9a0, 0x00000006e620e9c0, 0x00000006e3bf0648, 0x00000006e56d39e0, 0x00000006e6a051a0, 0x00000006e29fd0f0, 0x00000006e3b03728, 0x00000006e613b090, 0x00000006e5a67fc8, 0x00000006e7de7d90, 0x00000006e693a338, 0x00000006e46fc7c0, 0x00000006e28f41a0, 0x00000006e614a990, 0x00000006e61203d0, 0x00000006e45bd5f8, 0x00000006e5a65e90, 0x00000006e611feb0, 0x00000006e51d3b80, 0x00000006e4f23348, 0x00000006e692f150]) :
"resin-port--" daemon prio=* tid=******** nid=******** waiting on condition []
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.xxxxxx.productfront.service.HomeService.getNewHomeDataFlow(HomeService.java:268)
at com.xxxxxx.productfront.web.restful.HomeApiController.newInfo(HomeApiController.java:204)
at com.xxxxxx.productfront.web.restful.HomeApiController$$FastClassByCGLIB$$5502b6f1.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:93)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.adapter.MethodBeforeAdviceInterceptor.invoke(MethodBeforeAdviceInterceptor.java:50)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622)
at com.xxxxxx.productfront.web.restful.HomeApiController$$EnhancerByCGLIB$$80fdc5c.newInfo(<generated>)
at sun.reflect.GeneratedMethodAccessor1384.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:159)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:97)
at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:109)
at com.xxxxxx.productfront.web.filter.FlagFilter.doFilter(FlagFilter.java:56)
at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:89)
at com.xxxxxx.productfront.web.filter.UserInfoFilter.doFilter(UserInfoFilter.java:37)
at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:89)
at com.caucho.server.webapp.WebAppListenerFilterChain.doFilter(WebAppListenerFilterChain.java:114)
at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:156)
at com.caucho.server.webapp.AccessLogFilterChain.doFilter(AccessLogFilterChain.java:95)
at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:289)
at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:838)
at com.caucho.network.listen.TcpSocketLink.dispatchRequest(TcpSocketLink.java:1341)
at com.caucho.network.listen.TcpSocketLink.handleRequest(TcpSocketLink.java:1297)
at com.caucho.network.listen.TcpSocketLink.handleRequestsImpl(TcpSocketLink.java:1281)
at com.caucho.network.listen.TcpSocketLink.handleRequests(TcpSocketLink.java:1189)
at com.caucho.network.listen.TcpSocketLink.handleAcceptTaskImpl(TcpSocketLink.java:985)
at com.caucho.network.listen.ConnectionTask.runThread(ConnectionTask.java:117)
at com.caucho.network.listen.ConnectionTask.run(ConnectionTask.java:93)
at com.caucho.network.listen.SocketLinkThreadLauncher.handleTasks(SocketLinkThreadLauncher.java:169)
at com.caucho.network.listen.TcpSocketAcceptThread.run(TcpSocketAcceptThread.java:61)
at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)
at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118) 13 threads at (state = WAITING,
locks_waiting = [0x00000006eba80d68, 0x00000006eba87ff0, 0x00000006eb013438, 0x00000006eba88088, 0x00000006eb9ea590, 0x00000006eb851ea8, 0x00000006eb851f40, 0x00000006eaecb790, 0x00000006eb851d20, 0x00000006eaecb190, 0x00000006eba8ea88, 0x00000006eb8b3b58, 0x00000006eb8b3a18]) :
"resin-port--" daemon prio=* tid=******** nid=******** waiting on condition []
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.xxxxxx.productfront.service.HomeService.getFilteredUserId(HomeService.java:933)
at com.xxxxxx.productfront.service.HomeService.requestLatestTrends(HomeService.java:513)
at com.xxxxxx.productfront.service.HomeService.getNewHomeDataFlow(HomeService.java:140)
at com.xxxxxx.productfront.web.restful.HomeApiController.newInfo(HomeApiController.java:204)
at com.xxxxxx.productfront.web.restful.HomeApiController$$FastClassByCGLIB$$5502b6f1.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:93)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.adapter.MethodBeforeAdviceInterceptor.invoke(MethodBeforeAdviceInterceptor.java:50)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622)
at com.xxxxxx.productfront.web.restful.HomeApiController$$EnhancerByCGLIB$$80fdc5c.newInfo(<generated>)
at sun.reflect.GeneratedMethodAccessor1384.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:159)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:97)
at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:109)
at com.xxxxxx.productfront.web.filter.FlagFilter.doFilter(FlagFilter.java:56)
at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:89)
at com.xxxxxx.productfront.web.filter.UserInfoFilter.doFilter(UserInfoFilter.java:37)
at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:89)
at com.caucho.server.webapp.WebAppListenerFilterChain.doFilter(WebAppListenerFilterChain.java:114)
at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:156)
at com.caucho.server.webapp.AccessLogFilterChain.doFilter(AccessLogFilterChain.java:95)
at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:289)
at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:838)
at com.caucho.network.listen.TcpSocketLink.dispatchRequest(TcpSocketLink.java:1341)
at com.caucho.network.listen.TcpSocketLink.handleRequest(TcpSocketLink.java:1297)
at com.caucho.network.listen.TcpSocketLink.handleRequestsImpl(TcpSocketLink.java:1281)
at com.caucho.network.listen.TcpSocketLink.handleRequests(TcpSocketLink.java:1189)
at com.caucho.network.listen.TcpSocketLink.handleAcceptTaskImpl(TcpSocketLink.java:988)
at com.caucho.network.listen.ConnectionTask.runThread(ConnectionTask.java:117)
at com.caucho.network.listen.ConnectionTask.run(ConnectionTask.java:93)
at com.caucho.network.listen.SocketLinkThreadLauncher.handleTasks(SocketLinkThreadLauncher.java:169)
at com.caucho.network.listen.TcpSocketAcceptThread.run(TcpSocketAcceptThread.java:61)
at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)
at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118) 1 threads at (state = WAITING,
locks_waiting = [0x00000006e2b747e0]) :
"resin-port--" daemon prio=* tid=******** nid=******** waiting on condition []
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.xxxxxx.productfront.service.HomeService.getNewHomeDataFlow(HomeService.java:270)
at com.xxxxxx.productfront.web.restful.HomeApiController.newInfo(HomeApiController.java:204)
at com.xxxxxx.productfront.web.restful.HomeApiController$$FastClassByCGLIB$$5502b6f1.invoke(<generated>)
at net.sf.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint(Cglib2AopProxy.java:689)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
at org.springframework.validation.beanvalidation.MethodValidationInterceptor.invoke(MethodValidationInterceptor.java:93)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.adapter.MethodBeforeAdviceInterceptor.invoke(MethodBeforeAdviceInterceptor.java:50)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
at org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept(Cglib2AopProxy.java:622)
at com.xxxxxx.productfront.web.restful.HomeApiController$$EnhancerByCGLIB$$80fdc5c.newInfo(<generated>)
at sun.reflect.GeneratedMethodAccessor1384.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:159)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:97)
at com.caucho.server.dispatch.ServletFilterChain.doFilter(ServletFilterChain.java:109)
at com.xxxxxx.productfront.web.filter.FlagFilter.doFilter(FlagFilter.java:56)
at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:89)
at com.xxxxxx.productfront.web.filter.UserInfoFilter.doFilter(UserInfoFilter.java:37)
at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:89)
at com.caucho.server.webapp.WebAppListenerFilterChain.doFilter(WebAppListenerFilterChain.java:114)
at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:156)
at com.caucho.server.webapp.AccessLogFilterChain.doFilter(AccessLogFilterChain.java:95)
at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:289)
at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:838)
at com.caucho.network.listen.TcpSocketLink.dispatchRequest(TcpSocketLink.java:1341)
at com.caucho.network.listen.TcpSocketLink.handleRequest(TcpSocketLink.java:1297)
at com.caucho.network.listen.TcpSocketLink.handleRequestsImpl(TcpSocketLink.java:1281)
at com.caucho.network.listen.TcpSocketLink.handleRequests(TcpSocketLink.java:1189)
at com.caucho.network.listen.TcpSocketLink.handleAcceptTaskImpl(TcpSocketLink.java:985)
at com.caucho.network.listen.ConnectionTask.runThread(ConnectionTask.java:117)
at com.caucho.network.listen.ConnectionTask.run(ConnectionTask.java:93)
at com.caucho.network.listen.SocketLinkThreadLauncher.handleTasks(SocketLinkThreadLauncher.java:169)
at com.caucho.network.listen.TcpSocketAcceptThread.run(TcpSocketAcceptThread.java:61)
at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173)
at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118) ----------------------------------------------- threads groups with different state, size=4, end -------------------------------------------
查看resin的配置,可以知道我们配置的最大业务线程数为256:
\#Throttle the number of active threads for a port
port_thread_max : 256
accept_thread_max : 32
accept_thread_min : 4
同时从上面的stack summary结果中可以看到有257个业务逻辑的线程(以resin-port-开头的线程组)处于waiting状态,说明此时'''所有业务线程均在waiting状态,从而新的业务请求自然会被resin拒绝,造成服务不可用'''
从stack上可以看到所有waiting状态的业务线程都是客户端的首页接口【 at com.xxxxxx.productfront.web.restful.HomeApiController.newInfo(HomeApiController.java:204)
】,并且都是在对一个Future任务进行FutureTask.get()操作,同时由栈顶可以看出改这些线程都是在对一个队列进行操作时无法继续而使得线程waiting的:- parking to wait for <********> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
新的疑问产生了:这257个线程究竟在waiting什么?怎么可以这样占着xx不xx!!!
查看业务代码HomeApiController.java相关逻辑发现该业务逻辑(获取客户端首页数据,包括附近的人,推荐的人,新的动态等)为了提升性能,将没有耦合的逻辑用future模式来处理以实现并行执行。同时为了节省线程创建销毁的开销,类中new了一个static的ThreadPoolExecutor用于接受各类futureTask的执行:
protected static final ThreadPoolExecutor executor = new ThreadPoolExecutor(20, 100, 2, TimeUnit.MINUTES, new LinkedBlockingQueue<Runnable>(),
new ThreadFactory() {
private AtomicInteger id = new AtomicInteger(0);@Override
public Thread newThread(Runnable r) {
Thread thread = new Thread(r);
thread.setName("home-service-" + id.addAndGet(1));
return thread;
}
}, new ThreadPoolExecutor.CallerRunsPolicy());
且其构造函数相关含义为:
ThreadPoolExecutor
public ThreadPoolExecutor(int corePoolSize,
int maximumPoolSize,
long keepAliveTime,
TimeUnit unit,
BlockingQueue<Runnable> workQueue,
ThreadFactory threadFactory,
RejectedExecutionHandler handler)
Creates a new ThreadPoolExecutor with the given initial parameters.
Parameters:
corePoolSize - the number of threads to keep in the pool, even if they are idle.
maximumPoolSize - the maximum number of threads to allow in the pool.
keepAliveTime - when the number of threads is greater than the core, this is the maximum time that excess idle threads will wait for new tasks before terminating.
unit - the time unit for the keepAliveTime argument.
workQueue - the queue to use for holding tasks before they are executed. This queue will hold only the Runnable tasks submitted by the execute method.
threadFactory - the factory to use when the executor creates a new thread.
handler - the handler to use when execution is blocked because the thread bounds and queue capacities are reached.
Throws:
IllegalArgumentException - if corePoolSize or keepAliveTime less than zero, or if maximumPoolSize less than or equal to zero, or if corePoolSize greater than maximumPoolSize.
NullPointerException - if workQueue or threadFactory or handler are null.
因为jstack有不少线程与队列有关,为明确其构造函数中workQueue的使用,查看ThreadPoolExecutor的java doc:
Queuing
Any BlockingQueue may be used to transfer and hold submitted tasks. The use of this queue interacts with pool sizing:
* If fewer than corePoolSize threads are running, the Executor always prefers adding a new thread rather than queuing.
* If corePoolSize or more threads are running, the Executor always prefers queuing a request rather than adding a new thread.
* If a request cannot be queued, a new thread is created unless this would exceed maximumPoolSize, in which case, the task will be rejected.
There are three general strategies for queuing:
Direct handoffs.
A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them.
Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed.
This policy avoids lockups when handling sets of requests that might have internal dependencies.
Direct handoffs generally require unbounded maximumPoolSizes to avoid rejection of new submitted tasks.
This in turn admits the possibility of unbounded thread growth when commands continue to arrive on average faster than they can be processed.
Unbounded queues.
Using an unbounded queue (for example a LinkedBlockingQueue without a predefined capacity) will cause new tasks to wait in the queue when all corePoolSize threads are busy.
Thus, no more than corePoolSize threads will ever be created. (And the value of the maximumPoolSize therefore doesn't have any effect.)
This may be appropriate when each task is completely independent of others, so tasks cannot affect each others execution; for example, in a web page server.
While this style of queuing can be useful in smoothing out transient bursts of requests, it admits the possibility of unbounded work queue growth when commands continue to arrive on average faster than they can be processed.
Bounded queues.
A bounded queue (for example, an ArrayBlockingQueue) helps prevent resource exhaustion when used with finite maximumPoolSizes, but can be more difficult to tune and control.
Queue sizes and maximum pool sizes may be traded off for each other: Using large queues and small pools minimizes CPU usage, OS resources, and context-switching overhead, but can lead to artificially low throughput.
If tasks frequently block (for example if they are I/O bound), a system may be able to schedule time for more threads than you otherwise allow.
Use of small queues generally requires larger pool sizes, which keeps CPUs busier but may encounter unacceptable scheduling overhead, which also decreases throughput.
LinkedBlockingQueue是没有上限的队列,由undounded queues的说明可知因为它的使用会导致maximumPoolSize的设置失效,即'''构造函数中传入的100是没有意义的''',只会有corePoolSize大小的线程数,也就是我们设置的20个线程存活在pool中。很明显,这样的结果与我们如此使用的预期是不相符的。
发现了一些问题,上面的疑问也清晰了一些:很可能那257个线程是在等待ThreadPoolExecutor去完成任务,但由于只有20个线程在干活,因此这257个线程的futureTask被放到了LinkedBlockingQueue中,只有那20个线程做完手中的事之后这257个线程对应的futureTask才有机会被执行。
于是,现在的疑问变成了:这20个线程究竟在做什么?虽然只有20个,但服务正常的时候它们应该非常快才对(不然问题早就出现了),但现在为什么那么慢以至于block住了所有的业务线程,'''它们究竟在做做什么?'''
再回头看stack summary的结果,发现确实有一个名为home-service-*的线程组,并且一共20个线程,这与上面100的maximumPoolSize的设置是无效的分析一致。看一下这个线程组各线程的具体状态:
name=home-service-**
threadsNum=20
hasUserThread=true
blockNum=0
execNum=0
-------------------------------------------- threads groups with different state, size=3, start --------------------------------------------
10 threads at (state = WAITING,
locks_waiting = [0x00000006e59d3fe8, 0x00000006e4e848d0, 0x00000006e598b410, 0x00000006e4f90898, 0x00000006e674a510, 0x00000006e4e841b8, 0x00000006e4e845a0, 0x00000006e674a1e0, 0x00000006e4f90fc0, 0x00000006e4f90c90]) :
"home-service-*" daemon prio=* tid=******** nid=******** waiting on condition [********]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <********> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.xxxxxx.productfront.service.HomeService.getExtendedTrend(HomeService.java:625)
at com.xxxxxx.productfront.service.HomeService.buildTrend(HomeService.java:561)
at com.xxxxxx.productfront.service.HomeService.access$000(HomeService.java:69)
at com.xxxxxx.productfront.service.HomeService$2.call(HomeService.java:148)
at com.xxxxxx.productfront.service.HomeService$2.call(HomeService.java:143)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662) 8 threads at (state = WAITING,
locks_waiting = [0x00000006e692a260, 0x00000006e5027360, 0x00000006e5027428, 0x00000006e4f912f0, 0x00000006e4f913d8, 0x00000006e598b740, 0x00000006e598b808, 0x00000006e4f90bc8]) :
"home-service-" daemon prio= tid=******** nid=******** waiting on condition []
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.xxxxxx.productfront.service.HomeService.getExtendedRecommendInfo(HomeService.java:740)
at com.xxxxxx.productfront.service.HomeService.buildRecommend(HomeService.java:453)
at com.xxxxxx.productfront.service.HomeService$3.call(HomeService.java:242)
at com.xxxxxx.productfront.service.HomeService$3.call(HomeService.java:237)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662) 2 threads at (state = WAITING,
locks_waiting = [0x00000006e4e844e8, 0x00000006e4e84100]) :
"home-service-" daemon prio= tid=******** nid=******** waiting on condition []
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.xxxxxx.productfront.service.HomeService.getExtendedNearByInfo(HomeService.java:876)
at com.xxxxxx.productfront.service.HomeService.buildNearBy(HomeService.java:476)
at com.xxxxxx.productfront.service.HomeService$4.call(HomeService.java:258)
at com.xxxxxx.productfront.service.HomeService$4.call(HomeService.java:253)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662) ----------------------------------------------- threads groups with different state, size=3, end -------------------------------------------
仔细观察stacktrace, 可以发现有个很奇怪的地方:这些线程都是从一个FutureTask.run开始,然后又调用了另一个FutureTask,并waiting在FutureTask.get()方法。
也就是说,我们的代码逻辑中,在FutureTask中又用了FutureTask, 这20个线程是在等待它下一层的FutureTask执行完成。那么,为什么它们的下一层没有执行完成呢?
通过HomeApiControoler.java可以看到,代码中确实存在FutureTask内调FutureTask的情况,并且发现该类中所有的FutureTask(一共19种FutureTask)都是共用上面定义的threadPoolExecutor来执行的。
此时一种担忧油然而生:FutureTask内调FutureTask,并且都由同一个threadPoolExecutor来完成,靠谱吗?会不会存在类似死锁的问题?
在做实验之前答案并不确定,要看ThreadPoolExecutor是否能实现的这么好了。所以先看了下ThreadPoolExecutor的java doc,并注意其中关于类似死锁相关逻辑的说明,果然,可以看到关于direct handoff的说明:
There are three general strategies for queuing:
Direct handoffs. A good default choice for a work queue is a SynchronousQueue that hands off tasks to threads without otherwise holding them. Here, an attempt to queue a task will fail if no threads are immediately available to run it, so a new thread will be constructed. This policy avoids lockups(死锁) when handling sets of requests that might have internal dependencies. Direct handoffs generally require unbounded maximumPoolSizes to avoid rejection of new submitted tasks. This in turn admits the possibility of unbounded thread growth when commands continue to arrive on average faster than they can be processed.
从中可以知道用direct handoffs的方法可以避免线程池因为内部纯程间的依赖而造成的死锁
至此,答案已经较为明了,简言之:业务线程在占用了线程池内所有的资源后又向线程池提交了新的任务,并且要等这些任务完成后才释放资源,而这些新提交的任务根本就没机会被完成!!!
好了,我们来验证一下是否确实会这样:
...
protected static final ThreadPoolExecutor executor = new ThreadPoolExecutor(2, 100, 2, TimeUnit.MINUTES, new LinkedBlockingQueue<Runnable>(),
new ThreadFactory() {
private AtomicInteger id = new AtomicInteger(0);
<span class="token annotation builtin">@Override</span>
<span class="token keyword">public</span> Thread <span class="token function">newThread</span><span class="token punctuation">(</span>Runnable r<span class="token punctuation">)</span> <span class="token punctuation">{</span>
Thread thread <span class="token operator">=</span> new <span class="token function">Thread</span><span class="token punctuation">(</span>r<span class="token punctuation">)</span><span class="token punctuation">;</span>
thread<span class="token punctuation">.</span><span class="token function">setName</span><span class="token punctuation">(</span><span class="token string">"home-service-"</span> <span class="token operator">+</span> id<span class="token punctuation">.</span><span class="token function">addAndGet</span><span class="token punctuation">(</span><span class="token number">1</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">return</span> thread<span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span><span class="token punctuation">,</span> new ThreadPoolExecutor<span class="token punctuation">.</span><span class="token function">CallerRunsPolicy</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">public</span> static void <span class="token function">main</span><span class="token punctuation">(</span>String<span class="token punctuation">[</span><span class="token punctuation">]</span> args<span class="token punctuation">)</span> throws InterruptedException<span class="token punctuation">,</span> ExecutionException <span class="token punctuation">{</span>
Future<span class="token operator"><</span>Long<span class="token operator">></span> f1 <span class="token operator">=</span> executor<span class="token punctuation">.</span><span class="token function">submit</span><span class="token punctuation">(</span>new Callable<span class="token operator"><</span>Long<span class="token operator">></span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token annotation builtin">@Override</span>
<span class="token keyword">public</span> Long <span class="token function">call</span><span class="token punctuation">(</span><span class="token punctuation">)</span> throws Exception <span class="token punctuation">{</span>
Thread<span class="token punctuation">.</span><span class="token function">sleep</span><span class="token punctuation">(</span><span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">;</span> <span class="token comment">//延时以使得第二层的f3在第一层的f2占用corePoolSize后才submit</span>
Future<span class="token operator"><</span>Long<span class="token operator">></span> f3 <span class="token operator">=</span> executor<span class="token punctuation">.</span><span class="token function">submit</span><span class="token punctuation">(</span>new Callable<span class="token operator"><</span>Long<span class="token operator">></span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token annotation builtin">@Override</span>
<span class="token keyword">public</span> Long <span class="token function">call</span><span class="token punctuation">(</span><span class="token punctuation">)</span> throws Exception <span class="token punctuation">{</span>
<span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1L</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
System<span class="token punctuation">.</span>out<span class="token punctuation">.</span><span class="token function">println</span><span class="token punctuation">(</span><span class="token string">"f1.f3"</span> <span class="token operator">+</span> f3<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1L</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
Future<span class="token operator"><</span>Long<span class="token operator">></span> f2 <span class="token operator">=</span> executor<span class="token punctuation">.</span><span class="token function">submit</span><span class="token punctuation">(</span>new Callable<span class="token operator"><</span>Long<span class="token operator">></span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token annotation builtin">@Override</span>
<span class="token keyword">public</span> Long <span class="token function">call</span><span class="token punctuation">(</span><span class="token punctuation">)</span> throws Exception <span class="token punctuation">{</span>
Thread<span class="token punctuation">.</span><span class="token function">sleep</span><span class="token punctuation">(</span><span class="token number">1000</span><span class="token punctuation">)</span><span class="token punctuation">;</span><span class="token comment">//延时</span>
Future<span class="token operator"><</span>Long<span class="token operator">></span> f4 <span class="token operator">=</span> executor<span class="token punctuation">.</span><span class="token function">submit</span><span class="token punctuation">(</span>new Callable<span class="token operator"><</span>Long<span class="token operator">></span><span class="token punctuation">(</span><span class="token punctuation">)</span> <span class="token punctuation">{</span>
<span class="token annotation builtin">@Override</span>
<span class="token keyword">public</span> Long <span class="token function">call</span><span class="token punctuation">(</span><span class="token punctuation">)</span> throws Exception <span class="token punctuation">{</span>
<span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1L</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
System<span class="token punctuation">.</span>out<span class="token punctuation">.</span><span class="token function">println</span><span class="token punctuation">(</span><span class="token string">"f2.f4"</span> <span class="token operator">+</span> f4<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token keyword">return</span> <span class="token operator">-</span><span class="token number">1L</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token punctuation">}</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
System<span class="token punctuation">.</span>out<span class="token punctuation">.</span><span class="token function">println</span><span class="token punctuation">(</span><span class="token string">"here"</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
System<span class="token punctuation">.</span>out<span class="token punctuation">.</span><span class="token function">println</span><span class="token punctuation">(</span><span class="token string">"f1"</span> <span class="token operator">+</span> f1<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
System<span class="token punctuation">.</span>out<span class="token punctuation">.</span><span class="token function">println</span><span class="token punctuation">(</span><span class="token string">"f2"</span> <span class="token operator">+</span> f2<span class="token punctuation">.</span><span class="token function">get</span><span class="token punctuation">(</span><span class="token punctuation">)</span><span class="token punctuation">)</span><span class="token punctuation">;</span>
<span class="token punctuation">}</span>
<span class="token operator">..</span><span class="token punctuation">.</span>
运行程序后发现只打出一行日志"here", 并且进程没有结束,取其jstack:
...
"home-service-2" prio=5 tid=7fe2aa15b800 nid=0x10dcef000 waiting on condition [10dcee000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <7f371ac80> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.xxxxxx.productfront.service.HomeService$3.call(HomeService.java:136)
at com.xxxxxx.productfront.service.HomeService$3.call(HomeService.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
"home-service-1" prio=5 tid=7fe2aa0b8800 nid=0x10dbec000 waiting on condition [10dbeb000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <7f36c5800> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.xxxxxx.productfront.service.HomeService$2.call(HomeService.java:121)
at com.xxxxxx.productfront.service.HomeService$2.call(HomeService.java:1)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:695)
...
"main" prio=5 tid=7fe2ab000800 nid=0x1051c9000 waiting on condition [1051c8000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <7f369cab0> (a java.util.concurrent.FutureTask$Sync)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:969)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1281)
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:218)
at java.util.concurrent.FutureTask.get(FutureTask.java:83)
at com.xxxxxx.productfront.service.HomeService.main(HomeService.java:141)
...
可见确实会造成'''死锁''',而如果去掉f1和f2内的sleep延时,而把延时放到f2之前,则f1, f2, f3, f4均能成功完成。
结论
- 因为threadPoolExecutor使用了内部依赖,使得业务线程在占用了线程池内所有的资源后又向线程池提交了新的任务,并且要等这些任务完成后才释放资源,而这些新提交的任务根本就没机会被完成,从而造成业务线程没法完成,进而导致web服务器的业务线程耗尽,不再接受新的业务请求,即整个服务不可用。
如何处理
解决死锁:
- 方法一:去掉threadPoolExecutor的内部依赖,每一层的future用各种的threadPoolExecutor,但现在各层层次较乱,即使现在理清了也很容易在之后被勿用。
- 方法二:将maxPoolSize设为maximumPoolSize —— 不建议,耗用资源太多
- 方法三:将queue的长度设为最小(貌似不能设为0,于是将其设为1),这样maxPoolSize就会生效,同时设置ThreadPoolExecutor.CallerRunsPolicy()会使得任务被pool拒绝时转由当前线程本地执行。
另外,初始化ThreadPoolExecutor时用的ThreadPoolExecutor.CallerRunsPolicy()也是没意义的,见java doc:
Rejected tasks
New tasks submitted in method execute(java.lang.Runnable) will be rejected when the Executor has been shut down, and also when the Executor uses finite bounds for both maximum threads and work queue capacity, and is saturated. In either case, the execute method invokes the RejectedExecutionHandler.rejectedExecution(java.lang.Runnable, java.util.concurrent.ThreadPoolExecutor) method of its RejectedExecutionHandler. Four predefined handler policies are provided:
* In the default ThreadPoolExecutor.AbortPolicy, the handler throws a runtime RejectedExecutionException upon rejection.
* In ThreadPoolExecutor.CallerRunsPolicy, the thread that invokes execute itself runs the task. This provides a simple feedback control mechanism that will slow down the rate that new tasks are submitted.
* In ThreadPoolExecutor.DiscardPolicy, a task that cannot be executed is simply dropped.
* In ThreadPoolExecutor.DiscardOldestPolicy, if the executor is not shut down, the task at the head of the work queue is dropped, and then execution is retried (which can fail again, causing this to be repeated.)
It is possible to define and use other kinds of RejectedExecutionHandler classes. Doing so requires some care especially when policies are designed to work only under particular capacity or queuing policies.
综合考虑,较好的使用方式是保持internal dependency并且如下设置ThreadPoolExecutor:
ThreadPoolExecutor(20, 100, keepAliveTime, unit, new LinkedBlockingQueue<Runnable>(1), threadFactory)
参考资料
原文地址:https://www.jianshu.com/p/f343782f19fc
ThreadPoolExecutor使用错误导致死锁的更多相关文章
- [转]DllMain中不当操作导致死锁问题的分析——DllMain中要谨慎写代码(完结篇)
在CSDN中发现这篇文章,讲解的比较详细,所以在这里备份一个.原文链接:http://blog.csdn.net/breaksoftware/article/details/8167641 DllMa ...
- [经验分享] MySQL Innodb表导致死锁日志情况分析与归纳【转,纯学习】
在定时脚本运行过程中,发现当备份表格的sql语句与删除该表部分数据的sql语句同时运行时,mysql会检测出死锁,并打印出日志. 两个sql语句如下: (1)insert into backup_ta ...
- [20180801]insert导致死锁.txt
[20180801]insert导致死锁.txt --//链接http://www.itpub.net/thread-2104135-2-1.html的讨论,自己有点疏忽了,插入主键相同也会导致死锁. ...
- [译]async/await中使用阻塞式代码导致死锁 百万数据排序:优化的选择排序(堆排序)
[译]async/await中使用阻塞式代码导致死锁 这篇博文主要是讲解在async/await中使用阻塞式代码导致死锁的问题,以及如何避免出现这种死锁.内容主要是从作者Stephen Cleary的 ...
- 不要使用 Dispatcher.Invoke,因为它可能在你的延迟初始化 Lazy 中导致死锁
WPF 中为了 UI 的跨线程访问,提供了 Dispatcher 线程模型.其 Invoke 方法,无论在哪个线程调用,都可以让传入的方法回到 UI 线程. 然而,如果你在 Lazy 上下文中使用了 ...
- 在有 UI 线程参与的同步锁(如 AutoResetEvent)内部使用 await 可能导致死锁
AutoResetEvent.ManualResetEvent.Monitor.lock 等等这些用来做同步的类,如果在异步上下文(await)中使用,需要非常谨慎. 本文将说一个在同步上下文中非常常 ...
- MySQL Innodb表导致死锁日志情况分析与归纳
发现当备份表格的sql语句与删除该表部分数据的sql语句同时运行时,mysql会检测出死锁,并打印出日志 案例描述在定时脚本运行过程中,发现当备份表格的sql语句与删除该表部分数据的sql语句同时 ...
- [译]async/await中使用阻塞式代码导致死锁
原文:[译]async/await中使用阻塞式代码导致死锁 这篇博文主要是讲解在async/await中使用阻塞式代码导致死锁的问题,以及如何避免出现这种死锁.内容主要是从作者Stephen Clea ...
- 信号处理函数陷阱:调用malloc导致死锁[转]
概览 因malloc是加锁的,上网了解的相关信息,额外了解到信号处理规范使用,mark 正文 在执行malloc的过程中,跳转到了信号处理函数中.而信号处理函数在调用某个系统api时,内部又调用了ma ...
随机推荐
- GoCN每日新闻(2019-10-22)
GoCN每日新闻(2019-10-22) GoCN每日新闻(2019-10-22) 1. Go 集成测试:https://www.ardanlabs.com/blog/2019/10/integrat ...
- URL的作用是什么?它由几部分组成?
URL是统一资源定位符,对可以从互联网上得到的资源的位置和访问方法的一种简洁的表示,是互联网上标准资源的地址.互联网上的每个文件都有一个唯一的URL,它包含的信息指出文件的位置以及浏览器应该怎么处理它 ...
- 【大数据应用技术】作业十|分布式文件系统HDFS 练习
本次作业的要求来自:https://edu.cnblogs.com/campus/gzcc/GZCC-16SE2/homework/3292 1.目录操作 在HDFS中为hadoop用户创建一个用户目 ...
- 【数值分析】Python实现Lagrange插值
一直想把这几个插值公式用代码实现一下,今天闲着没事,尝试尝试. 先从最简单的拉格朗日插值开始!关于拉格朗日插值公式的基础知识就不赘述,百度上一搜一大堆. 基本思路是首先从文件读入给出的样本点,根据输入 ...
- [Beta]第三次 Scrum Meeting
[Beta]第三次 Scrum Meeting 写在前面 会议时间 会议时长 会议地点 2019/5/7 22:00 30min 大运村公寓6F寝室 附Github仓库:WEDO 例会照片 工作情况总 ...
- 定制比特币btc地址address
https://99bitcoins.com/how-to-get-a-custom-bitcoin-address/ https://en.bitcoin.it/wiki/Vanitygen htt ...
- 转载:Base64编解码介绍
https://www.liaoxuefeng.com/wiki/897692888725344/949441536192576 Base64是一种用64个字符来表示任意二进制数据的方法. 用记事本打 ...
- 以太坊公链Geth同步
1.安装所需基础工具 yum update -y && yum install git wget bzip2 vim gcc-c++ ntp epel-release nodejs c ...
- ThinkPHP5 基础知识入门 [入门必先了解]
一.目录结构 下载最新版框架后,解压缩到web目录下面,可以看到初始的目录结构如下: project 应用部署目录 ├─application 应用目录(可设置) │ ├─common 公共模块目录( ...
- 爬虫中BeautifulSoup4解析器
CSS 选择器:BeautifulSoup4 和 lxml 一样,Beautiful Soup 也是一个HTML/XML的解析器,主要的功能也是如何解析和提取 HTML/XML 数据. lxml 只会 ...