Java: A few Ignite client node cache calls get stuck indefinitely instead of throwing ClientDisconnected/CacheStopped when the server node restarts

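While running in a scale test setup, we noticed that when the Ignite server node restarts, a couple of Ignite client nodes get stuck indefinitely while calling QueryCursorImpl.iterator, instead of failing by throwing a ClientDisconnected or CacheStopped exception. The other client nodes do reconnect, because we have code in place to disconnect and reconnect when this happens; we had problems using auto-reconnect with respect to Ignite resource handles in a container environment such as Spring Boot.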
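
Because the stuck calls never return, any reconnect logic that runs on the calling thread never gets a chance to fire. One way to keep a health check from hanging is to run the probe on a separate thread and give up after a deadline. The following is a minimal sketch using only the JDK; `BoundedProbe`, `isHealthy`, and the timeout values are hypothetical names and numbers chosen for illustration, and the probe stands in for whatever cache call the health monitor makes. It does not fix the underlying hang, it only lets the caller notice it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a blocking probe on its own thread and gives up after a deadline. */
final class BoundedProbe {
    private BoundedProbe() {}

    /**
     * Returns true only if the probe completes normally within timeoutMillis.
     * A timeout or an exception is reported as "unhealthy" so that the caller
     * can start its own disconnect/reconnect handling.
     */
    static boolean isHealthy(Callable<?> probe, long timeoutMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "health-probe");
            t.setDaemon(true); // a hung probe must not keep the JVM alive
            return t;
        });
        Future<?> result = worker.submit(probe);
        try {
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Best effort only: a parked thread may not react to the interrupt.
            result.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return false; // the probe itself threw, e.g. a disconnect exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            worker.shutdownNow();
        }
    }
}
```

A caller could use it as `BoundedProbe.isHealthy(() -> runProbeQuery(), 5000)`, where `runProbeQuery` is a placeholder for the real cache call. Note that if the probe thread never wakes up, it stays parked in the background, so this approach trades a hung monitor for a leaked daemon thread per failed check.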

In the thread dumps of these services, I see a large number of threads stuck at the stack trace below:

stackTrace:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
The frames above are common to all the traces that originate from cache reads or writes. Here are some examples:

Scheduled-task-pool-9 - threadId:194 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendGeneric(GridIoManager.java:1727)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.send(IgniteH2Indexing.java:2511)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.send(GridReduceQueryExecutor.java:1419)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:732)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1339)
at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$9.iterator(IgniteH2Indexing.java:1403)
at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
at com.**.**.configuration.ClientHealthBasedReconnectWrapper.monitorHealth(ClientHealthBasedReconnectWrapper.java:102)
at sun.reflect.GeneratedMethodAccessor403.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:65)
at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- locked <57b75756> (a java.util.concurrent.ThreadPoolExecutor$Worker)


http-nio-7051-exec-75 - threadId:8896 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendGeneric(GridIoManager.java:1727)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.send(IgniteH2Indexing.java:2511)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.send(GridReduceQueryExecutor.java:1419)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.releaseRemoteResources(GridReduceQueryExecutor.java:1037)
at org.apache.ignite.internal.processors.query.h2.twostep.GridReduceQueryExecutor.query(GridReduceQueryExecutor.java:835)
at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing$8.iterator(IgniteH2Indexing.java:1339)
at org.apache.ignite.internal.processors.cache.QueryCursorImpl.iterator(QueryCursorImpl.java:95)
at org.apache.ignite.internal.processors.cache.QueryCursorImpl.getAll(QueryCursorImpl.java:114)
at com.***.***.perfmon.datastore.service.DataStoreCacheService.getCongestionPortsSummary(DataStoreCacheService.java:1826)
at com.***.***.perfmon.datastore.controller.DataStoreCacheServiceController.getCongestionPortsSummary(DataStoreCacheServiceController.java:199)
at sun.reflect.GeneratedMethodAccessor574.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:205)
at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:133)
at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:97)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:827)
at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:738)
at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:967)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:901)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:970)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:872)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:661)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:846)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:742)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HttpPutFormContentFilter.doFilterInternal(HttpPutFormContentFilter.java:109)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:197)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:800)
at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:800)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1471)
at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
- locked <58365b7b> (a org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- locked <209011f6> (a java.util.concurrent.ThreadPoolExecutor$Worker)


pool-2-thread-10 - threadId:42 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1714)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1166)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.proceedPrepare(GridNearOptimisticTxPrepareFuture.java:593)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepareSingle(GridNearOptimisticTxPrepareFuture.java:405)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFuture.prepare0(GridNearOptimisticTxPrepareFuture.java:348)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepareOnTopology(GridNearOptimisticTxPrepareFutureAdapter.java:137)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearOptimisticTxPrepareFutureAdapter.prepare(GridNearOptimisticTxPrepareFutureAdapter.java:74)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.prepareNearTxLocal(GridNearTxLocal.java:3161)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.commitNearTxLocalAsync(GridNearTxLocal.java:3221)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.optimisticPutFuture(GridNearTxLocal.java:2391)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAllAsync0(GridNearTxLocal.java:802)
at org.apache.ignite.internal.processors.cache.distributed.near.GridNearTxLocal.putAllAsync(GridNearTxLocal.java:361)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter$35.inOp(GridCacheAdapter.java:2821)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter$SyncInOp.op(GridCacheAdapter.java:5076)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.syncOp(GridCacheAdapter.java:4088)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAll0(GridCacheAdapter.java:2819)
at org.apache.ignite.internal.processors.cache.GridCacheAdapter.putAll(GridCacheAdapter.java:2808)
at org.apache.ignite.internal.processors.cache.IgniteCacheProxyImpl.putAll(IgniteCacheProxyImpl.java:1089)
at org.apache.ignite.internal.processors.cache.GatewayProtectedCacheProxy.putAll(GatewayProtectedCacheProxy.java:942)
at com.***.***.perfmon.datastore.cachehandler.FcProductStatCacheHandler.insertCache(FcProductStatCacheHandler.java:225)
at com.***.***.perfmon.datastore.updatehandler.DataStoreUpdateHandler.storeData(DataStoreUpdateHandler.java:168)
at com.***.***.perfmon.datastore.messaging.PerfmonProcessedStatsDataConsumer$2.run(PerfmonProcessedStatsDataConsumer.java:150)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- locked <613a8ee1> (a java.util.concurrent.ThreadPoolExecutor$Worker)


pool-4-thread-5 - threadId:53 - state:WAITING
stackTrace:
java.lang.Thread.State: WAITING
at sun.misc.Unsafe.park(Native Method)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1714)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.submit(DataStreamerImpl.java:1798)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl$Buffer.flush(DataStreamerImpl.java:1534)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.doFlush(DataStreamerImpl.java:1074)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.closeEx(DataStreamerImpl.java:1240)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.closeEx(DataStreamerImpl.java:1211)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.close(DataStreamerImpl.java:1199)
at org.apache.ignite.internal.processors.datastreamer.DataStreamerImpl.close(DataStreamerImpl.java:1286)
at com.***.***.perfmon.datastore.cachehandler.RawFcPortStatCacheHandler.insertCacheBulk(RawFcPortStatCacheHandler.java:282)
at com.***.***.perfmon.datastore.updatehandler.DataStoreUpdateHandler.storeData(DataStoreUpdateHandler.java:180)
at com.***.***.perfmon.datastore.messaging.PerfmonStreamingStatsDataConsumer.processAndStoreAggData(PerfmonStreamingStatsDataConsumer.java:262)
at com.***.***.perfmon.datastore.messaging.PerfmonStreamingStatsDataConsumer$2.run(PerfmonStreamingStatsDataConsumer.java:381)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- locked <6e46d9f4> (a java.util.concurrent.ThreadPoolExecutor$Worker)
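
All of the traces above share the `TcpCommunicationSpi.reserveClient` frame. As a rough illustration of how that pattern could be detected from inside the JVM rather than by reading dumps by hand, here is a small sketch built on `Thread.getAllStackTraces()`. The class `StuckThreadScanner` and its methods are hypothetical helpers, not part of Ignite, and a thread that appears in this list is only a candidate: it may be passing through the frame rather than being stuck in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that lists threads whose stack contains a given frame. */
final class StuckThreadScanner {
    private StuckThreadScanner() {}

    /** True if any frame in the stack belongs to className.methodName. */
    static boolean hasFrame(StackTraceElement[] stack, String className, String methodName) {
        for (StackTraceElement frame : stack) {
            if (frame.getClassName().equals(className)
                    && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    /** Names of all threads in the dump that are currently inside className.methodName. */
    static List<String> threadsIn(Map<Thread, StackTraceElement[]> dump,
                                  String className, String methodName) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : dump.entrySet()) {
            if (hasFrame(e.getValue(), className, methodName)) {
                names.add(e.getKey().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Scan the live JVM for threads inside the frame seen in the dumps above.
        List<String> candidates = threadsIn(
                Thread.getAllStackTraces(),
                "org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi",
                "reserveClient");
        System.out.println("Threads inside reserveClient: " + candidates);
    }
}
```

Sampling this twice, some seconds apart, and comparing the thread names would give a better indication of which threads are genuinely not making progress.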
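In this particular setup we have only one server node and about 25+ client nodes, all running in a Docker Swarm overlay network. Some of these nodes update a large number of caches inside a transaction: they open a transaction, acquire locks on a few keys, and then update several caches through the JCache API before closing the transaction. I suspect that locking the keys is a problem, but that is a separate issue which I will raise in another question.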

Does anyone have any clues, suggestions, or comments on how to avoid or resolve this?

We are running version 2.4 with the Spring integration and plan to upgrade soon.

Thanks,
Muthu

Update, 16 October 2018:

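In the thread dump of one of the two stuck client nodes, I consistently see the stuck thread below. From looking at the code, it appears to be the reason the other threads are stuck, although I do not see it in the thread dump of the other client node. Could this be the problem?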

"tcp-client-disco-msg-worker-#4" - Thread t@127
   java.lang.Thread.State: WAITING
        at sun.misc.Unsafe.park(Native Method)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.get(GridFutureAdapter.java:140)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:2799)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2621)
        at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2585)
        at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:1642)
        at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:1714)
        at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1166)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedCache.removeLocks(GridDhtColocatedCache.java:859)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.undoLocks(GridDhtColocatedLockFuture.java:389)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.onComplete(GridDhtColocatedLockFuture.java:586)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.onDone(GridDhtColocatedLockFuture.java:565)
        at org.apache.ignite.internal.processors.cache.distributed.dht.colocated.GridDhtColocatedLockFuture.onDone(GridDhtColocatedLockFuture.java:90)
        at org.apache.ignite.internal.util.future.GridFutureAdapter.onDone(GridFutureAdapter.java:462)
        at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.cancelClientFutures(GridCacheMvccManager.java:386)
        at org.apache.ignite.internal.processors.cache.GridCacheMvccManager.onDisconnected(GridCacheMvccManager.java:378)
        at org.apache.ignite.internal.processors.cache.GridCacheSharedContext.onDisconnected(GridCacheSharedContext.java:343)
        at org.apache.ignite.internal.processors.cache.GridCacheProcessor.onDisconnected(GridCacheProcessor.java:1036)
        at org.apache.ignite.internal.IgniteKernal.onDisconnected(IgniteKernal.java:3793)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery0(GridDiscoveryManager.java:779)
        at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager$4.onDiscovery(GridDiscoveryManager.java:576)
        - locked <fb11fd7> (a java.lang.Object)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2414)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.notifyDiscovery(ClientImpl.java:2393)
        at org.apache.ignite.spi.discovery.tcp.ClientImpl$MessageWorker.body(ClientImpl.java:1709)
        at org.apache.ignite.spi.IgniteSpiThread.run(IgniteSpiThread.java:62)

   Locked ownable synchronizers:
        - None

I don't think this is normal.

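What is happening here is basically that the client tries to connect to the restarted server node via communication, gets an "I don't know who you are" response, and retries.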

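What should happen is that the client disconnects from the topology and reconnects to the new topology via discovery before it attempts communication again. Do you have the full logs from the client?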


Since this does not look like normal behavior, I would try 2.6 or the upcoming 2.7 and see whether it fares better.

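Thanks @alamar. Unfortunately I cannot upgrade right now, as it is too late for that. As for the logs, let me try to attach the Ignite-related logs from the client and server nodes. I did notice, though, that in the thread dump of the stuck client nodes there is one thread that I keep suspecting to be the root cause: GridFutureAdapter.onDone is stuck in GridDhtColocatedCache.removeLocks for the same reason as the other threads, and from looking at the code I don't think it should get stuck there. I have updated the question. Could you share your thoughts on this?

I think there was some network problem, a long OOM, or something similar, which caused the cluster to become unresponsive.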