Strategies for investigating performance degradation over time in a Java-based web application


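I'm working on an enterprise Java application that has accumulated a lot of tools and frameworks, such as Struts, JAX-RS and Spring MVC. It contains both the UI and the REST endpoints, bundled in a single .war file. The project is evolving: we are getting rid of the old tools and aiming to use only Spring MVC/WebFlux.

The application searches across millions of XML/JSON records, and the search engine was recently switched from MarkLogic to Elasticsearch.

We noticed that in production, under fairly light usage (up to 1.7k requests per minute across 2-4 application nodes), the response time of some endpoints grows over time. Elasticsearch has headroom and shows no sign of heavy load. As a result, we currently have to restart or replace the prod instances every week or two, once the average response time exceeds 3 seconds instead of the usual 200-300 ms.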

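I tried using async-profiler to capture CPU and heap flame graphs, but the load profile is different every time I measure, because a whole range of features is in use, so I can't really compare how the graphs change over time.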


Can you suggest some strategies or methods for finding the right place in the code?

Found the problem. It was related to thread pools.

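We noticed that, over time, the number of active Tomcat threads grew together with the response time. In the graph you can also see that the server was restarted on May 9.

Before the restart I managed to take a heap dump, and after some digging I found an interesting fragment repeated throughout the thread dump: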

Thread xxx
  at sun.misc.Unsafe.park(ZJ)V (Native Method)
  at java.util.concurrent.locks.LockSupport.park(Ljava/lang/Object;)V (LockSupport.java:175)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()V (AbstractQueuedSynchronizer.java:2039)
  at org.apache.http.pool.AbstractConnPool.getPoolEntryBlocking(Ljava/lang/Object;Ljava/lang/Object;JLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/Future;)Lorg/apache/http/pool/PoolEntry; (AbstractConnPool.java:377)
  at org.apache.http.pool.AbstractConnPool.access$200(Lorg/apache/http/pool/AbstractConnPool;Ljava/lang/Object;Ljava/lang/Object;JLjava/util/concurrent/TimeUnit;Ljava/util/concurrent/Future;)Lorg/apache/http/pool/PoolEntry; (AbstractConnPool.java:67)
  at org.apache.http.pool.AbstractConnPool$2.get(JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/pool/PoolEntry; (AbstractConnPool.java:243)
  at org.apache.http.pool.AbstractConnPool$2.get(JLjava/util/concurrent/TimeUnit;)Ljava/lang/Object; (AbstractConnPool.java:191)
  at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(Ljava/util/concurrent/Future;JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/HttpClientConnection; (PoolingHttpClientConnectionManager.java:282)
  at org.apache.http.impl.conn.PoolingHttpClientConnectionManager$1.get(JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/HttpClientConnection; (PoolingHttpClientConnectionManager.java:269)
  at org.apache.http.impl.execchain.MainClientExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (MainClientExec.java:191)
  at org.apache.http.impl.execchain.ProtocolExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (ProtocolExec.java:185)
  at org.apache.http.impl.execchain.RetryExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (RetryExec.java:89)
  at org.apache.http.impl.execchain.RedirectExec.execute(Lorg/apache/http/conn/routing/HttpRoute;Lorg/apache/http/client/methods/HttpRequestWrapper;Lorg/apache/http/client/protocol/HttpClientContext;Lorg/apache/http/client/methods/HttpExecutionAware;)Lorg/apache/http/client/methods/CloseableHttpResponse; (RedirectExec.java:111)
  at org.apache.http.impl.client.InternalHttpClient.doExecute(Lorg/apache/http/HttpHost;Lorg/apache/http/HttpRequest;Lorg/apache/http/protocol/HttpContext;)Lorg/apache/http/client/methods/CloseableHttpResponse; (InternalHttpClient.java:185)
  at org.apache.http.impl.client.CloseableHttpClient.execute(Lorg/apache/http/client/methods/HttpUriRequest;Lorg/apache/http/protocol/HttpContext;)Lorg/apache/http/client/methods/CloseableHttpResponse; (CloseableHttpClient.java:83)
  at org.apache.http.impl.client.CloseableHttpClient.execute(Lorg/apache/http/client/methods/HttpUriRequest;)Lorg/apache/http/client/methods/CloseableHttpResponse; (CloseableHttpClient.java:108)
  at io.searchbox.client.http.JestHttpClient.executeRequest(Lorg/apache/http/client/methods/HttpUriRequest;)Lorg/apache/http/client/methods/CloseableHttpResponse; (JestHttpClient.java:136)
  at io.searchbox.client.http.JestHttpClient.execute(Lio/searchbox/action/Action;Lorg/apache/http/client/config/RequestConfig;)Lio/searchbox/client/JestResult; (JestHttpClient.java:70)
  at io.searchbox.client.http.JestHttpClient.execute(Lio/searchbox/action/Action;)Lio/searchbox/client/JestResult; (JestHttpClient.java:63)
...
In this case we use the Jest library to talk to Elasticsearch. Internally it uses Apache HTTP Client and Apache HTTP Async Client.

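As the thread dump shows, this thread is waiting for a free connection from the HTTP client's connection pool (AbstractConnPool.getPoolEntryBlocking), and many more threads have exactly the same stack.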
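To watch this symptom without taking a full heap dump, one option is to count the live threads that have a particular frame on their stack. The sketch below uses only the JDK's Thread.getAllStackTraces(); the class name BlockedThreadCounter and the method countThreadsIn are hypothetical names chosen for illustration, and the frame searched for is the one from the dump above.

```java
import java.util.Map;

/** Counts live threads whose stack contains a given frame (illustrative helper). */
class BlockedThreadCounter {

    /**
     * Returns how many live threads currently have the frame
     * className.methodName anywhere on their stack.
     */
    static int countThreadsIn(String className, String methodName) {
        int count = 0;
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    count++;
                    break; // count each thread at most once
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Frame taken from the thread dump above.
        int waiting = countThreadsIn(
                "org.apache.http.pool.AbstractConnPool", "getPoolEntryBlocking");
        System.out.println("threads waiting for a pooled connection: " + waiting);
    }
}
```

Logged periodically from inside the application, a steadily rising count would point at pool starvation well before response times reach the restart threshold.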

I also found that we had maxTotal (the maximum total number of connections) set to 20 and defaultMaxPerRoute (the maximum number of connections per route) set to 2.

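By default, the pool allows only 20 concurrent connections in total and two concurrent connections per unique route. The two-connection limit is due to the requirements of the HTTP specification. However, in practical terms this can often be too restrictive.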

So the fix I made was to raise these values to 50 and 40 respectively.

I would still prefer these limits to be unbounded and to grow with usage, but for now we are sticking with these values.
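A rough way to see why two connections per route is so tight: a pool of N connections, each held for T milliseconds per request, cannot serve more than N * 1000 / T requests per second. The sketch below only does that arithmetic. The class PoolCapacity is hypothetical, the 100 ms hold time is an assumed figure rather than a measurement from this system, and it assumes all Elasticsearch traffic goes to one host and therefore one route.

```java
/** Back-of-the-envelope capacity of a bounded connection pool (illustrative only). */
class PoolCapacity {

    /**
     * Upper bound on requests per second when every request holds one
     * pooled connection for holdMillis milliseconds.
     */
    static double maxRequestsPerSecond(int connections, double holdMillis) {
        if (connections <= 0 || holdMillis <= 0) {
            throw new IllegalArgumentException("connections and holdMillis must be positive");
        }
        return connections * 1000.0 / holdMillis;
    }

    public static void main(String[] args) {
        double assumedHoldMillis = 100.0; // assumption, not a measured value
        System.out.println("defaultMaxPerRoute=2  -> at most "
                + maxRequestsPerSecond(2, assumedHoldMillis) + " req/s per node");
        System.out.println("defaultMaxPerRoute=40 -> at most "
                + maxRequestsPerSecond(40, assumedHoldMillis) + " req/s per node");
    }
}
```

Once the arrival rate gets close to that bound, or the hold time creeps up, request threads start queueing for a connection, which matches the parked Tomcat threads in the dump.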

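Until then, VisualVM was my go-to.

Although I wouldn't call this an answer: do you have logging that includes request timestamps? Assuming your requests can be divided into distinct "phases" or something similar, you could timestamp the transition from one phase to the next. You could then track which phase of the request is getting slower. But since you seem to have already narrowed the culprit down to the search "engine": this sounds a lot like a memory leak or the degradation of some underlying map. Are you caching anything, such as request patterns? If not, check whether the VM keeps allocating memory.

@TreffnonX I'm not convinced it is the search engine; if the switch were the cause, total execution times would generally be much higher.

Then I'd suggest the first part of my comment: time the different parts of the request, so that you can isolate which parts degrade over time. If you have the option of running production code in JVM debug mode, you can "snapshot" long-running requests and do an in-depth memory or stack analysis.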
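The per-phase timing suggested in the comments could look roughly like the sketch below. PhaseTimer and the phase names ("parse", "search", "render") are hypothetical; in a real application the totals would go to the existing logging or metrics system rather than standard output.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Accumulates wall-clock time per named phase of a single request (illustrative). */
class PhaseTimer {
    private final Map<String, Long> elapsedNanos = new LinkedHashMap<>();
    private String current;
    private long startedAt;

    /** Closes the running phase, if any, and starts a new one. */
    void start(String phase) {
        stop();
        current = phase;
        startedAt = System.nanoTime();
    }

    /** Closes the running phase and adds its duration to that phase's total. */
    void stop() {
        if (current != null) {
            elapsedNanos.merge(current, System.nanoTime() - startedAt, Long::sum);
            current = null;
        }
    }

    /** Totals in nanoseconds, in the order the phases were first started. */
    Map<String, Long> elapsedNanos() {
        return elapsedNanos;
    }

    public static void main(String[] args) throws InterruptedException {
        PhaseTimer timer = new PhaseTimer();
        timer.start("parse");
        Thread.sleep(5);   // stand-in for real work
        timer.start("search");
        Thread.sleep(20);  // stand-in for the Elasticsearch call
        timer.start("render");
        Thread.sleep(5);
        timer.stop();
        timer.elapsedNanos().forEach((phase, nanos) ->
                System.out.println(phase + ": " + nanos / 1_000_000 + " ms"));
    }
}
```

Comparing these per-phase totals for the same endpoint right after a restart and a week later shows which phase is degrading, independent of how the overall load mix has shifted.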