Tomcat and Java upgrade resulting in CPU spikes during CMS major collections


We recently upgraded the Java (1.6.0_18 -> 1.6.0_38) and Tomcat (6.0.32 -> 7.0.34) versions of a third-party J2EE web application that runs in our production environment. We soon started receiving alerts that CPU on the server was spiking above 50% several times a day. On further analysis, I found that the spikes coincided with the concurrent mark sweep major collections, and that the total CPU time needed to complete them had increased dramatically, particularly in the CMS-concurrent-mark and CMS-concurrent-sweep phases:

Before:

2013-03-08T14:36:49.861-0500: 553875.681: [GC [1 CMS-initial-mark: 4152134K(8303424K)] 4156673K(8380096K), 0.0067893 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
2013-03-08T14:36:49.868-0500: 553875.688: [CMS-concurrent-mark-start]
2013-03-08T14:36:55.682-0500: 553881.503: [GC 553881.503: [ParNew: 72675K->4635K(76672K), 0.0322031 secs] 4224809K->4157567K(8380096K), 0.0327540 secs] [Times: user=0.12 sys=0.01, real=0.03 secs] 
2013-03-08T14:36:58.224-0500: 553884.045: [CMS-concurrent-mark: 8.320/8.356 secs] [Times: user=9.18 sys=0.02, real=8.36 secs] 
2013-03-08T14:36:58.224-0500: 553884.045: [CMS-concurrent-preclean-start]
2013-03-08T14:36:58.276-0500: 553884.097: [CMS-concurrent-preclean: 0.051/0.052 secs] [Times: user=0.06 sys=0.00, real=0.05 secs] 
2013-03-08T14:36:58.277-0500: 553884.097: [CMS-concurrent-abortable-preclean-start]
2013-03-08T14:37:01.458-0500: 553887.279: [GC 553887.279: [ParNew: 72795K->4887K(76672K), 0.0332472 secs] 4225727K->4158532K(8380096K), 0.0337703 secs] [Times: user=0.13 sys=0.00, real=0.03 secs] 
 CMS: abort preclean due to time 2013-03-08T14:37:03.296-0500: 553889.117: [CMS-concurrent-abortable-preclean: 1.462/5.020 secs] [Times: user=2.04 sys=0.02, real=5.02 secs] 
2013-03-08T14:37:03.299-0500: 553889.119: [GC[YG occupancy: 22614 K (76672 K)]553889.120: [Rescan (parallel) , 0.0151518 secs]553889.135: [weak refs processing, 0.0356825 secs] [1 CMS-remark: 4153644K(8303424K)] 4176259K(8380096K), 0.0620445 secs] [Times: user=0.11 sys=0.00, real=0.06 secs] 
2013-03-08T14:37:03.363-0500: 553889.183: [CMS-concurrent-sweep-start]
2013-03-08T14:37:07.248-0500: 553893.069: [GC 553893.069: [ParNew: 73047K->5136K(76672K), 0.0510894 secs] 3182253K->3115235K(8380096K), 0.0516111 secs] [Times: user=0.19 sys=0.00, real=0.05 secs] 
2013-03-08T14:37:08.277-0500: 553894.097: [CMS-concurrent-sweep: 4.856/4.914 secs] [Times: user=5.67 sys=0.02, real=4.91 secs] 
2013-03-08T14:37:08.277-0500: 553894.097: [CMS-concurrent-reset-start]
2013-03-08T14:37:08.325-0500: 553894.145: [CMS-concurrent-reset: 0.048/0.048 secs] [Times: user=0.07 sys=0.00, real=0.05 secs] 
After:

2013-03-07T17:18:01.323-0500: 180055.128: [CMS-concurrent-mark: 10.765/20.646 secs] [Times: user=50.25 sys=3.32, real=20.65 secs] 
2013-03-07T17:18:01.323-0500: 180055.128: [CMS-concurrent-preclean-start]
2013-03-07T17:18:01.401-0500: 180055.206: [CMS-concurrent-preclean: 0.076/0.078 secs] [Times: user=0.08 sys=0.00, real=0.08 secs] 
2013-03-07T17:18:01.401-0500: 180055.206: [CMS-concurrent-abortable-preclean-start]
2013-03-07T17:18:03.074-0500: 180056.879: [GC 180056.880: [ParNew: 76670K->8512K(76672K), 0.1024039 secs] 5980843K->5922977K(8380096K), 0.1028797 secs] [Times: user=0.28 sys=0.04, real=0.10 secs] 
2013-03-07T17:18:05.447-0500: 180059.253: [CMS-concurrent-abortable-preclean: 3.132/4.046 secs] [Times: user=3.94 sys=0.07, real=4.05 secs] 
2013-03-07T17:18:05.448-0500: 180059.254: [GC[YG occupancy: 51161 K (76672 K)]180059.254: [Rescan (parallel) , 0.0243232 secs]180059.279: [weak refs processing, 0.2053571 secs] [1 CMS-remark: 5914465K(8303424K)] 5965627K(8380096K), 0.2569077 secs] [Times: user=0.33 sys=0.01, real=0.26 secs] 
2013-03-07T17:18:05.706-0500: 180059.512: [CMS-concurrent-sweep-start]
2013-03-07T17:18:12.511-0500: 180066.316: [CMS-concurrent-sweep: 6.804/6.804 secs] [Times: user=13.98 sys=0.80, real=6.80 secs] 
2013-03-07T17:18:12.511-0500: 180066.316: [CMS-concurrent-reset-start]
2013-03-07T17:18:12.558-0500: 180066.363: [CMS-concurrent-reset: 0.047/0.047 secs] [Times: user=0.11 sys=0.02, real=0.05 secs] 
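
For a quick side-by-side, here are the user/real CPU times taken straight from the two logs above (no new measurements, just the lines already shown):

CMS-concurrent-mark:  user=9.18s, real=8.36s (before)  ->  user=50.25s, real=20.65s (after)
CMS-concurrent-sweep: user=5.67s, real=4.91s (before)  ->  user=13.98s, real=6.80s (after)

The user/real ratio of the concurrent mark rises from roughly 1.1 to roughly 2.4, meaning the collector is burning noticeably more CPU concurrently with the application, which matches the CPU spikes described above.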
During these spikes, which lasted roughly a minute, response times from the Tomcat server went from an average of 2 ms to around 90 seconds. After 3 days in production we rolled the change back and have not seen a CPU spike since. Are you aware of any change in the JDK or Tomcat that could explain this behavior? Note: this web application caches a large amount of data in the heap (up to 3 GB at startup).

Here are the JVM settings:

(Before) Tomcat 6 / JDK 1.6.0_18:
JAVA_HOME="/usr/local/java/jdk1.6.0_18"
JAVA_OPTS="$JAVA_OPTS -server -d64 -XX:PermSize=128m -XX:MaxPermSize=128m"
CATALINA_OPTS="$CATALINA_OPTS -Xms8192m -Xmx8192m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:/env/tomcat-instance/logs/gc.log -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=(omitted) -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"

(After) Tomcat 7 / JDK 1.6.0_38:
JAVA_HOME="/usr/local/java/jdk1.6.0_38"
JAVA_OPTS="$JAVA_OPTS -server -d64 -XX:PermSize=128m -XX:MaxPermSize=128m"
CATALINA_OPTS="$CATALINA_OPTS -Xms8192m -Xmx8192m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly -verbose:gc -XX:+PrintGCDateStamps -XX:+PrintGCDetails -Xloggc:/env/tomcat-instance/logs/gc.log -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=(omitted) -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false"

Thanks very much for any help.

Anecdotal feedback - we ran into a nasty memory leak bug in 6u2n that wasn't fixed until 7:

6u21 is the safest Java 6 JRE I have used.

You upgraded Tomcat and the JVM at the same time, so the spikes could have been caused by either one. You can limit the number of threads used by GC:

-XX:ParallelGCThreads=12

If you run multiple JVMs on the machine, make sure the total number of GC threads does not exceed the number of cores. Also take a look at JVM 1.7.
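
For illustration, a minimal sketch of appending that flag to the CATALINA_OPTS from the question (the value 12 is just the example above; the right number for your core count is an assumption you would need to verify):

# Sketch: cap the stop-the-world parallel GC threads. With CMS, the concurrent
# marking threads are sized from this value by default (roughly
# (ParallelGCThreads + 3) / 4), so capping it should also limit the CPU used
# during the CMS-concurrent-mark phase.
CATALINA_OPTS="$CATALINA_OPTS -XX:ParallelGCThreads=12"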

Use these flags to see the effective JVM parameters and look for changes:

-XX:+UnlockDiagnosticVMOptions
-XX:+PrintFlagsFinal
-XX:+LogVMOutput
-XX:LogFile=logs/jvm.log
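
One way to use them, sketched under the assumption that both JDK builds accept these flags (very old Java 6 updates may not recognize -XX:+PrintFlagsFinal), is to dump each JDK's effective flags and diff the output:

# Sketch: dump effective VM flags for each JDK (paths are the JAVA_HOMEs from
# the question) and compare them; passing the same -server -d64 and GC options
# as CATALINA_OPTS would make the comparison closer to the real runtime setup.
/usr/local/java/jdk1.6.0_18/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version > flags_6u18.txt
/usr/local/java/jdk1.6.0_38/bin/java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version > flags_6u38.txt
diff flags_6u18.txt flags_6u38.txt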