Java: performance of slf4j API calls with logback
I am using a Windows machine to measure the performance of logback + slf4j.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import ch.qos.logback.classic.Level;

public class LogPerformanceAnalyser {
    private static final Logger LOG =
            LoggerFactory.getLogger(LogPerformanceAnalyser.class);

    public LogPerformanceAnalyser() {
        ((ch.qos.logback.classic.Logger) LOG).setLevel(Level.ERROR);
    }

    public long getTimeWithCheck() {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            if (LOG.isDebugEnabled()) {
                LOG.debug("This log is {} check", "with");
            }
        }
        return System.currentTimeMillis() - startTime;
    }

    public long getTimeWithoutCheck() {
        long startTime = System.currentTimeMillis();
        for (int i = 0; i < Integer.MAX_VALUE; i++) {
            LOG.debug("This log is {} check", "without");
        }
        return System.currentTimeMillis() - startTime;
    }
}
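The result is that checking before logging saves about 3.5 seconds over roughly 2.15 billion log calls (Integer.MAX_VALUE iterations).
If I change the logger to non-static: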
private final Logger LOG =
LoggerFactory.getLogger(LogPerformanceAnalyser.class);
I get the following:
Total Time getTimeWithoutCheck: 37095 ms
Total Time getTimeWithCheck : 47006 ms
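Can someone explain this?

The results are more or less consistent with the hypothesis that one field access contributes about 10000 ms to the total run time. Without the check you pay that overhead once per iteration; with the check you pay it twice.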
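For scale, the gap between the two non-static runs can be converted into a per-iteration cost. The sketch below uses only the two totals reported in the question and the loop bound Integer.MAX_VALUE; the class and method names are hypothetical, chosen just for this illustration.

```java
public class PerIterationCost {

    /** Converts a difference of two wall-clock totals (ms) into nanoseconds per loop iteration. */
    static double nanosPerIteration(long withCheckMs, long withoutCheckMs, long iterations) {
        long deltaMs = withCheckMs - withoutCheckMs;
        // 1 ms = 1,000,000 ns; multiply by a double to avoid integer division
        return deltaMs * 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        // Totals taken from the non-static logger run in the question
        double ns = nanosPerIteration(47_006, 37_095, Integer.MAX_VALUE);
        System.out.printf("extra cost per iteration: %.2f ns%n", ns);
    }
}
```

With these inputs the extra cost comes out at a few nanoseconds per iteration, which is far too small for a millisecond timer around a hand-written loop to attribute reliably.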
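It is still surprising that the difference is so large.

First of all, you need a proper benchmark. In the Java world, JMH is the de facto standard for benchmarking. The benchmark: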
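import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class LogBench {
    // Static logger: the reference is a constant for the JIT
    private static final Logger LOG = LoggerFactory.getLogger(LogBench.class);
    // Instance ("local") logger: a final, but non-static, field
    private final Logger localLog = LoggerFactory.getLogger(LogBench.class);

    @Benchmark
    public long baseline() {
        return 0;
    }

    @Benchmark
    public void getTimeWithCheck() {
        if (LOG.isTraceEnabled()) {
            LOG.trace("This log is {} check", "with");
        }
    }

    @Benchmark
    public void getTimeWithoutCheck() {
        LOG.trace("This log is {} check", "without");
    }

    @Benchmark
    public void getTimeWithCheckBenchLocal() {
        if (localLog.isTraceEnabled()) {
            localLog.trace("This log is {} check", "with");
        }
    }

    @Benchmark
    public void getTimeWithoutCheckLocal() {
        localLog.trace("This log is {} check", "without");
    }
}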
I changed debug to trace to avoid any possible effect of the cast.
Results:
Benchmark                                 Mode  Samples  Score  Score error  Units
o.o.j.s.LogBench.baseline                 avgt        3  0.539        0.047  ns/op
o.o.j.s.LogBench.getTimeWithCheck         avgt        3  1.030        0.083  ns/op
o.o.j.s.LogBench.getTimeWithCheckLocal    avgt        3  1.637        0.571  ns/op
o.o.j.s.LogBench.getTimeWithoutCheck      avgt        3  1.140        0.112  ns/op
o.o.j.s.LogBench.getTimeWithoutCheckLocal avgt        3  1.628        0.311  ns/op
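As you can see, the conditional check is useless here, but the static version is about 1.6 times faster than the instance-field ("local") version. Let's look into the difference between getTimeWithCheckLocal and getTimeWithCheck.
Assembly for the static logger
Assembly for the non-static logger
Note that in the second experiment the JIT has to perform an additional load of the logger field value: lea r9,[r12+r10*8]
Let's run the benchmark again with the perfasm profiler.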
0.04% 0.04% │↗ 0x00007f6c25229320: mov r10d,DWORD PTR [r8+0xc] ;*getfield localLog
││ ; - org.openjdk.jmh.samples.LogBench::getTimeWithoutCheckLocal@1 (line 77)
││ ; - org.openjdk.jmh.samples.generated.LogBench_getTimeWithoutCheckLocal_jmhTest::getTimeWithoutCheckLocal_avgt_jmhStub@14 (line 163)
6.80% 7.29% ││ 0x00007f6c25229324: mov r11d,DWORD PTR [r12+r10*8+0x8]
││ ; implicit exception: dispatches to 0x00007f6c252294a5
0.02% ││ 0x00007f6c25229329: cmp r11d,0xf80197b1 ; {metadata('ch/qos/logback/classic/Logger')}
││ 0x00007f6c25229330: jne 0x00007f6c2522939b
││ 0x00007f6c25229332: lea r9,[r12+r10*8] ;*invokeinterface debug
││ 0x00007f6c25229336: mov ecx,DWORD PTR [r9+0x28] ;*getfield loggerContext
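As you can see, this additional load is not free. It is there because there are many ways to change a final variable, so it is not safe for the JIT to treat a final instance field as a constant.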
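Reflection is one such way. The following is a minimal sketch, not taken from the original answer: the Holder class and the mutate helper are hypothetical, and it assumes a final instance field in an ordinary class (not a record) that is assigned in the constructor rather than initialized with a compile-time constant.

```java
import java.lang.reflect.Field;

public class FinalFieldMutation {

    /** Hypothetical class with a final instance field, assigned in the constructor. */
    static final class Holder {
        private final String name;

        Holder(String name) {
            this.name = name;
        }

        String name() {
            return name;
        }
    }

    /** Overwrites the final field through reflection and returns what the getter now sees. */
    static String mutate(Holder holder, String newValue) {
        try {
            Field field = Holder.class.getDeclaredField("name");
            field.setAccessible(true);      // lifts the access check, including the final check for instance fields
            field.set(holder, newValue);    // the "final" field now holds a different reference
            return holder.name();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflection failed", e);
        }
    }

    public static void main(String[] args) {
        Holder holder = new Holder("before");
        System.out.println(mutate(holder, "after"));
    }
}
```

Because code like this is legal, the compiled benchmark has to re-read localLog from the object instead of folding it into a constant the way it can for the static logger.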
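The JVM has a special option for this, -XX:+TrustFinalNonStaticFields. Because it is an experimental feature, it must be used together with -XX:+UnlockExperimentalVMOptions. If you run the benchmark with this option, you will see a different result: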
Benchmark                            Mode  Cnt  Score   Error  Units
LogBench.baseline                    avgt    3  2.124 ± 0.907  ns/op
LogBench.getTimeWithCheck            avgt    3  0.695 ± 0.231  ns/op
LogBench.getTimeWithCheckBenchLocal  avgt    3  1.608 ± 0.140  ns/op
LogBench.getTimeWithoutCheck         avgt    3  0.675 ± 0.075  ns/op
LogBench.getTimeWithoutCheckLocal    avgt    3  1.613 ± 0.176  ns/op
The result is strange: although the extra load of the local field is gone now, inlining is broken and the asm code contains a direct call:
0x00007f2355205d33: call 0x00007f2355046020 ; OopMap{off=120}
;*invokespecial filterAndLog_1
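Conclusions
- The JVM cannot trust final fields, so in the benchmark it has to load the field from memory every time (but in 99.9999999% of applications this is not a problem).
- The JVM has the TrustFinalNonStaticFields option, which seems to be very unstable, since it breaks CHA (class hierarchy analysis) optimizations.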
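As for the explicit isTraceEnabled() guard making no measurable difference: a parameterized logging call normally performs the same level check itself before it formats anything. The sketch below illustrates that idea with a hypothetical TinyLogger. It is not logback's actual implementation, and all names in it are assumptions made for the example.

```java
public class GuardSketch {

    /** Hypothetical logger used only for illustration; not logback's code. */
    static final class TinyLogger {
        private final boolean traceEnabled;
        private int formattedCount;   // how many messages were actually formatted
        private String lastMessage;

        TinyLogger(boolean traceEnabled) {
            this.traceEnabled = traceEnabled;
        }

        boolean isTraceEnabled() {
            return traceEnabled;
        }

        void trace(String format, Object arg) {
            if (!traceEnabled) {
                return;               // level check happens before any formatting work
            }
            formattedCount++;
            lastMessage = format.replace("{}", String.valueOf(arg));
        }

        int formattedCount() {
            return formattedCount;
        }

        String lastMessage() {
            return lastMessage;
        }
    }

    /** Runs the same loop with or without an explicit guard; returns the number of formatted messages. */
    static int run(boolean traceEnabled, boolean useGuard, int iterations) {
        TinyLogger log = new TinyLogger(traceEnabled);
        for (int i = 0; i < iterations; i++) {
            if (useGuard) {
                if (log.isTraceEnabled()) {
                    log.trace("This log is {} check", "with");
                }
            } else {
                log.trace("This log is {} check", "without");
            }
        }
        return log.formattedCount();
    }

    public static void main(String[] args) {
        System.out.println(run(false, true, 1_000));   // guarded, level disabled
        System.out.println(run(false, false, 1_000));  // unguarded, level disabled
    }
}
```

With a constant string argument, both variants do the same amount of work when the level is disabled. An explicit guard pays off mainly when computing the arguments is itself expensive.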
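The cmp and the load mov r10d,DWORD PTR [r8+0xc] seem to be much slower than the field load r11d,DWORD PTR [rsi+0xc], don't they?

@apangin, oh, I see. You are right. I still wondered why it does not keep the logger pointer in r11, but it cannot, because the benchmark loop contains a volatile load that acts as a barrier. So every benchmark iteration loads the localLog field and checks its type again and again.

@SerCe, thank you for your effort, but my results are different. Could you take a look at my answer?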