Java: Strange behaviour of sun.misc.Unsafe.compareAndSwap when measured with JMH
I decided to measure incrementation with different locking strategies, using JMH for this purpose. I use JMH to check throughput and average time, and a simple custom test to check correctness. There are six strategies:
- Atomic count
- ReadWrite locking count
- Synchronizing with volatile
- Synchronizing block without volatile
- sun.misc.Unsafe.compareAndSwap
- sun.misc.Unsafe.getAndAdd
- Unsynchronized count
Benchmark code:
@State(Scope.Benchmark)
@BenchmarkMode({Mode.Throughput, Mode.AverageTime})
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Fork(1)
@Warmup(iterations = 5)
@Measurement(iterations = 5)
public class UnsafeCounter_Benchmark {
    public Counter unsync, syncNoV, syncV, lock, atomic, unsafe, unsafeGA;

    @Setup(Level.Iteration)
    public void prepare() {
        unsync = new UnsyncCounter();
        syncNoV = new SyncNoVolatileCounter();
        syncV = new SyncVolatileCounter();
        lock = new LockCounter();
        atomic = new AtomicCounter();
        unsafe = new UnsafeCASCounter();
        unsafeGA = new UnsafeGACounter();
    }

    @Benchmark
    public void unsyncCount() {
        unsyncCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void unsyncCounter() {
        unsync.increment();
    }

    @Benchmark
    public void syncNoVCount() {
        syncNoVCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void syncNoVCounter() {
        syncNoV.increment();
    }

    @Benchmark
    public void syncVCount() {
        syncVCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void syncVCounter() {
        syncV.increment();
    }

    @Benchmark
    public void lockCount() {
        lockCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void lockCounter() {
        lock.increment();
    }

    @Benchmark
    public void atomicCount() {
        atomicCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void atomicCounter() {
        atomic.increment();
    }

    @Benchmark
    public void unsafeCount() {
        unsafeCounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void unsafeCounter() {
        unsafe.increment();
    }

    @Benchmark
    public void unsafeGACount() {
        unsafeGACounter();
    }

    @CompilerControl(CompilerControl.Mode.DONT_INLINE)
    public void unsafeGACounter() {
        unsafeGA.increment();
    }

    public static void main(String[] args) throws RunnerException {
        Options baseOpts = new OptionsBuilder()
                .include(UnsafeCounter_Benchmark.class.getSimpleName())
                .threads(100)
                .jvmArgs("-ea")
                .build();
        new Runner(baseOpts).run();
    }
}
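Test results:
jdk8u20
Most of the measurements are what I expected, except for UnsafeCounter_Benchmark.unsafeCount, which uses sun.misc.Unsafe.compareAndSwapLong in a while loop. It is the slowest of them, slower even than the locks.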
public void increment() {
    long before = counter;
    while (!unsafe.compareAndSwapLong(this, offset, before, before + 1L)) {
        before = counter;
    }
}
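I thought the low performance was caused by the while loop and by JMH producing higher contention, but when I checked correctness with an Executor I got the figures I expected: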
Counter result: UnsyncCounter 97538676
Time passed in ms:259
Counter result: AtomicCounter 100000000
Time passed in ms:1805
Counter result: LockCounter 100000000
Time passed in ms:3904
Counter result: SyncNoVolatileCounter 100000000
Time passed in ms:14227
Counter result: SyncVolatileCounter 100000000
Time passed in ms:19224
Counter result: UnsafeCASCounter 100000000
Time passed in ms:8077
Counter result: UnsafeGACounter 100000000
Time passed in ms:2549
Correctness test code:
public class UnsafeCounter_Test {
    static class CounterClient implements Runnable {
        private Counter c;
        private int num;

        public CounterClient(Counter c, int num) {
            this.c = c;
            this.num = num;
        }

        @Override
        public void run() {
            for (int i = 0; i < num; i++) {
                c.increment();
            }
        }
    }

    public static void makeTest(Counter counter) throws InterruptedException {
        int NUM_OF_THREADS = 1000;
        int NUM_OF_INCREMENTS = 100000;
        ExecutorService service = Executors.newFixedThreadPool(NUM_OF_THREADS);
        long before = System.currentTimeMillis();
        for (int i = 0; i < NUM_OF_THREADS; i++) {
            service.submit(new CounterClient(counter, NUM_OF_INCREMENTS));
        }
        service.shutdown();
        service.awaitTermination(1, TimeUnit.MINUTES);
        long after = System.currentTimeMillis();
        System.out.println("Counter result: " + counter.getClass().getSimpleName() + " " + counter.getCounter());
        System.out.println("Time passed in ms:" + (after - before));
    }

    public static void main(String[] args) throws InterruptedException {
        makeTest(new UnsyncCounter());
        makeTest(new AtomicCounter());
        makeTest(new LockCounter());
        makeTest(new SyncNoVolatileCounter());
        makeTest(new SyncVolatileCounter());
        makeTest(new UnsafeCASCounter());
        makeTest(new UnsafeGACounter());
    }
}
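The Counter interface and its implementations are not shown in the question. As a minimal sketch, assuming Counter exposes only increment() and getCounter() (the two calls the test makes), the harness can be made self-contained as follows. The class name CounterSketch, the run helper and the two implementations are hypothetical reconstructions, not the code from the repo.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class CounterSketch {
    // Assumed shape of the interface: only the two methods the test calls.
    interface Counter {
        void increment();
        long getCounter();
    }

    // Plain field, no synchronization: increments can be lost under contention.
    static class UnsyncCounter implements Counter {
        private long counter;
        public void increment() { counter++; }
        public long getCounter() { return counter; }
    }

    // Stand-in for the "atomic count" strategy.
    static class AtomicCounter implements Counter {
        private final AtomicLong counter = new AtomicLong();
        public void increment() { counter.incrementAndGet(); }
        public long getCounter() { return counter.get(); }
    }

    // Hammers the counter from several threads and returns the final value.
    static long run(Counter c, int threads, int incrementsPerThread) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    c.increment();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.getCounter();
    }

    public static void main(String[] args) {
        System.out.println("atomic: " + run(new AtomicCounter(), 8, 100_000));
        System.out.println("unsync: " + run(new UnsyncCounter(), 8, 100_000));
    }
}
```

The atomic variant should always report threads × incrementsPerThread; the unsynchronized one may report less, as in the UnsyncCounter line of the results above.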
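I know this is a very poor test, but in this case Unsafe CAS is twice as fast as the Sync variants, and everything goes as expected.
Could somebody clarify the described behaviour?
For more information, see the GitHub repo.

Thinking out loud: it is remarkable how often people do 90% of the tedious work and leave the last 10% (where the fun begins) to someone else! All right, I will take all the fun.
Let me first repeat the experiment on my i7-4790K, 8u40 EA: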
Benchmark Mode Samples Score Error Units
UnsafeCounter_Benchmark.atomicCount thrpt 5 47.669 ± 18.440 ops/us
UnsafeCounter_Benchmark.lockCount thrpt 5 14.497 ± 7.815 ops/us
UnsafeCounter_Benchmark.syncNoVCount thrpt 5 11.618 ± 2.130 ops/us
UnsafeCounter_Benchmark.syncVCount thrpt 5 11.337 ± 4.532 ops/us
UnsafeCounter_Benchmark.unsafeCount thrpt 5 7.452 ± 1.042 ops/us
UnsafeCounter_Benchmark.unsafeGACount thrpt 5 43.332 ± 3.435 ops/us
UnsafeCounter_Benchmark.unsyncCount thrpt 5 102.773 ± 11.943 ops/us
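Indeed, the unsafeCount test looks suspicious. Actually, you have to presume all data is suspicious until it has been verified. For nanobenchmarks, you have to check the generated code to see whether you are really measuring what you want to measure. In JMH, this is very quickly done with -prof perfasm. In fact, if you look at the hottest region of unsafeCount there, you will notice a few funny things: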
0.12% 0.04% 0x00007fb45518e7d1: mov 0x10(%r10),%rax
17.03% 23.44% 0x00007fb45518e7d5: test %eax,0x17318825(%rip)
0.21% 0.07% 0x00007fb45518e7db: mov 0x18(%r10),%r11 ; getfield offset
30.33% 10.77% 0x00007fb45518e7df: mov %rax,%r8
0.00% 0x00007fb45518e7e2: add $0x1,%r8
0.01% 0x00007fb45518e7e6: cmp 0xc(%r10),%r12d ; typecheck
0x00007fb45518e7ea: je 0x00007fb45518e80b ; bail to v-call
0.83% 0.48% 0x00007fb45518e7ec: lock cmpxchg %r8,(%r10,%r11,1)
33.27% 25.52% 0x00007fb45518e7f2: sete %r8b
0.12% 0.01% 0x00007fb45518e7f6: movzbl %r8b,%r8d
0.03% 0.04% 0x00007fb45518e7fa: test %r8d,%r8d
0x00007fb45518e7fd: je 0x00007fb45518e7d1 ; back branch
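Translation: a) the offset field is re-read on every iteration, because the memory effects of CAS imply a volatile read, so the field has to be pessimistically re-read; b) the funny part is that the unsafe field is also re-read for the typecheck, for the same reason.
This is why high-performance code should look like this: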
--- a/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java
+++ b/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java
@@ -5,13 +5,13 @@ import sun.misc.Unsafe;
 public class UnsafeCASCounter implements Counter {
     private volatile long counter = 0;
-    private final Unsafe unsafe = UnsafeHelper.unsafe;
-    private long offset;
-    {
+    private static final Unsafe unsafe = UnsafeHelper.unsafe;
+    private static final long offset;
+    static {
         try {
             offset = unsafe.objectFieldOffset(UnsafeCASCounter.class.getDeclaredField("counter"));
         } catch (NoSuchFieldException e) {
-            e.printStackTrace();
+            throw new IllegalStateException("Whoops!");
         }
     }
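The same idea, keeping the CAS machinery in static final fields so that nothing about it has to be reloaded inside the retry loop, can be sketched with a supported API. This is an illustration only: it swaps sun.misc.Unsafe for java.util.concurrent.atomic.AtomicLongFieldUpdater, and the class name UpdaterCASCounter and the countWith helper are hypothetical and not part of the benchmarked repo.

```java
import java.util.concurrent.atomic.AtomicLongFieldUpdater;

public class UpdaterCASCounter {
    // static final: initialized once per class, so the retry loop never has to
    // re-read an instance field to find out how to perform the CAS.
    private static final AtomicLongFieldUpdater<UpdaterCASCounter> COUNTER =
            AtomicLongFieldUpdater.newUpdater(UpdaterCASCounter.class, "counter");

    // The updater requires the target field to be a volatile long.
    private volatile long counter = 0;

    public void increment() {
        long before = counter;
        // Retry until no other thread changed the value between read and CAS.
        while (!COUNTER.compareAndSet(this, before, before + 1L)) {
            before = counter;
        }
    }

    public long getCounter() {
        return counter;
    }

    // Hypothetical helper: increments one counter from several threads.
    static long countWith(int threads, int perThread) {
        UpdaterCASCounter c = new UpdaterCASCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.getCounter();
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 100_000));
    }
}
```

Whether this compiles down to the same tight loop as the Unsafe version would still have to be checked with -prof perfasm rather than assumed.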
If you do that, unsafeCount performance goes up immediately:
Benchmark Mode Samples Score Error Units
UnsafeCounter_Benchmark.unsafeCount thrpt 5 9.733 ± 0.673 ops/us
...which, given the error bounds, is now very close to the synchronized tests. If you look at -prof perfasm now, this is the unsafeCount loop:
0.08% 0.02% 0x00007f7575191900: mov 0x10(%r10),%rax
28.09% 28.64% 0x00007f7575191904: test %eax,0x161286f6(%rip)
0.23% 0.08% 0x00007f757519190a: mov %rax,%r11
0x00007f757519190d: add $0x1,%r11
0x00007f7575191911: lock cmpxchg %r11,0x10(%r10)
47.27% 23.48% 0x00007f7575191917: sete %r8b
0.10% 0x00007f757519191b: movzbl %r8b,%r8d
0.02% 0x00007f757519191f: test %r8d,%r8d
0x00007f7575191922: je 0x00007f7575191900
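This loop is very tight, and nothing seems able to make it go faster. We spend most of the time loading the "updated" value and actually CAS-ing it. But we contend a lot! To find out whether contention is the leading cause, let's add backoff: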
--- a/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java
+++ b/utils bench/src/main/java/org/kirmit/utils/unsafe/concurrency/UnsafeCASCounter.java
@@ -20,6 +21,7 @@ public class UnsafeCASCounter implements Counter {
         long before = counter;
         while (!unsafe.compareAndSwapLong(this, offset, before, before + 1L)) {
             before = counter;
+            Blackhole.consumeCPU(1000);
         }
     }
...running:
Benchmark Mode Samples Score Error Units
UnsafeCounter_Benchmark.unsafeCount thrpt 5 99.869 ± 107.933 ops/us
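Voila. We do more work in the loop, but it saves us from contending a lot. I have tried to explain this before, and it would be worth reading more about benchmarking methodology, especially when heavy-weight operations are measured. This highlights a pitfall in the entire experiment, not only in unsafeCount.
Exercise for the OP and interested readers: explain why unsafeGACount and atomicCount perform much faster than the other tests. You have the tools now.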
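Outside a JMH benchmark, Blackhole.consumeCPU is not available. One possible stand-in for the backoff step, sketched here under the assumption of Java 9 or later, is a short spin on Thread.onSpinWait() after each failed CAS. It is not equivalent to consumeCPU (it is only a hint to the runtime), and the class name BackoffCounter, the spins parameter and the countWith helper are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BackoffCounter {
    private final AtomicLong counter = new AtomicLong();
    private final int spins; // hypothetical tuning knob: spin-wait hints per failed CAS

    public BackoffCounter(int spins) {
        this.spins = spins;
    }

    public void increment() {
        long before = counter.get();
        while (!counter.compareAndSet(before, before + 1L)) {
            // Back off first, so this thread stops competing for the cache line
            // for a moment, then re-read the value before retrying.
            for (int i = 0; i < spins; i++) {
                Thread.onSpinWait();
            }
            before = counter.get();
        }
    }

    public long getCounter() {
        return counter.get();
    }

    // Hypothetical helper: increments one counter from several threads.
    static long countWith(int threads, int perThread, int spins) {
        BackoffCounter c = new BackoffCounter(spins);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    c.increment();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return c.getCounter();
    }

    public static void main(String[] args) {
        System.out.println(countWith(4, 100_000, 64));
    }
}
```

The right amount of backoff depends on the machine and the number of contending threads, so any value for spins would need to be measured rather than copied.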
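P.S. On a machine with C (C…
P.P.S. Time check: 10 minutes to do the analysis and the additional experiments, 20 minutes to write it up. How much time did you waste replicating the result by hand? ;)

Maybe I am reading the table wrong, but 20 operations per microsecond looks faster than 12 operations per microsecond. Wouldn't that make the synchronized variants faster than the unsynchronized one? Also, the 5.9 us/op error on unsafeCount looks rather large compared with the others, which could lead to wrong results.
In theory, the sync variants should be the slowest, as my custom test (the one that checks correctness) shows. But in the JMH version UnsafeCAS is the slowest. It looks a bit suspicious, and I can only guess at the reason.
Hmm, what would be a good alternative to consumeCPU(1000) without the JMH API?
In your memory…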