Java **BUSTED** How to use sun.misc.Unsafe to speed up byte[] lookup?
I'm experimenting with Unsafe to iterate over memory instead of iterating over the values in a byte[]. A block of memory is allocated with Unsafe, large enough to hold 65536 byte values. I'm trying this:
char aChar = 'x'; // some character
if ((byte) 0 == (unsafe.getByte(base_address + aChar) & mask)) {
    // do something
}
instead of:
char aChar = 'x'; // some character
if ((byte) 0 == (lookup[aChar] & mask)) {
    // do something
}
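A self-contained sketch of the two variants being compared, assuming a 65536-entry table indexed directly by a `char`. The mask value `MASK`, the table contents, and the names `CharLookup`, `flagClearArray` and `flagClearOffHeap` are hypothetical, since the question does not show how the table is filled. `sun.misc.Unsafe` is obtained through the usual `theUnsafe` reflection trick; it is an unsupported internal API, and recent JDKs may warn about its memory-access methods.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

final class CharLookup {
    static final int SIZE = 65536;   // one entry per possible char value
    static final byte MASK = 0x01;   // hypothetical flag bit

    static Unsafe unsafe() throws ReflectiveOperationException {
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        return (Unsafe) f.get(null);
    }

    // Plain Java: lookup[c] is bounds-checked by the language;
    // whether the JIT can remove that check depends on what it can prove.
    static boolean flagClearArray(byte[] lookup, char c) {
        return (lookup[c] & MASK) == 0;
    }

    // Off-heap: no bounds check at all; reading outside the allocated
    // block is undefined behaviour and can crash the JVM.
    static boolean flagClearOffHeap(Unsafe u, long baseAddress, char c) {
        return (u.getByte(baseAddress + c) & MASK) == 0;
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Unsafe u = unsafe();
        byte[] lookup = new byte[SIZE];
        long base = u.allocateMemory(SIZE);        // raw, uninitialised native memory
        try {
            for (int i = 0; i < SIZE; i++) {
                byte v = (byte) (i % 2);           // hypothetical contents: odd chars have the flag set
                lookup[i] = v;
                u.putByte(base + i, v);
            }
            char c = 'A';                          // 65 is odd, so the flag is set
            System.out.println(flagClearArray(lookup, c) + " " + flagClearOffHeap(u, base, c));
            // prints "false false"
        } finally {
            u.freeMemory(base);                    // off-heap memory is not garbage collected
        }
    }
}
```

Both paths answer the same question; the Unsafe path only skips the bounds check, which is why any speedup has to come from that check not being optimized away by the JIT.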
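I thought Unsafe could access memory faster than regular array access, with its index check on every index.

It was only wishful thinking that the JVM would have a special op (Unsafe) that would somehow make regular array access and iteration faster. It seems to me that the JVM handles normal byte[] iteration just fine and does it about as fast as possible with plain, unadulterated Java code.

@millimoose hit the proverbial nail on the head:

"Unsafe might be useful for lots of things, but this level of micro-optimization isn't one of them." (millimoose)

Using Unsafe is faster only in very strictly limited circumstances, and slower otherwise: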
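- (64-bit JVM only) Faster for a single lookup over a 65535-byte[] when each test is run only once. In that case UnsafeLookup_8B was 24% faster on the 64-bit JVM. If the tests are repeated so that each one runs twice, the normal method becomes 30% faster than Unsafe. In pure interpreted mode on a cold JVM, Unsafe is much faster, but only the first time and only for small array sizes. On a standard 32-bit Oracle JVM 7.x, the normal approach is three times faster than Unsafe.
- Slower on Oracle Java 64-bit and 32-bit VMs
- Slower regardless of OS and machine architecture (32-bit and 64-bit)
- Slower even when the -server JVM option is used
- Slower by 9% or more (1 GB array and UnsafeLookup_8B, the fastest variant, in the code below, on a 32-bit JVM; 64-bit was even slower??)
- Slower by 234% or more on a 64-bit JVM (1 MB array and UnsafeLookup_1B, the fastest variant, in the code below)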
C:\Users\wilf>java -Xms1600m -Xprof -jar "S:\wilf\testing\dist\testing.jar"
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1967737 us.
use unsafeLookup_1B()...
Not found '0'
time : 2923367 us.
use unsafeLookup_8B()...
Not found '0'
time : 2495663 us.
Flat profile of 26.35 secs (2018 total ticks): main
Interpreted + native Method
0.0% 1 + 0 test.StackOverflow.main
0.0% 1 + 0 Total interpreted
Compiled + native Method
67.8% 1369 + 0 test.StackOverflow.main
11.7% 236 + 0 test.StackOverflow.unsafeLookup_8B
11.2% 227 + 0 test.StackOverflow.unsafeLookup_1B
9.1% 184 + 0 test.StackOverflow.normalLookup
99.9% 2016 + 0 Total compiled
Stub + native Method
0.0% 0 + 1 sun.misc.Unsafe.getLong
0.0% 0 + 1 Total stub
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 26.39 seconds:
100.0% 2023 Received ticks
C:\Users\wilf>java -version
java version "1.7.0_07"
Java(TM) SE Runtime Environment (build 1.7.0_07-b11)
Java HotSpot(TM) Client VM (build 23.3-b01, mixed mode, sharing)
CPU: Intel Core 2 Duo E4600 @ 2.4 GHz, 4.00 GB RAM (3.25 GB usable)
OS: Windows 7 (32-bit)
Test run on a 4-core AMD64 with Windows 7 64-bit and 32-bit Java:
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1631142 us.
use unsafeLookup_1B()...
Not found '0'
time : 2365214 us.
use unsafeLookup_8B()...
Not found '0'
time : 1783320 us.
use normalLookup()...
Not found '0'
time : 655146 us.
use unsafeLookup_1B()...
Not found '0'
time : 904783 us.
use unsafeLookup_8B()...
Not found '0'
time : 764427 us.
Flat profile of 6.34 secs (13 total ticks): main
Interpreted + native Method
23.1% 3 + 0 java.io.PrintStream.println
23.1% 3 + 0 test.StackOverflow.unsafeLookup_8B
15.4% 2 + 0 test.StackOverflow.main
7.7% 1 + 0 java.io.DataInputStream.<init>
69.2% 9 + 0 Total interpreted
Compiled + native Method
7.7% 0 + 1 test.StackOverflow.unsafeLookup_1B
7.7% 0 + 1 test.StackOverflow.main
7.7% 0 + 1 test.StackOverflow.normalLookup
7.7% 0 + 1 test.StackOverflow.unsafeLookup_8B
30.8% 0 + 4 Total compiled
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 6.35 seconds:
100.0% 14 Received ticks
42.9% 6 Compilation
Test run on a 4-core AMD64 with Windows 7 64-bit and 64-bit Java:
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1631142 us.
use unsafeLookup_1B()...
Not found '0'
time : 2365214 us.
use unsafeLookup_8B()...
Not found '0'
time : 1783320 us.
use normalLookup()...
Not found '0'
time : 655146 us.
use unsafeLookup_1B()...
Not found '0'
time : 904783 us.
use unsafeLookup_8B()...
Not found '0'
time : 764427 us.
Flat profile of 6.34 secs (13 total ticks): main
Interpreted + native Method
23.1% 3 + 0 java.io.PrintStream.println
23.1% 3 + 0 test.StackOverflow.unsafeLookup_8B
15.4% 2 + 0 test.StackOverflow.main
7.7% 1 + 0 java.io.DataInputStream.<init>
69.2% 9 + 0 Total interpreted
Compiled + native Method
7.7% 0 + 1 test.StackOverflow.unsafeLookup_1B
7.7% 0 + 1 test.StackOverflow.main
7.7% 0 + 1 test.StackOverflow.normalLookup
7.7% 0 + 1 test.StackOverflow.unsafeLookup_8B
30.8% 0 + 4 Total compiled
Flat profile of 0.00 secs (1 total ticks): DestroyJavaVM
Thread-local ticks:
100.0% 1 Blocked (of total)
Global summary of 6.35 seconds:
100.0% 14 Received ticks
42.9% 6 Compilation
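I think the two functions you posted are essentially the same, because both read only 1 byte, convert it to an int, and then compare it.

It is much more efficient to read a 4-byte int or an 8-byte long at a time. I wrote two functions that do the same thing, comparing the contents of two byte[]s to see whether they are identical:

Function 1: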
public static boolean hadoopEquals(byte[] b1, byte[] b2)
{
if(b1 == b2)
{
return true;
}
if(b1.length != b2.length)
{
return false;
}
// Bring WritableComparator code local
for(int i = 0;i < b1.length; ++i)
{
int a = (b1[i] & 0xff);
int b = (b2[i] & 0xff);
if (a != b)
{
return false;
}
}
return true;
}
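Only the first function survived here. A sketch of the 8-bytes-at-a-time idea for the same equality check follows; it reads through `java.nio.ByteBuffer.getLong(int)` instead of `Unsafe.getLong` (a swap made here to stay within the supported API, not the answerer's second function), and the names `WordEquals` and `longEquals` are hypothetical.

```java
import java.nio.ByteBuffer;

final class WordEquals {
    /** Compares two arrays 8 bytes at a time, then the remaining 0..7 bytes one by one. */
    static boolean longEquals(byte[] b1, byte[] b2) {
        if (b1 == b2) return true;                    // same array, or both null
        if (b1 == null || b2 == null || b1.length != b2.length) return false;
        ByteBuffer x = ByteBuffer.wrap(b1);           // byte order is irrelevant for equality
        ByteBuffer y = ByteBuffer.wrap(b2);
        int n = b1.length;
        int i = 0;
        for (; i + 8 <= n; i += 8) {
            if (x.getLong(i) != y.getLong(i)) return false;
        }
        for (; i < n; i++) {                          // tail shorter than one long
            if (b1[i] != b2[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] b = a.clone();
        System.out.println(longEquals(a, b));   // true
        b[8] = 0;                               // differs only in the 1-byte tail
        System.out.println(longEquals(a, b));   // false
    }
}
```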
============================================================================
Hi Wilf,
I used your code to build the test class below, which compares how fast 3 functions find the first 0 in a byte array:
package test;
import java.lang.reflect.Field;
import sun.misc.Unsafe;
/**
* Test the speed in looking up the 1st 0 in a byte array
 * Set -Xms the same as -Xmx to avoid heap reallocation
*
* @author yellowb
*
*/
public class StackOverflow
{
public static Unsafe UnSafe;
public static Unsafe getUnsafe() throws SecurityException,
NoSuchFieldException, IllegalArgumentException,
IllegalAccessException
{
Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafe.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafe.get(null);
return unsafe;
}
/**
* use 'byte[index]' form to read 1 byte every time
* @param buf
*/
public static void normalLookup(byte[] buf)
{
for (int i = 0; i < buf.length; ++i)
{
if ((byte) 0 == buf[i])
{
System.out.println("The 1st '0' is at position : " + i);
return;
}
}
System.out.println("Not found '0'");
}
/**
* use Unsafe.getByte to read 1 byte every time directly from the memory
* @param buf
*/
public static void unsafeLookup_1B(byte[] buf)
{
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
for (int i = 0; i < buf.length; ++i)
{
byte b = UnSafe.getByte(buf, (long) (baseOffset + i));
if (0 == ((int) b & 0xFF))
{
System.out.println("The 1st '0' is at position : " + i);
return;
}
}
System.out.println("Not found '0'");
}
/**
* use Unsafe.getLong to read 8 byte every time directly from the memory
* @param buf
*/
public static void unsafeLookup_8B(byte[] buf)
{
int baseOffset = UnSafe.arrayBaseOffset(byte[].class);
//The first (numLongs * 8) bytes will be read by Unsafe.getLong in below loop
int numLongs = buf.length / 8;
long currentOffset = 0L;
for (int i = 0; i < numLongs; ++i)
{
currentOffset = baseOffset + (i * 8); //the step is 8 bytes
long l = UnSafe.getLong(buf, currentOffset);
//Compare each byte(in the 8-Byte long) to 0
//PS:x86 cpu is little-endian mode
if (0L == (l & 0xFF))
{
System.out.println("The 1st '0' is at position : " + (i * 8));
return;
}
if (0L == (l & 0xFF00L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 1));
return;
}
if (0L == (l & 0xFF0000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 2));
return;
}
if (0L == (l & 0xFF000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 3));
return;
}
if (0L == (l & 0xFF00000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 4));
return;
}
if (0L == (l & 0xFF0000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 5));
return;
}
if (0L == (l & 0xFF000000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 6));
return;
}
if (0L == (l & 0xFF00000000000000L))
{
System.out.println("The 1st '0' is at position : " + (i * 8 + 7));
return;
}
}
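//If some bytes remain after the last full long
int rest = buf.length % 8;
if(0 != rest)
{
//Start right after the last byte read by getLong (also correct when numLongs == 0)
currentOffset = baseOffset + (long) numLongs * 8;
//Because fewer than 8 bytes remain, read them one by one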
for(; currentOffset < (baseOffset + buf.length); ++currentOffset)
{
byte b = UnSafe.getByte(buf, (long)currentOffset);
if (0 == ((int) b & 0xFF))
{
System.out.println("The 1st '0' is at position : " + (currentOffset - baseOffset));
return;
}
}
}
System.out.println("Not found '0'");
}
public static void main(String[] args) throws SecurityException,
NoSuchFieldException, IllegalArgumentException,
IllegalAccessException
{
UnSafe = getUnsafe();
int len = 1024 * 1024 * 1024; //1G
long startTime = 0L;
long endTime = 0L;
System.out.println("initialize data...");
byte[] byteArray1 = new byte[len];
for (int i = 0; i < len; ++i)
{
byteArray1[i] = (byte) (i % 128 + 1); //No byte will equal to 0
}
//If you want to set one byte to 0,uncomment the below statement
// byteArray1[2500] = (byte)0;
System.out.println("initialize data done!");
System.out.println("use normalLookup()...");
startTime = System.nanoTime();
normalLookup(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
System.out.println("use unsafeLookup_1B()...");
startTime = System.nanoTime();
unsafeLookup_1B(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
System.out.println("use unsafeLookup_8B()...");
startTime = System.nanoTime();
unsafeLookup_8B(byteArray1);
endTime = System.nanoTime();
System.out.println("time : " + ((endTime - startTime) / 1000) + " us.");
}
}
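The results show that even reading 1 byte at a time with Unsafe.getByte() is much faster than iterating over the byte[] normally, and reading 8 bytes at a time as a long is faster still.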
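The eight separate mask tests in `unsafeLookup_8B` can be collapsed with the well-known "has a zero byte" bit trick, `(v - 0x0101010101010101L) & ~v & 0x8080808080808080L`, which is non-zero exactly when some byte of `v` is zero. With a little-endian load, the lowest flagged byte is the first zero. This sketch reads the longs through a little-endian `ByteBuffer` rather than `Unsafe.getLong`, so the byte positions do not depend on the CPU's endianness; the names `ZeroByteSearch` and `firstZero` are hypothetical, and it returns the index instead of printing it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

final class ZeroByteSearch {
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGHS = 0x8080808080808080L;

    /** Returns the index of the first 0 byte in buf, or -1 if there is none. */
    static int firstZero(byte[] buf) {
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        int i = 0;
        for (; i + 8 <= buf.length; i += 8) {
            long v = bb.getLong(i);                  // byte i lands in bits 0..7
            long zeros = (v - ONES) & ~v & HIGHS;    // non-zero iff some byte of v is 0
            if (zeros != 0) {
                // A borrow can only flag bytes above a real zero byte,
                // so the lowest flagged byte is the first zero.
                return i + (Long.numberOfTrailingZeros(zeros) >>> 3);
            }
        }
        for (; i < buf.length; i++) {                // fewer than 8 bytes left
            if (buf[i] == 0) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] buf = new byte[20];
        Arrays.fill(buf, (byte) 1);
        buf[11] = 0;
        System.out.println(firstZero(buf));   // 11
    }
}
```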
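"I thought Unsafe could access memory faster than regular array access, with its index check on every index."

One possible reason range checks may not be a factor is the JIT compiler's optimizer. Because the array's size never changes, the optimizer may "hoist" all of the range checks and perform a single check at the start of the loop.

Conversely, the JIT compiler may not be able to optimize (e.g. inline) the Unsafe.getByte() calls. (Or perhaps the getByte method has a read barrier…)

However, this is just speculation. The way to be sure is to get the JVM to dump the JIT-compiled native code for both cases and compare them instruction by instruction.

Unsafe methods may be marked native, but that doesn't mean they are necessarily JNI. Almost all Unsafe methods are intrinsics (see a short post here: ). On the Sun JVM they are turned into a single assembly instruction (in many cases); other JVMs may or may not be good at handling intrinsics and may turn them into JNI calls or plain Java calls. As far as I know, JRockit tends to go the JNI route, as does the Android JVM.

Could you try the -Xprof flag and paste the output here (java -Xprof SampleClass > output.txt)?

You need to provide the code for the benchmark. It's quite possible you haven't given the JVM enough warm-up to JIT the Unsafe version. However, judging from the code provided, I don't think there will be any speedup. Unsafe only offers an advantage when you're reading types larger than a byte, e.g. you want to read a single long rather than 8 separate bytes and ap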
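The warm-up explanation can be checked by running each variant several times in the same JVM and timing every round separately: early rounds usually include interpretation and JIT compilation, later rounds show the steady state. A minimal sketch, assuming a returning variant of `normalLookup` and a hypothetical helper `timeRounds`; for serious measurements a dedicated harness such as JMH is the usual choice.

```java
import java.util.Arrays;
import java.util.function.IntSupplier;

final class WarmupTiming {
    static volatile int sink;   // consumes each result so the work is not dead code

    /** Runs task `rounds` times and returns the elapsed nanoseconds of each run. */
    static long[] timeRounds(IntSupplier task, int rounds) {
        long[] nanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink = task.getAsInt();
            nanos[r] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Variant of normalLookup that returns the index (or -1) instead of printing. */
    static int normalLookup(byte[] buf) {
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] == 0) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = new byte[16 * 1024 * 1024];
        Arrays.fill(data, (byte) 1);   // no zero byte: every call scans the whole array
        long[] t = timeRounds(() -> normalLookup(data), 10);
        for (int r = 0; r < t.length; r++) {
            System.out.println("round " + r + ": " + t[r] / 1000 + " us");
        }
    }
}
```

Timing the Unsafe variants the same way, after the same number of warm-up rounds, gives a fairer comparison than a single cold run per method.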
initialize data...
initialize data done!
use normalLookup()...
Not found '0'
time : 1271781 us.
use unsafeLookup_1B()...
Not found '0'
time : 716898 us.
use unsafeLookup_8B()...
Not found '0'
time : 591689 us.