How can I write Java code to allow SSE use and bounds-check elimination (or other advanced optimizations)?

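Situation: I'm optimizing a pure-Java implementation of the LZF compression algorithm, which involves a lot of byte[] access and basic int math for hashing and comparison. Performance really matters, because the goal of the compression is to reduce I/O requirements. I'm not posting the code because it isn't cleaned up yet and may be restructured.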

Questions:

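How can I write my code so that it JIT-compiles to a form that uses faster SSE operations?
How can I structure it so the compiler can easily eliminate array bounds checks?
Are there any broad references on the relative speed of specific math operations (how many increments/decrements equal a normal addition/subtraction, how fast is a shift or an array access)?
How can I optimize branching? Is it better to have many conditional statements with short bodies, a few with long bodies, or short ones with nested conditions?
With the current 1.6 JVM, how many elements must be copied before System.arraycopy beats a copy loop?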

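What I've already done: before I get attacked for premature optimization, note that the basic algorithm is already excellent, but the Java implementation runs at less than 2/3 the speed of the equivalent C. I've already replaced copy loops with System.arraycopy, worked on optimizing loops, and eliminated unnecessary operations.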
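A minimal sketch of the copy-loop versus System.arraycopy trade-off raised in the last question: `LiteralCopy`, `copyLiterals` and `SHORT_COPY_THRESHOLD` are hypothetical names, and the threshold of 8 is a placeholder, not a measured crossover point. The real value has to be benchmarked on the target JVM and hardware.

```java
import java.nio.charset.StandardCharsets;

public final class LiteralCopy {
    // Hypothetical tuning constant: NOT a measured crossover point.
    static final int SHORT_COPY_THRESHOLD = 8;

    /** Copies len bytes from src[srcPos..] to dst[dstPos..] and returns the new dstPos. */
    static int copyLiterals(byte[] src, int srcPos, byte[] dst, int dstPos, int len) {
        if (len < SHORT_COPY_THRESHOLD) {
            // Very short run: a plain loop may avoid the fixed cost of the native call.
            for (int i = 0; i < len; i++) {
                dst[dstPos + i] = src[srcPos + i];
            }
        } else {
            // Longer run: let the JVM's optimized bulk copy do the work.
            System.arraycopy(src, srcPos, dst, dstPos, len);
        }
        return dstPos + len;
    }

    public static void main(String[] args) {
        byte[] in = "hello, compression".getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[in.length];
        int end = copyLiterals(in, 0, out, 0, 5);             // loop path (5 bytes)
        end = copyLiterals(in, 5, out, end, in.length - 5);   // arraycopy path (13 bytes)
        System.out.println(new String(out, 0, end, StandardCharsets.US_ASCII));
    }
}
```

Returning the new output position mirrors the `outPos += literals` bookkeeping in the compressor, so a helper like this could replace both literal-flush sites if it turns out to help.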

For performance, I make heavy use of bit twiddling and packing bytes into ints, as well as shifting and masking.
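As an illustration of that byte-packing style (the same pattern as the `future` variable in the code later in this post), here is a small sketch. `BytePacking`, `pack3`, `roll` and `byteAt` are hypothetical helper names; the `& 0xFF` masks are needed because Java's `byte` is signed and would otherwise sign-extend when widened to `int`.

```java
public final class BytePacking {
    /** Packs in[pos], in[pos+1], in[pos+2] into the low 24 bits of an int, first byte highest. */
    static int pack3(byte[] in, int pos) {
        // & 0xFF undoes sign extension: (byte) -16 would otherwise become 0xFFFFFFF0.
        return ((in[pos] & 0xFF) << 16) | ((in[pos + 1] & 0xFF) << 8) | (in[pos + 2] & 0xFF);
    }

    /** Slides the window by one byte; older bytes drift into the high bits and are ignored by byteAt. */
    static int roll(int window, byte next) {
        return (window << 8) | (next & 0xFF);
    }

    /** Returns byte k of the 3-byte window (k = 0 is the oldest); the cast keeps only the low 8 bits. */
    static byte byteAt(int window, int k) {
        return (byte) (window >> (16 - 8 * k));
    }

    public static void main(String[] args) {
        byte[] data = {0x10, (byte) 0xF0, 0x7F, 0x01};
        int w = pack3(data, 0);                    // 0x10F07F
        w = roll(w, data[3]);                      // 0x10F07F01
        System.out.println(Integer.toHexString(w)); // prints "10f07f01"
        System.out.println(byteAt(w, 0) == data[1]); // prints "true"
    }
}
```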

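For legal reasons, I can't look at implementations in similar libraries, and the license terms of existing libraries are too restrictive to use them.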

Requirements for a good (accepted) answer:

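Unacceptable answers: "this is faster" without explaining how much faster and why, OR not tested with a JIT compiler
Borderline answers: not tested on anything newer than HotSpot 1.4
Basic answers: provide a general rule and an explanation of why it is faster at the compiler level, and roughly how much faster
Good answers: include a couple of code samples to demonstrate
Excellent answers: include benchmarks with both JRE 1.5 and 1.6
PERFECT answer: comes from someone who has worked on the HotSpot compiler and can fully explain or reference the conditions under which an optimization is applied, and how much faster it typically is. It might include Java code and sample assembly code generated by HotSpot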
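For the benchmark requirement, a bare-bones warm-up-then-measure harness might look like the sketch below. It relies only on `System.nanoTime()` (available since Java 5) and avoids newer syntax so it compiles on 1.5/1.6; `MicroBench`, `Task`, `sink` and the iteration counts are hypothetical. Hand-rolled timing like this is easily distorted by JIT compilation and dead-code elimination, so on current JDKs a dedicated harness such as JMH is the more reliable choice.

```java
public final class MicroBench {
    /** One unit of work; returning a value lets the harness keep the result live. */
    interface Task {
        int run();
    }

    // Volatile sink: publishing the accumulated result discourages the JIT from discarding the work.
    static volatile int sink;

    /** Runs task warmupIters times untimed, then returns the average nanoseconds per call over timedIters. */
    static double nanosPerCall(Task task, int warmupIters, int timedIters) {
        int acc = 0;
        for (int i = 0; i < warmupIters; i++) {
            acc += task.run(); // untimed: gives HotSpot a chance to compile the hot code
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedIters; i++) {
            acc += task.run();
        }
        long elapsed = System.nanoTime() - start;
        sink = acc;
        return (double) elapsed / timedIters;
    }

    public static void main(String[] args) {
        final byte[] data = new byte[16 * 1024];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i * 31);
        }
        double ns = nanosPerCall(new Task() {
            public int run() {
                int sum = 0;
                for (int i = 0; i < data.length; i++) {
                    sum += data[i];
                }
                return sum;
            }
        }, 2000, 2000);
        System.out.println("byte[] sum loop: " + ns + " ns/call");
    }
}
```

Running the same class under each JRE being compared gives rough, not rigorous, numbers.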

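Also: if anyone has links detailing the internals of HotSpot optimization and branch performance, they are welcome. I know enough about bytecode that a site analyzing performance at the bytecode level rather than the source level would be helpful.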

(Edit) Partial answer: bounds-check elimination. This is taken from the link supplied to the HotSpot internals wiki at:

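HotSpot will eliminate bounds checks in all for loops that meet the following conditions:

The array is loop-invariant (not reallocated within the loop)
The index variable has a constant stride (increases/decreases by a constant amount, in only one place if possible)
The array is indexed by a linear function of the variable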

Example:

int val = array[index*2 + 5];

Or:

int val = array[index + 9];

NOT:

int val = array[Math.min(var, index) + 7];
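A sketch contrasting the two shapes from the rule above: `sumPairs` keeps the array in a final local, steps `i` by a constant 1, and indexes with linear functions of `i`, while `sumClamped` uses the `Math.min` form listed as a counterexample. Both method names are hypothetical, and whether a given HotSpot version actually removes the checks should be confirmed in the generated code (for example with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly`, which needs the hsdis disassembler plugin).

```java
public final class BoundsCheckShapes {
    /** Fits the rule: loop-invariant array, constant stride, linear indices 2*i and 2*i + 1. */
    static int sumPairs(final byte[] a, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[2 * i] + a[2 * i + 1];
        }
        return sum;
    }

    /** Does not fit the rule: the index goes through Math.min, so it is not linear in i. */
    static int sumClamped(final byte[] a, int n, int cap) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += a[Math.min(cap, i)];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6};
        System.out.println(sumPairs(a, 3));      // prints 21
        System.out.println(sumClamped(a, 4, 2)); // prints 9 (1 + 2 + 3 + 3)
    }
}
```

Both methods compute correct results; the difference the rule describes is only in whether the JIT can hoist the range checks out of the loop.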

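Early version of the code: this is a sample version. Don't steal it, because it is an unreleased version of code for the H2 database project; the final version will be open source. This is an optimization of the code here: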

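Logically this is the same as the development version, but it uses a single main loop to step through the input and an if/else to handle the different logic of literal mode and back-reference mode. This reduces array accesses and the checks needed to switch between modes.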

public int compressNewer(final byte[] in, final int inLen, final byte[] out, int outPos) {
    // cachedHashTable, EMPTY, HASH_SIZE, MAX_OFF, MAX_LITERAL, MAX_REF, first() and hash()
    // are members of the enclosing class (not shown).
    int inPos = 0;
    // initialize the hash table
    if (cachedHashTable == null) {
        cachedHashTable = new int[HASH_SIZE];
    } else {
        System.arraycopy(EMPTY, 0, cachedHashTable, 0, HASH_SIZE);
    }
    int[] hashTab = cachedHashTable;
    // number of literals in current run
    int literals = 0;
    int future = first(in, inPos);
    final int endPos = inLen - 4;

    // Loop through data until all of it has been compressed
    while (inPos < endPos) {
        future = (future << 8) | (in[inPos + 2] & 255);
        // hash = next(hash, in, inPos);
        int off = hash(future);
        // ref = possible index of matching group in data
        int ref = hashTab[off];
        hashTab[off] = inPos;
        off = inPos - ref - 1; // dropped for speed

        // has match if bytes at ref match bytes in future, etc
        // note: using ref++ rather than ref+1, ref+2, etc is about 15% faster
        boolean hasMatch = (ref > 0 && off <= MAX_OFF && (in[ref++] == (byte) (future >> 16)
                && in[ref++] == (byte) (future >> 8) && in[ref] == (byte) future));

        ref -= 2; // ...EVEN when I have to recover it
        // write out literals, if max literals reached, OR has a match
        if ((hasMatch && literals != 0) || (literals == MAX_LITERAL)) {
            out[outPos++] = (byte) (literals - 1);
            System.arraycopy(in, inPos - literals, out, outPos, literals);
            outPos += literals;
            literals = 0;
        }

        // literal copying split because this improved performance by 5%

        if (hasMatch) { // grow match as much as possible
            int maxLen = inLen - inPos - 2;
            maxLen = maxLen > MAX_REF ? MAX_REF : maxLen;
            int len = 3;
            // grow match length as possible...
            while (len < maxLen && in[ref + len] == in[inPos + len]) {
                len++;
            }
            len -= 2;

            // short matches write length to first byte, longer write to 2nd too
            if (len < 7) {
                out[outPos++] = (byte) ((off >> 8) + (len << 5));
            } else {
                out[outPos++] = (byte) ((off >> 8) + (7 << 5));
                out[outPos++] = (byte) (len - 7);
            }
            out[outPos++] = (byte) off;
            inPos += len;

            // OPTIMIZATION: don't store hashtable entry for last byte of match and next byte
            // rebuild neighborhood for hashing, but don't store location for this 3-byte group
            // improves compress performance by ~10% or more, sacrificing ~2% compression...
            future = ((in[inPos + 1] & 255) << 16) | ((in[inPos + 2] & 255) << 8) | (in[inPos + 3] & 255);
            inPos += 2;
        } else { // grow literals
            literals++;
            inPos++;
        }
    }

    // write out remaining literals
    literals += inLen - inPos;
    inPos = inLen - literals;
    if (literals >= MAX_LITERAL) {
        out[outPos++] = (byte) (MAX_LITERAL - 1);
        System.arraycopy(in, inPos, out, outPos, MAX_LITERAL);
        outPos += MAX_LITERAL;
        inPos += MAX_LITERAL;
        literals -= MAX_LITERAL;
    }
    if (literals != 0) {
        out[outPos++] = (byte) (literals - 1);
        System.arraycopy(in, inPos, out, outPos, literals);
        outPos += literals;
    }
    return outPos;
}
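To make the back-reference output format of `compressNewer` testable in isolation, the token-writing branch can be pulled into a helper and paired with a reader that inverts it. This is a sketch: `BackRefToken`, `writeBackRef` and `readBackRef` are hypothetical names, `encLen` is the match length minus 2 (as after `len -= 2`), `off` is the distance minus 1, and the reader is written to match this encoder rather than taken from any particular LZF decoder.

```java
public final class BackRefToken {
    /**
     * Writes one back-reference token with the same layout as compressNewer.
     * Assumes 1 <= encLen (so the control byte cannot look like a literal run) and off < 8192.
     * Returns the new output position.
     */
    static int writeBackRef(byte[] out, int outPos, int encLen, int off) {
        if (encLen < 7) {
            // 3-bit length and high 5 bits of the offset share one control byte.
            out[outPos++] = (byte) ((off >> 8) + (encLen << 5));
        } else {
            // Length field saturated at 7; the remainder goes into an extra byte.
            out[outPos++] = (byte) ((off >> 8) + (7 << 5));
            out[outPos++] = (byte) (encLen - 7);
        }
        out[outPos++] = (byte) off; // low 8 bits of the offset
        return outPos;
    }

    /** Inverse of writeBackRef: returns {encLen, off, bytesConsumed}. */
    static int[] readBackRef(byte[] in, int pos) {
        int ctrl = in[pos] & 0xFF;
        int encLen = ctrl >> 5;
        int p = pos + 1;
        if (encLen == 7) {
            encLen += in[p++] & 0xFF;
        }
        int off = ((ctrl & 0x1F) << 8) | (in[p++] & 0xFF);
        return new int[] {encLen, off, p - pos};
    }

    public static void main(String[] args) {
        byte[] buf = new byte[3];
        int end = writeBackRef(buf, 0, 20, 8191); // long match, maximum offset
        int[] t = readBackRef(buf, 0);
        System.out.println(end + " " + t[0] + " " + t[1] + " " + t[2]); // prints "3 20 8191 3"
    }
}
```

A round-trip check like this is cheap to rerun after each micro-optimization of the compressor's inner loop.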
