Java HashMap memoization is slower than computing the answer directly

Tags: java, performance, hashmap, memoization

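I've been working through the Project Euler challenges to improve my Java knowledge. In particular, I wrote the following code for problem 14, which asks you to find the longest Collatz chain starting from a number below 1,000,000. It works on the assumption that sub-chains are very likely to appear more than once, so storing them in a cache avoids redundant computation.

Collatz.java: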

import java.util.HashMap;

public class Collatz {
    private HashMap<Long, Integer> chainCache = new HashMap<Long, Integer>();

    public void initialiseCache() {
        chainCache.put((long) 1, 1);
    }

    private long collatzOp(long n) {
        if(n % 2 == 0) {
            return n/2;
        }
        else {
            return 3*n +1;
        }
    }

    public int collatzChain(long n) {
        if(chainCache.containsKey(n)) {
            return chainCache.get(n);
        }
        else {
            int count = 1 + collatzChain(collatzOp(n));     
            chainCache.put(n, count);
            return count;
        }
    }  
}

ProjectEuler14.java:

public class ProjectEuler14 {
    public static void main(String[] args) {
        Collatz col = new Collatz();
    
        col.initialiseCache();
        long limit = 1000000;
    
        long temp = 0;
        long longestLength = 0;
        long index = 1;
    
        for(long i = 1; i < limit; i++) {
            temp = col.collatzChain(i);
            if(temp > longestLength) {
                longestLength = temp;
                index = i;
            }
        }
        System.out.println(index + " has the longest chain, with length " + longestLength);
    }
}

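This works. According to Windows PowerShell's Measure-Command, it takes about 1708 milliseconds (1.708 seconds) to execute.

However, after reading the forums, I noticed that some people who had written seemingly naive code, computing every chain from scratch, appeared to get much better execution times than mine. I took one of those answers and translated it, conceptually, into Java:

NaiveProjectEuler14.java: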

public class NaiveProjectEuler14 {
    public static void main(String[] args) {
        int longest = 0;
        int numTerms = 0;
        int i;
        long j;

        for (i = 1; i <= 10000000; i++) {
            j = i;
            int currentTerms = 1;

            while (j != 1) {
                currentTerms++;
    
                if (currentTerms > numTerms){
                    numTerms = currentTerms;
                    longest = i;
                }
    
                if (j % 2 == 0){
                    j = j / 2;
                }
                else{
                    j = 3 * j + 1;
                }
            }
        }
        System.out.println("Longest: " + longest + " (" + numTerms + ").");
    }
}

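On my machine this also gives the correct answer, but it does so in about 502 milliseconds (0.502 seconds), roughly a third of the time my original program takes. At first I thought there might be a small overhead in creating the HashMap, and that the times were too short to draw any conclusions. However, if I increase the upper limit in both programs from 1,000,000 to 10,000,000, NaiveProjectEuler14 takes 4709 milliseconds (4.709 seconds), while ProjectEuler14 takes 25324 milliseconds (25.324 seconds).

Why does ProjectEuler14 take so long? The only explanation I can come up with is that storing a huge number of pairs in the HashMap data structure adds a large overhead, but I don't see why that should be the case. I also tried recording the number of (key, value) pairs stored over the course of the program (2,168,611 pairs for the 1,000,000 case and 21,730,849 pairs for the 10,000,000 case) and passing slightly more than that number to the HashMap constructor, so that it resizes itself at most once, but this does not seem to affect the execution time.

Can anyone explain why the memoized version is so much slower?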
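The presizing experiment described above can be sketched roughly as follows. This is a minimal sketch and not the original code: the class name `PresizedCache` and the helpers `capacityFor` and `newCache` are hypothetical, and the entry count is the one reported above for the 1,000,000 case. It assumes the default load factor of 0.75, so the requested capacity is the expected entry count divided by 0.75 rather than the entry count itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, for illustration only.
class PresizedCache {
    // Default HashMap load factor: the table grows once size exceeds capacity * 0.75.
    static final double DEFAULT_LOAD_FACTOR = 0.75;

    // Smallest requested capacity that holds expectedEntries without a resize.
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / DEFAULT_LOAD_FACTOR);
    }

    // Creates a presized cache seeded with the base case: the chain from 1 has 1 term.
    static Map<Long, Integer> newCache(int expectedEntries) {
        Map<Long, Integer> cache = new HashMap<>(capacityFor(expectedEntries));
        cache.put(1L, 1);
        return cache;
    }

    public static void main(String[] args) {
        int expectedEntries = 2_168_611; // pair count reported for the 1,000,000 case
        Map<Long, Integer> cache = newCache(expectedEntries);
        System.out.println("Requested capacity: " + capacityFor(expectedEntries));
        System.out.println("Entries after seeding: " + cache.size());
    }
}
```

Presizing of this kind only avoids rehashing. Every entry still needs a boxed Long key and a map node, which would be consistent with the unchanged execution time reported above.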

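There are a few reasons for this unfortunate reality:

- Instead of containsKey, do an immediate get and check for null.
- The code uses an extra method call.
- The map stores wrapped objects (Integer, Long) instead of primitive types.
- The JIT compiler, which translates bytecode to machine code, can do more with plain calculations.
- The caching does not cover a large percentage of the work, unlike, say, Fibonacci.

A comparable version would be: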

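import java.util.HashMap;
import java.util.Map;

// Class name is illustrative; the original snippet showed only main.
public class MemoizedProjectEuler14 {
    public static void main(String[] args) {
        int longest = 0;
        int numTerms = 0;
        int i;
        long j;

        // Maps a starting value to the number of terms in its chain.
        Map<Long, Integer> map = new HashMap<>();

        // No lookup of i itself is needed: keys are stored in increasing
        // order of i, so i is never in the map before it is processed.
        for (i = 1; i <= 10000000; i++) {
            j = i;
            int currentTerms = 1;

            while (j != 1) {
                currentTerms++;

                if (j % 2 == 0) {
                    j = j / 2;

                    // Maybe check the map only here
                    Integer m = map.get(j);
                    if (m != null) {
                        // j is already counted in currentTerms, so subtract 1.
                        currentTerms += m - 1;
                        break;
                    }
                } else {
                    j = 3 * j + 1;
                }
            }

            // Update the maximum after the loop so cached tails are included.
            if (currentTerms > numTerms) {
                numTerms = currentTerms;
                longest = i;
            }
            // Store under the starting value i, as a Long to match the key type.
            map.put((long) i, currentTerms);
        }
        System.out.println("Longest: " + longest + " (" + numTerms + ").");
    }
}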

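This does not really do full memoization. For increasing arguments, not checking 3*j+1 reduces the number of misses somewhat, but it may also skip memoized values.

Memoization pays off when each call involves heavy computation. If the function takes a long time because of deep recursion rather than because of computation per call, the memoization overhead on every function call counts against it.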
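The first point in the list above (a single get with a null check in place of containsKey followed by get) can be sketched against the question's Collatz class roughly like this. It is a minimal sketch under assumptions: the class name `SingleLookupCollatz` is hypothetical, the Collatz step is inlined to drop the extra method call, and the map still boxes its keys and values, so only part of the overhead goes away.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical variant of the question's Collatz class, for illustration only.
class SingleLookupCollatz {
    private final Map<Long, Integer> chainCache = new HashMap<>();

    SingleLookupCollatz() {
        chainCache.put(1L, 1); // base case: the chain starting at 1 has one term
    }

    int collatzChain(long n) {
        // One hash lookup instead of containsKey followed by get.
        Integer cached = chainCache.get(n);
        if (cached != null) {
            return cached;
        }
        // Collatz step inlined rather than delegated to a separate method.
        long next = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        int count = 1 + collatzChain(next);
        chainCache.put(n, count);
        return count;
    }

    public static void main(String[] args) {
        SingleLookupCollatz col = new SingleLookupCollatz();
        System.out.println("Chain length from 27: " + col.collatzChain(27));
    }
}
```

Each miss still costs a get, a put, and the allocation of a boxed key, so a change like this is unlikely on its own to close the gap to the naive loop.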

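Have you tried increasing the initial capacity of the HashMap? Also, your HashMap is effectively just an array, so why not use an array directly? It would be faster and involves no autoboxing.

@krzyk Yes, as I mentioned in the penultimate paragraph, I tried increasing the initial capacity to (number of key, value pairs stored) / 0.75, where 0.75 is the default load factor, and the execution time did not change.

Have you tried using a profiler to see where the time is being spent? The answers given may well be correct, but when you want to know why something is slow, measure it.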
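The array suggestion from the first comment could look roughly like the following. This is a sketch under assumptions, not code from the thread: the names `ArrayCacheCollatz` and `longestChainBelow` are hypothetical, only chain lengths for starting values below the limit are cached (intermediate values above the limit are simply recomputed), and the limit is assumed to be at least 2.

```java
// Hypothetical array-backed cache, for illustration only.
class ArrayCacheCollatz {
    // Returns {start, length} for the longest chain with 1 <= start < limit.
    // Assumes limit >= 2.
    static int[] longestChainBelow(int limit) {
        int[] lengths = new int[limit]; // lengths[n] = number of terms in the chain from n
        lengths[1] = 1;
        int bestStart = 1;

        for (int start = 2; start < limit; start++) {
            long n = start; // long, because intermediate values exceed the int range
            int steps = 0;
            // Walk the chain until it drops below start; every smaller
            // starting value has already been filled in.
            while (n >= start) {
                n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
                steps++;
            }
            lengths[start] = steps + lengths[(int) n];
            if (lengths[start] > lengths[bestStart]) {
                bestStart = start;
            }
        }
        return new int[] { bestStart, lengths[bestStart] };
    }

    public static void main(String[] args) {
        int[] result = longestChainBelow(1_000_000);
        System.out.println(result[0] + " has the longest chain, with length " + result[1]);
    }
}
```

An int[] avoids boxing and hashing altogether, at the price of caching only starting values below the limit. Whether it actually beats the naive loop on a given machine is something to measure rather than assume.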