Java: Why does reducing the number of loops not speed up the program?

java performance matrix-multiplication

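I have a program that does a lot of matrix multiplication. I thought I would speed it up by reducing the number of loops in the code, to see how much faster it would get (I will try a matrix math library later). It turns out it is not faster at all. I have been able to reproduce the problem with some sample code. My guess was that `testOne()` would be faster than `testTwo()`, because it does not create any new arrays and it runs a third as many loop iterations. On my machine it takes about twice as long: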

Duration for testOne with 5000 epochs: 657, loopCount: 64000000

Duration for testTwo with 5000 epochs: 365, loopCount: 192000000

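My guess is that `multOne()` is slower than `multTwo()` because in `multOne()` the CPU is not writing to sequential memory addresses the way it does in `multTwo()`. Does that sound right? Any explanation would be appreciated.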

import java.util.Random;

public class ArrayTest {

    double[] arrayOne;
    double[] arrayTwo;
    double[] arrayThree;

    double[][] matrix;

    double[] input;
    int loopCount;

    int rows;
    int columns;

    public ArrayTest(int rows, int columns) {
        this.rows = rows;
        this.columns = columns;
        this.loopCount = 0;
        arrayOne = new double[rows];
        arrayTwo = new double[rows];
        arrayThree = new double[rows];
        matrix = new double[rows][columns];
        Random random = new Random();
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                matrix[i][j] = random.nextDouble();
            }
        }
    }

    public void testOne(double[] input, int epochs) {
        this.input = input;
        this.loopCount = 0;
        long start = System.currentTimeMillis();
        long duration;
        for (int i = 0; i < epochs; i++) {
            multOne();
        }
        duration = System.currentTimeMillis() - start;
        System.out.println("Duration for testOne with " + epochs + " epochs: " + duration + ", loopCount: " + loopCount);
    }

    public void multOne() {
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                arrayOne[i] += matrix[i][j] * arrayOne[i] * input[j];
                arrayTwo[i] += matrix[i][j] * arrayTwo[i] * input[j];
                arrayThree[i] += matrix[i][j] * arrayThree[i] * input[j];
                loopCount++;
            }
        }
    }

    public void testTwo(double[] input, int epochs) {

        this.loopCount = 0;
        long start = System.currentTimeMillis();
        long duration;
        for (int i = 0; i < epochs; i++) {
            arrayOne = multTwo(matrix, arrayOne, input);
            arrayTwo = multTwo(matrix, arrayTwo, input);
            arrayThree = multTwo(matrix, arrayThree, input);
        }
        duration = System.currentTimeMillis() - start;
        System.out.println("Duration for testTwo with " + epochs + " epochs: " + duration + ", loopCount: " + loopCount);
    }

    public double[] multTwo(double[][] matrix, double[] array, double[] input) {
        double[] newArray = new double[rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < columns; j++) {
                newArray[i] += matrix[i][j] * array[i] * input[j];
                loopCount++;
            }
        }
        return newArray;
    }

    public static void main(String[] args) {
        int rows = 100;
        int columns = 128;
        ArrayTest arrayTest = new ArrayTest(rows, columns);
        Random random = new Random();
        double[] input = new double[columns];
        for (int i = 0; i < columns; i++) {
            input[i] = random.nextDouble();
        }
        arrayTest.testOne(input, 5000);
        arrayTest.testTwo(input, 5000);
    }
}

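The reason the tests take different amounts of time is simple: they do different things. Because the two loops you are comparing are not functionally equivalent, the number of iterations is not a good metric.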

`testOne` takes longer than `testTwo` because:

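In `multOne`, you update `arrayOne[i]` in place during every iteration of the `j` loop. This means that each iteration of the inner `j` loop uses a new value of `arrayOne[i]`, one that was computed in the previous iteration. This creates a loop-carried dependency, which is harder for the compiler to optimise, because the result of `matrix[i][j] * arrayOne[i] * input[j]` is needed in the next CPU clock cycle. That is not really possible with floating-point operations, which usually have a latency of several clock cycles, so it causes stalls and therefore reduced performance.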

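In `testTwo`, you update `arrayOne` only once per epoch iteration, and since there are no such carried dependencies, the loop can be vectorised efficiently, which results in better cache and arithmetic performance.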
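To make the difference concrete, here is a minimal sketch that reduces each inner loop to a single row. The class and method names (`DependencyDemo`, `inPlace`, `freshAccumulator`) are made up for illustration and do not appear in the question's code. It only shows that the two loop shapes compute different values and that the first one feeds each result into the next step; it does not measure performance.

```java
public class DependencyDemo {

    // Same shape as multOne's inner loop: every step reads the value
    // written by the previous step (a loop-carried dependency).
    public static double inPlace(double[] row, double[] input, double start) {
        double acc = start;
        for (int j = 0; j < row.length; j++) {
            acc += row[j] * acc * input[j];
        }
        return acc;
    }

    // Same shape as multTwo's inner loop: the products use the old value,
    // which stays fixed while a fresh accumulator is summed.
    public static double freshAccumulator(double[] row, double[] input, double start) {
        double acc = 0.0;
        for (int j = 0; j < row.length; j++) {
            acc += row[j] * start * input[j];
        }
        return acc;
    }

    public static void main(String[] args) {
        double[] row = {1.0, 1.0};
        double[] input = {1.0, 1.0};
        System.out.println(inPlace(row, input, 1.0));          // prints 4.0
        System.out.println(freshAccumulator(row, input, 1.0)); // prints 2.0
    }
}
```

With the same inputs the two methods return 4.0 and 2.0, so comparing their iteration counts says little about how much work each one really does.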

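Benchmarking in Java is very, very hard. I would not trust your measurements; it is very easy to write benchmarks that give misleading results. Use JMH or a similar tool if you want reliable data.

Here is a hint to get you started on answering it yourself (nice to see a well-worked question, though): change `multOne` to use new arrays rather than `arrayOne +=`, `arrayTwo +=` and `arrayThree +=`, and then assign `arrayOne`, `arrayTwo` and `arrayThree` at the end of the method.

That may be true. It would be helpful to track the time taken rather than the number of operations performed. You can call `System.nanoTime()` to get the current time, and then take the difference after your operation has finished. And yes, that gives the time in nanoseconds.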
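One way to try both suggestions is sketched below: a single-pass variant of `multOne` that accumulates into new arrays, timed with `System.nanoTime()` after a warm-up run. The names `FreshArrayTiming` and `multOneFresh`, the fixed seed and the checksum are assumptions for illustration, not part of the original code. A hand-rolled timer like this is still only a rough indication, and a harness such as JMH remains the more trustworthy option.

```java
import java.util.Random;

public class FreshArrayTiming {

    // Hypothetical variant of multOne: one pass over the matrix, accumulating
    // into new arrays so no product depends on a value written in the same loop.
    public static double[][] multOneFresh(double[][] matrix, double[] one,
                                          double[] two, double[] three, double[] input) {
        int rows = matrix.length;
        double[] newOne = new double[rows];
        double[] newTwo = new double[rows];
        double[] newThree = new double[rows];
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < input.length; j++) {
                newOne[i] += matrix[i][j] * one[i] * input[j];
                newTwo[i] += matrix[i][j] * two[i] * input[j];
                newThree[i] += matrix[i][j] * three[i] * input[j];
            }
        }
        return new double[][] {newOne, newTwo, newThree};
    }

    public static void main(String[] args) {
        int rows = 100, columns = 128, epochs = 5000;
        Random random = new Random(42); // fixed seed, chosen arbitrarily
        double[][] matrix = new double[rows][columns];
        double[] input = new double[columns];
        double[] one = new double[rows];
        double[] two = new double[rows];
        double[] three = new double[rows];
        for (int i = 0; i < rows; i++) {
            one[i] = random.nextDouble();
            two[i] = random.nextDouble();
            three[i] = random.nextDouble();
            for (int j = 0; j < columns; j++) {
                matrix[i][j] = random.nextDouble();
            }
        }
        for (int j = 0; j < columns; j++) {
            input[j] = random.nextDouble();
        }

        // Warm-up pass so the JIT has a chance to compile the hot loop first.
        double checksum = 0.0;
        for (int e = 0; e < epochs; e++) {
            checksum += multOneFresh(matrix, one, two, three, input)[0][0];
        }

        long start = System.nanoTime();
        for (int e = 0; e < epochs; e++) {
            // Using the result keeps the call from being optimised away.
            checksum += multOneFresh(matrix, one, two, three, input)[0][0];
        }
        long elapsedNanos = System.nanoTime() - start;

        System.out.println("multOneFresh with " + epochs + " epochs: "
                + elapsedNanos / 1_000_000 + " ms (checksum " + checksum + ")");
    }
}
```

Unlike the question's `testTwo`, this sketch does not feed the results back into the next epoch, so the inputs stay in a fixed range; it is meant for comparing loop shapes, not for reproducing the original numbers.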