Java CPU cache/memory access time anomaly

java c++ memory memory-management

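We are trying to optimize memory-heavy operations in Java and ran into some anomalies. Based on our data, we arrived at the following hypothesis: an array (a block of memory) may be loaded into the CPU cache because it is accessed heavily, but after that array has been cloned many times, the cache fills up and the initial array is evicted back to RAM.

To test this, we set up a benchmark. It does the following:

Create an array of a given size

Write some data into its fields

Read/iterate over it a million times (to push it into the CPU cache)

Clone it once into a new array

Clone the new array into another new array, and use that new array as the source of the next clone, a given number of times

In addition, after each of these steps the array is iterated three times, and the time taken by each iteration is measured. The code looks like this: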

private static long[] read(byte[] array, int count, boolean logTimes) {
    long[] times = null;
    if (logTimes) {
        times = new long[count];
    }
    int sum = 0;
    for (int n = 0; n < count; n++) {
        long start = System.nanoTime();
        for (int i = 0; i < array.length; i++) {
            sum += array[i];
        }
        if (logTimes) {
            long time = System.nanoTime() - start;
            times[n] = time;
        }
    }
    System.out.println(sum);
    return times;
}

public static void main(String[] args) {
    int arraySize = Integer.parseInt(args[0]);
    int clones = Integer.parseInt(args[1]);
    byte[] array = new byte[arraySize];
    long[] initialReadTimes = read(array, 3, true);
    // Fill with some non-zero content
    for (int i = 0; i < array.length; i++) {
        array[i] = (byte) i;
    }
    long[] afterWriteTimes = read(array, 3, true);

    // Make this array important, so it lands in CPU Cache
    read(array, 1_000_000, false);
    long[] afterReadTimes = read(array, 3, true);

    long[] afterFirstCloneReadTimes = null;
    byte[] copy = new byte[array.length];
    System.arraycopy(array, 0, copy, 0, array.length);
    for (int i = 1; i <= clones; i++) {
        byte[] copy2 = new byte[copy.length];
        System.arraycopy(copy, 0, copy2, 0, copy.length);
        copy = copy2;
        if (i == 1) {
            afterFirstCloneReadTimes = read(array, 3, true);
        }
    }

    long[] afterAllClonesReadTimes = read(array, 3, true);

    // Write to CSV
    ...
    System.out.println("Finished.");
}
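
The clone loop above allocates one new array per iteration, so the garbage collector may run while it executes. As a rough sketch (not part of the original benchmark), the standard `GarbageCollectorMXBean` counters can show whether any collection happened during a section of code. The class name `GcCheck`, the helper names, and the loop size are illustrative assumptions.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcCheck {
    /** Sums the collection counts of all collectors; undefined counts (-1) are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Runs the section and returns how many collections were reported while it ran. */
    static long gcRunsDuring(Runnable section) {
        long before = totalGcCount();
        section.run();
        return totalGcCount() - before;
    }

    public static void main(String[] args) {
        byte[] array = new byte[10_000];
        // Assumed workload: a shortened version of the clone loop from the question.
        long gcRuns = gcRunsDuring(() -> {
            byte[] copy = array.clone();
            for (int i = 0; i < 100_000; i++) {
                copy = copy.clone();
            }
            if (copy.length != array.length) {
                throw new IllegalStateException("unexpected copy length");
            }
        });
        System.out.println("GC runs during clone loop: " + gcRuns);
    }
}
```

A non-zero count only says that a collection happened somewhere inside the section; it does not say how much of a particular timed read it overlapped with.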

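Comments:

Microbenchmarking in Java is complicated; you should use jmh. For example, you need to warm up the JIT and give it time to compile the read method before measuring. Also, I am not sure nanoTime is precise enough to time a loop over a 10K array (it depends on the Java version, CPU, and OS).

The JIT compiler may also realize that the clone loop only needs to execute once, because the subsequent iterations have no side effects.

There is a microbenchmark framework that can help you benchmark in Java. What your code lacks, for example, is a warm-up phase. Warm-up tries to ensure that your code has already been optimized by the time you measure; otherwise the optimization may kick in in the middle of the test. Also, have you evaluated the garbage collection logs to make sure GC time is not being counted?

@assylias @cmoetzing Thanks for your suggestions. I looked at jmh but could not find a way to implement the above benchmark equivalently, because jmh seems to be based on independent, atomic benchmark functions. We need to run that big method "in one go".

It allows setup and teardown per benchmark, so the only possibility I see is to clone the code 5 times and annotate the different reads as benchmarks, per invocation. Or do you have a different solution in mind?

So I tried to implement it as described, starting with a setup method that configures an array and a benchmark method that iterates over it once. Two problems: 1) Using Level.Invocation to have the setup run before every read is problematic, because an iteration takes far less than 1 ms. 2) Only the first read iteration after each event can be measured.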
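
The warm-up advice in the comments can be sketched in plain Java, assuming a HotSpot-style JIT. This is not a substitute for jmh and does not reproduce the full benchmark; it only shows a timed read that is warmed up first and whose result is written to a volatile field so the summation cannot be discarded as dead code. The class name `WarmupSketch`, the `sink` field, and the iteration counts are illustrative assumptions.

```java
import java.util.Arrays;

final class WarmupSketch {
    // Written from the timed loop so the JIT cannot drop the summation as dead code.
    static volatile int sink;

    static int sum(byte[] array) {
        int sum = 0;
        for (byte b : array) {
            sum += b;
        }
        return sum;
    }

    /** Times `runs` full passes over the array and returns the per-pass durations in ns. */
    static long[] timePasses(byte[] array, int runs) {
        long[] times = new long[runs];
        for (int n = 0; n < runs; n++) {
            long start = System.nanoTime();
            sink = sum(array);
            times[n] = System.nanoTime() - start;
        }
        return times;
    }

    /** Returns the middle element of the sorted values (upper median for even lengths). */
    static long median(long[] values) {
        long[] sorted = values.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        byte[] array = new byte[10_000];
        for (int i = 0; i < array.length; i++) {
            array[i] = (byte) i;
        }
        // Warm-up phase: the count is an assumption, chosen to give the JIT time to compile sum().
        timePasses(array, 20_000);
        // Measurement phase: report the median to reduce the effect of outliers.
        long[] times = timePasses(array, 101);
        System.out.println("median ns per pass: " + median(times));
    }
}
```

Note that warming up on the same array also pulls it into the cache, so a sketch like this only separates JIT effects from the timings; it says nothing about the eviction hypothesis by itself.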
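
The same benchmark in C++:

#include <chrono>
#include <cstring>
#include <iostream>
#include <vector>

void read(unsigned char array[], int length, int count, std::vector<long int> & logTimes) {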
    for (int c = 0; c < count; c++) {
        int sum = 0;
        std::chrono::high_resolution_clock::time_point t1;
        if (count <= 3) {
            t1 = std::chrono::high_resolution_clock::now();
        }
        for (int i = 0; i < length; i++) {
            sum += array[i];
        }
        if (count <= 3) {
            std::chrono::high_resolution_clock::time_point t2 = std::chrono::high_resolution_clock::now();
            long int duration = std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count();
            std::cout << duration << " ns\n";
            logTimes.push_back(duration);
        }
    }
}

int main(int argc, char ** args)
{
    const int ARRAYSIZE = 10000;  // const, so the array bounds below are valid standard C++
    int CLONES = 10000000;
    std::vector<long int> initialTimes, afterWritingTimes, afterReadTimes, afterFirstCloneTimes, afterCloneTimes, null;
    unsigned char array[ARRAYSIZE];
    read(array, ARRAYSIZE, 3, initialTimes);
    for (long long i = 0; i < ARRAYSIZE; i++) {
        array[i] = i;
    }
    std::cout << "Reads after writing:\n";
    read(array, ARRAYSIZE, 3, afterWritingTimes);

    read(array, ARRAYSIZE, 1000000, null);
    std::cout << "Reads after 1M Reads:\n";
    read(array, ARRAYSIZE, 3, afterReadTimes);

    unsigned char copy[ARRAYSIZE];
    unsigned char * ptr_copy = copy;
    std::memcpy(ptr_copy, array, ARRAYSIZE);
    for (long long i = 0; i < CLONES; i++) {
        unsigned char copy2[ARRAYSIZE];
        std::memcpy(copy2, ptr_copy, ARRAYSIZE);
        ptr_copy = copy2;
        if (i == 0) {
            read(array, ARRAYSIZE, 3, afterFirstCloneTimes);
        }
    }
    std::cout << "Reads after cloning:\n";
    read(array, ARRAYSIZE, 3, afterCloneTimes);

    writeTimesToCSV(initialTimes, afterWritingTimes, afterReadTimes, afterFirstCloneTimes, afterCloneTimes);
    std::cout << "Finished.\n";
}