Python: observations on the six loop orderings of the ijk matrix multiplication model


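We tried all six loop orderings of the ijk matrix multiplication model (ijk, jik, kij, ikj, jki, kji) to observe how the running time differs between them. We wrote a script and ran it overnight on a laptop running Windows 10 to collect the observations.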

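We measured time against array size. The problem is that the graph we obtained does not agree with the theory behind the performance differences between the orderings: an inner loop that traverses the arrays row-wise should have a low cache miss rate and better performance, while an inner loop that traverses them column-wise should have the highest cache miss rate and the worst performance.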
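The theory above can be made concrete by asking, for each ordering, how far each matrix moves in memory when the innermost loop variable advances by one. The sketch below uses a hypothetical helper, `inner_loop_strides` (not part of the question's code), and assumes NumPy's default row-major layout with 8-byte integers. A stride of one element means row-wise access, a stride of one full row means column-wise access, and 0 means the operand does not change inside the innermost loop. Under this model kij and ikj should be fastest, ijk and jik in the middle, and jki and kji slowest.

```python
import numpy as np

# Body shared by all six orderings: C[i][j] += A[i][k] * B[k][j].
# Each operand is indexed as (row variable, column variable).
OPERANDS = {"A": ("i", "k"), "B": ("k", "j"), "C": ("i", "j")}


def inner_loop_strides(order, n=4, dtype=np.int64):
    """Bytes each operand moves in memory when the innermost loop variable advances by one."""
    # NumPy arrays are row-major (C order) by default, so strides == (row_step, col_step).
    row_step, col_step = np.zeros((n, n), dtype=dtype).strides
    inner = order[-1]  # the last letter of the order names the innermost loop
    strides = {}
    for name, (row_var, col_var) in OPERANDS.items():
        if inner == row_var:
            strides[name] = row_step  # walks down a column: for large n, a new cache line each step
        elif inner == col_var:
            strides[name] = col_step  # walks along a row: neighbouring elements share cache lines
        else:
            strides[name] = 0  # loop-invariant: the same element on every inner iteration
    return strides


for order in ["ijk", "jik", "kij", "ikj", "jki", "kji"]:
    print(order, inner_loop_strides(order))
```

This only describes the access pattern; it says nothing about how large the timing gap will be once per-iteration overhead is added on top.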

Here is the graph we observed:

Here is our reference graph:

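Although the reference graph plots cycles per inner-loop iteration against array size, shouldn't a plot of time against array size look similar? Is our method of observation or our code wrong? We could not find the fault and cannot draw a conclusion. It would be great if someone could tell us where we went wrong.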
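Total time and cycles per inner-loop iteration are not directly comparable: every ordering executes its innermost body n^3 times, so total time grows roughly cubically with n, and that cubic growth dominates the plot and hides the per-iteration differences the reference graph shows. A minimal sketch of the conversion follows. `cycles_per_inner_iteration` is a hypothetical helper, and the clock frequency is an assumption, since laptop CPUs change frequency under load. In CPython each iteration also pays interpreter and NumPy scalar-indexing overhead, which is likely much larger than the cost of a cache miss, so the curves may stay close together even after normalizing.

```python
def cycles_per_inner_iteration(seconds, n, clock_hz):
    """Convert the total time of an n x n x n triple loop into cycles per innermost iteration.

    clock_hz is an assumed, fixed CPU frequency; real laptops change frequency
    (turbo, power saving), so the result is only an estimate.
    """
    inner_iterations = n ** 3  # every loop order runs its innermost body exactly n**3 times
    return seconds * clock_hz / inner_iterations


# Hypothetical numbers: 2.0 s for n = 100 on an assumed 2.5 GHz CPU.
print(cycles_per_inner_iteration(2.0, 100, 2.5e9))  # 5000.0
```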

The code we ran for the observations:

import time
import numpy as np
import random
import csv

#size = int(input("Enter the size of arrays: "))
# []
sizeArr = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1600, 1700], dtype=int)

for size in sizeArr:
    with open('observation.csv', 'a', newline='') as file:  # the csv module expects newline='' (avoids blank rows on Windows)
        writer = csv.writer(file)

        print(f"Size -> {size}\n")
        writer.writerow([f"Size -> {size}"])
        # Initialize for all the sizes
        A = np.zeros((size, size), dtype=int)
        B = np.zeros((size, size), dtype=int)
        C = np.zeros((size, size), dtype=int)

        # Filling arrays with random numbers
        for a in range(size):
            for b in range(size):
                A[a][b] = random.randrange(1, 101, 1)
                B[a][b] = random.randrange(1, 101, 1)

        # ijk
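        before = time.perf_counter()  # monotonic, high-resolution clock intended for benchmarks
        for i in range(size):
            for j in range(size):
                C[i][j] = 0
                for k in range(size):
                    C[i][j] += A[i][k] * B[k][j]
        after = time.perf_counter()
        timeTaken = after - before  # keep fractional seconds instead of truncating with int()
        print(f"Time taken by ijk -> {timeTaken:.3f} s")
        writer.writerow([f"Time taken by ijk -> {timeTaken:.3f} s"])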

        # Reset C elements to 0
        C = np.zeros((size, size), dtype=int)

        # Filling arrays with random numbers
        for a in range(size):
            for b in range(size):
                A[a][b] = random.randrange(1, 101, 1)
                B[a][b] = random.randrange(1, 101, 1)

        # jik
        before = time.perf_counter()
        for j in range(size):
            for i in range(size):
                C[i][j] = 0
                for k in range(size):
                    C[i][j] += A[i][k] * B[k][j]
        after = time.perf_counter()
        timeTaken = after - before
        print(f"Time taken by jik -> {timeTaken:.3f} s")
        writer.writerow([f"Time taken by jik -> {timeTaken:.3f} s"])

        # Reset C elements to 0
        C = np.zeros((size, size), dtype=int)

        # Filling arrays with random numbers
        for a in range(size):
            for b in range(size):
                A[a][b] = random.randrange(1, 101, 1)
                B[a][b] = random.randrange(1, 101, 1)

        # kij
        before = time.perf_counter()
        for k in range(size):
            for i in range(size):
                temp = A[i][k]
                for j in range(size):
                    C[i][j] += temp * B[k][j]
        after = time.perf_counter()
        timeTaken = after - before
        print(f"Time taken by kij -> {timeTaken:.3f} s")
        writer.writerow([f"Time taken by kij -> {timeTaken:.3f} s"])

        # Reset C elements to 0
        C = np.zeros((size, size), dtype=int)

        # Filling arrays with random numbers
        for a in range(size):
            for b in range(size):
                A[a][b] = random.randrange(1, 101, 1)
                B[a][b] = random.randrange(1, 101, 1)

        # ikj
        before = time.perf_counter()
        for i in range(size):
            for k in range(size):
                temp = A[i][k]
                for j in range(size):
                    C[i][j] += temp * B[k][j]
        after = time.perf_counter()
        timeTaken = after - before
        print(f"Time taken by ikj -> {timeTaken:.3f} s")
        writer.writerow([f"Time taken by ikj -> {timeTaken:.3f} s"])

        # Reset C elements to 0
        C = np.zeros((size, size), dtype=int)

        # Filling arrays with random numbers
        for a in range(size):
            for b in range(size):
                A[a][b] = random.randrange(1, 101, 1)
                B[a][b] = random.randrange(1, 101, 1)

        # jki
        before = time.perf_counter()
        for j in range(size):
            for k in range(size):
                temp = B[k][j]
                for i in range(size):
                    C[i][j] += A[i][k] * temp
        after = time.perf_counter()
        timeTaken = after - before
        print(f"Time taken by jki -> {timeTaken:.3f} s")
        writer.writerow([f"Time taken by jki -> {timeTaken:.3f} s"])

        # Reset C elements to 0
        C = np.zeros((size, size), dtype=int)

        # Filling arrays with random numbers
        for a in range(size):
            for b in range(size):
                A[a][b] = random.randrange(1, 101, 1)
                B[a][b] = random.randrange(1, 101, 1)

        # kji
        before = time.perf_counter()
        for k in range(size):
            for j in range(size):
                temp = B[k][j]
                for i in range(size):
                    C[i][j] += A[i][k] * temp
        after = time.perf_counter()
        timeTaken = after - before
        print(f"Time taken by kji -> {timeTaken:.3f} s")
        writer.writerow([f"Time taken by kji -> {timeTaken:.3f} s"])
        print("\n")
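The six timed blocks differ only in loop order, so one possible refactoring (a sketch, not the question's code) is a single hypothetical function `matmul_order` parameterized by the order string. This also makes it easy to check that every ordering produces the same product before comparing timings. It uses plain Python lists instead of NumPy arrays to avoid NumPy's per-element indexing overhead; be aware that a list of lists stores pointers to separate int objects, so its memory layout is not the contiguous row-major array the cache model assumes. Unlike the kij/ikj/jki/kji blocks, it does not hoist the loop-invariant operand into `temp`.

```python
import itertools
import random


def matmul_order(A, B, order):
    """Multiply square list-of-lists matrices with the loops nested in `order`.

    `order` is a permutation of "ijk": its first letter is the outermost loop and
    its last letter the innermost. The body is always C[i][j] += A[i][k] * B[k][j].
    """
    n = len(A)
    C = [[0] * n for _ in range(n)]
    # Position of i, j and k inside each index tuple produced below.
    pi, pj, pk = (order.index(v) for v in "ijk")
    # product() varies its last element fastest, exactly like three nested for-loops.
    for t in itertools.product(range(n), repeat=3):
        i, j, k = t[pi], t[pj], t[pk]
        C[i][j] += A[i][k] * B[k][j]
    return C


if __name__ == "__main__":
    n = 6
    A = [[random.randrange(1, 101) for _ in range(n)] for _ in range(n)]
    B = [[random.randrange(1, 101) for _ in range(n)] for _ in range(n)]
    reference = matmul_order(A, B, "ijk")
    for order in ("".join(p) for p in itertools.permutations("ijk")):
        assert matmul_order(A, B, order) == reference, order
    print("all six orderings give the same product")
```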


On Windows you need something like Linux's perf stat, that is, access to the hardware performance counters. Maybe this will help.

1/ You should divide the time by n^3 to get a time per iteration.
2/ You cannot get a reliable timing estimate from a single run of a function. Run it 10 times and keep the minimum.
3/ Benchmarking in Python adds overhead. You should use C at the highest optimization level.
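Suggestions 1/ and 2/ can be combined into a small timing harness. The sketch below relies on the standard library's `timeit.repeat`, which accepts a callable and uses `time.perf_counter` by default; `best_time_per_iteration` and the list-based `kij` kernel are hypothetical names chosen for illustration. Suggestion 3/ still applies: in Python the per-iteration interpreter overhead is likely to hide most of the cache effects.

```python
import timeit


def best_time_per_iteration(func, n, repeats=10):
    """Call func() `repeats` times and return the fastest run divided by n**3 (seconds per inner iteration)."""
    # number=1: each repeat times exactly one call; the minimum is the run least disturbed by other processes.
    times = timeit.repeat(func, repeat=repeats, number=1)
    return min(times) / n ** 3


def kij(A, B, n):
    """Same loop order as the kij block in the question, on plain lists."""
    C = [[0] * n for _ in range(n)]
    for k in range(n):
        for i in range(n):
            temp = A[i][k]
            for j in range(n):
                C[i][j] += temp * B[k][j]
    return C


if __name__ == "__main__":
    n = 60
    A = [[1] * n for _ in range(n)]
    B = [[2] * n for _ in range(n)]
    print(f"kij: {best_time_per_iteration(lambda: kij(A, B, n), n):.3e} s per inner iteration")
```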