Python: efficiently computing an operation between all rows of a matrix


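Suppose I have a 2D matrix M of arbitrary size (n, m). I want to efficiently apply a vectorized operation between every row of M and every other row of M. In my case, I want to compute the bitwise XNOR between each pair of rows.

We know there is no need to compute both xnor(x, y) and xnor(y, x); computing one of them is enough. Also, in my case, xnor(x, x) does not need to be computed. I know of two ways to solve this problem: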

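import numpy as np

def xnor(p, q):
    # Elementwise XNOR for 0/1 values: 1 where p == q, else 0
    return abs(p-1) * abs(q-1) + (p * q)

def solution_one(M):
    # Broadcast (n, 1, m) against (n, m): compares every row with every row (n x n pairs)
    return xnor(M[:, None], M).all(axis=2)

def solution_two(M):
    n = M.shape[0]
    results = np.zeros((n, n))

    # Upper-triangular indices, excluding the diagonal (i < j)
    ii, jj = np.triu_indices(n, 1)
    for i, j in zip(ii, jj):
        results[i, j] = xnor(M[i], M[j]).all(axis=0)

    return results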
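To make the xnor helper concrete, here is a small check on two illustrative binary rows (the values are made up for this example). It assumes, as the formula implicitly does, that M contains only 0s and 1s; under that assumption the formula agrees with a plain equality test, and .all() tells whether two rows match everywhere.

```python
import numpy as np

def xnor(p, q):
    # Formula from the question: 1 where p and q are equal, assuming 0/1 inputs
    return abs(p - 1) * abs(q - 1) + (p * q)

# Two illustrative binary rows
x = np.array([1, 0, 1, 0])
y = np.array([1, 1, 1, 0])

elementwise = xnor(x, y)        # array([1, 0, 1, 1])
rows_match = elementwise.all()  # False: the rows differ at index 1

# For 0/1 inputs the formula agrees with a plain equality test
assert np.array_equal(elementwise.astype(bool), x == y)
```

For non-binary values (for example, integers in [-10, 10)) the formula no longer means "equal", so it is worth making sure M really is 0/1 before relying on it.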

Running both functions on matrices of increasing size, we get the following for solution_one:

# 100x100
%timeit solution_one(M)
# 3.55 ms ± 73.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# 500x500
%timeit solution_one(M)
# 1.36 s ± 59.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# 1000x1000
%timeit solution_one(M)
# 27.4 s ± 603 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

And for solution_two:

# 100x100
%timeit solution_two(M)
# 29.5 ms ± 535 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

# 500x500
%timeit solution_two(M)
# 1.06 s ± 41.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# 1000x1000
%timeit solution_two(M)
# 5.58 s ± 95.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

As we can see, solution_one is faster than solution_two for small instances, but the opposite holds for larger ones.

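We can see that solution_one performs n x n computations, while solution_two performs ((n x n) - n)/2 computations (as Naphat Amundsen pointed out in the comments). Is there a better solution? Could one somehow construct a new M matrix so that solution_one gets closer to ((n x n) - n)/2 computations?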
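The count ((n x n) - n)/2 is exactly the number of index pairs that np.triu_indices(n, 1) returns, and because xnor is symmetric those pairs are enough to rebuild the full n x n result that solution_one produces. A minimal sketch, assuming a hypothetical random binary matrix with n = 6 rows and m = 4 columns:

```python
import numpy as np

def xnor(p, q):
    # XNOR helper from the question (assumes 0/1 values)
    return abs(p - 1) * abs(q - 1) + (p * q)

rng = np.random.default_rng(0)
n, m = 6, 4
M = rng.integers(0, 2, size=(n, m))   # hypothetical random binary matrix

# Strict upper triangle (i < j): exactly (n*n - n)/2 row pairs
ii, jj = np.triu_indices(n, 1)
assert len(ii) == (n * n - n) // 2    # 15 pairs for n = 6

# Evaluate XNOR only for those pairs
upper = np.zeros((n, n))
upper[ii, jj] = xnor(M[ii], M[jj]).all(-1)

# xnor(x, y) == xnor(y, x): the lower triangle is the transpose of the upper one,
# and a 0/1 row always matches itself, so the diagonal is 1
full = upper + upper.T
np.fill_diagonal(full, 1.0)

# Same answer as the dense n x n comparison done by solution_one
dense = xnor(M[:, None], M).all(axis=2)
assert np.array_equal(full.astype(bool), dense)
```

Mirroring with upper + upper.T works because upper is zero on and below the diagonal, so the two triangles never overlap.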

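I believe the minimum number of computations you can do in this case is ((n x n) - n)/2. If so, your solution_two is already optimal in terms of complexity. However, I think you can make solution_two more vectorized, which may improve runtime performance. Here is my attempt: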

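import numpy as np

def xnor(p, q):
    # Same helper as in the question
    return abs(p-1) * abs(q-1) + (p * q)

def solution_two(M):
    n = M.shape[0]
    results = np.zeros((n, n))

    # Upper-triangular indices, excluding the diagonal (i < j)
    ii, jj = np.triu_indices(n, 1)
    for i, j in zip(ii, jj):
        results[i, j] = xnor(M[i], M[j]).all(axis=0)

    return results

def solution_two_vectorized(M):
    n = M.shape[0]
    results = np.zeros((n, n))
    ii, jj = np.triu_indices(n, 1)
    # Compare all row pairs (i, j) at once; M[ii] and M[jj] each have shape (len(ii), m)
    results[ii, jj] = xnor(M[ii], M[jj]).all(-1)
    return results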

# Quick test checking that your outputs are the same as mine on some random matrices
import time
# Generator of random test matrices
Ms = (np.random.randint(-10, 10, (np.random.randint(100, 200), np.random.randint(100, 200)))
      for i in range(100))

vectortimes = []
loopytimes = []

for M in Ms:
    t0 = time.time()
    out_vectorized = solution_two_vectorized(M)
    vectortimes.append(time.time() - t0)

    t0 = time.time()
    out_loopy = solution_two(M)
    loopytimes.append(time.time() - t0)

    if not np.allclose(out_vectorized, out_loopy):
        print("Output mismatch!")
        break
else:
    print("Success!")
# Success!

mean_vectortimes = np.mean(vectortimes)
mean_loopytimes = np.mean(loopytimes)

print("Vector times: ", mean_vectortimes)
# Vector times:  0.017428524494171142
print("Loopy times: ", mean_loopytimes)
# Loopy times:  0.07040918111801148
print(f"Performance improvement: {mean_loopytimes/mean_vectortimes} times")
# Performance improvement: 4.039881926984661 times
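The vectorized version relies on fancy indexing (M[ii], M[jj]), and in NumPy integer-array indexing always returns a new array, whereas indexing with a plain integer (M[i]) returns a view. A small sketch, not from the thread, illustrating the difference with np.shares_memory, plus a rough estimate of how large one such copy gets, assuming a hypothetical 1000 x 1000 int64 matrix:

```python
import numpy as np

M = np.arange(12).reshape(4, 3)

# Basic indexing with a plain integer returns a view of M's buffer
row_view = M[1]

# Fancy indexing with an integer array (like M[ii] in the answer) returns a new array
ii = np.array([0, 0, 1, 2])
rows_copy = M[ii]

print(np.shares_memory(row_view, M))   # True
print(np.shares_memory(rows_copy, M))  # False

# Rough size of one such copy for a hypothetical 1000 x 1000 int64 matrix
n = m = 1000
num_pairs = n * (n - 1) // 2                      # 499500 upper-triangular pairs
copy_bytes = num_pairs * m * np.dtype(np.int64).itemsize
print(copy_bytes / 1e9)                           # 3.996 (GB); M[jj] adds a second copy
```

The loop in solution_two only ever touches two rows at a time through views, which is consistent with it scaling better on large inputs even though it runs in Python.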

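Hi, this is an interesting question. I'm a bit curious how solution_two does more than half of the (n x n) computations, as you say. Isn't it doing ((n x n) - n)/2?

Yes, you're right, thanks for pointing that out! I'll update my question. I used tril where I should have used triu.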

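Nice, thanks! Yes, this feels like something in between. However, for larger matrices (≥ 500x500) it is actually slower than both previous solutions. Maybe, in the long run, the indexing costs more than the looping and computing? That's unfortunate.

I've checked it myself now, and I agree that it is indeed slower for large inputs. The fancy indexing M[ii] and M[jj] creates copies rather than views, so you end up doing a lot of copying. Using a for loop as you did avoids those copies, because there you only use basic indexing. I honestly didn't expect that. May I suggest using numba to JIT-compile the solution_two function?

Using numba gives a huge improvement. It roughly halves the time of every solution, and the vectorized solution is now faster than the loop solution.

Wow, really? You should post your results as an answer! I (and probably others) would love to see them :)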
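The numba code behind those results was not posted, so the following is only a hypothetical sketch of what a JIT-compiled loop over the upper triangle could look like. The function name solution_two_numba is made up for illustration; it assumes M holds only 0/1 values (so "xnor is 0" means "values differ"), and it falls back to plain Python if numba is not installed, in which case it runs but is slow.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch still runs (slowly) without numba installed
    def njit(func):
        return func

@njit
def solution_two_numba(M):
    # Hypothetical loop version: explicit loops over the strict upper triangle (i < j)
    n, m = M.shape
    results = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            match = True
            for k in range(m):
                # For 0/1 values, XNOR is 0 exactly where the entries differ
                if M[i, k] != M[j, k]:
                    match = False
                    break
            if match:
                results[i, j] = 1.0
    return results
```

The early break stops comparing a pair as soon as one position differs; whether this beats the timings reported above depends on the data and would need to be measured.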