Python: Why are dgemm and sgemm much slower (200x) than numpy's dot?


Why are dgemm and sgemm much slower (200x) than numpy's dot? Is this normal?

Here is the code I used to test:

from scipy.linalg import blas
import numpy as np
import time


x2 = np.zeros((1000000, 512))
x1 = np.zeros((1, 512))

t1 = time.time()
for i in range(10):
    np.dot(x1, x2.T)
t2 = time.time()
print("np.dot: ", t2-t1)
t1 = time.time()
for i in range(10):
    blas.dgemm(alpha=1.0, a=x1, b=x2, trans_b=True)
t2 = time.time()
print("dgemm: ", t2-t1)
t1 = time.time()
for i in range(10):
    blas.sgemm(alpha=1.0, a=x1, b=x2, trans_b=True)
t2 = time.time()
print("sgemm: ", t2-t1)
The results I get are:

np.dot:  0.1820526123046875
dgemm:  34.11782765388489
sgemm:  25.33052659034729
Here is my scipy configuration, which shows that it was compiled against OpenBLAS:

    >>> import scipy
    >>> scipy.__config__.show()
    openblas_lapack_info:
        libraries = ['openblas', 'openblas']
        library_dirs = ['/usr/local/lib']
        language = c
        define_macros = [('HAVE_CBLAS', None)]
    lapack_opt_info:
        libraries = ['openblas', 'openblas']
        library_dirs = ['/usr/local/lib']
        language = c
        define_macros = [('HAVE_CBLAS', None)]
    blas_mkl_info:
      NOT AVAILABLE
    openblas_info:
        libraries = ['openblas', 'openblas']
        library_dirs = ['/usr/local/lib']
        language = c
        define_macros = [('HAVE_CBLAS', None)]
    blas_opt_info:
        libraries = ['openblas', 'openblas']
        library_dirs = ['/usr/local/lib']
        language = c
        define_macros = [('HAVE_CBLAS', None)]
Here is my numpy configuration, which is essentially the same as scipy's:

>>> import numpy
>>> numpy.__config__.show()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]

Am I using it wrong?

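I think the overhead of calling the BLAS functions through f2py is the culprit.

First, make your arrays Fortran-contiguous; otherwise a copy of the array is created before it is passed to GEMM. Also make sure your data type is float for dgemm, not int.

@percusse Why would it be int? I'm using np.zeros as dummy data. I'm working with float matrices.

I didn't mean it is int. If I add order='F' to the zeros call, dgemm starts winning. sgemm will be slower in any case, because the data has to be converted to single precision internally.

@percusse I can't find the source code of numpy.dot. Does it use the same scipy.blas.gemm function under the hood?

The speeds are very close, which makes me suspect that the difference is function-call overhead.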
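The layout and dtype advice from the comments can be sketched as follows. This is a minimal illustration, not a benchmark. The small shapes, the random test data and the variable names (`x2_c`, `x2_f`, `x1_s`, `x2_s`) are assumptions introduced here so the example runs quickly; the question itself used `np.zeros((1000000, 512))`.

```python
import numpy as np
from scipy.linalg import blas

# Small shapes so the sketch runs quickly (assumption; the question used 1000000 rows).
n, k = 2000, 512
rng = np.random.default_rng(0)
x1 = rng.standard_normal((1, k))     # a single row is both C- and F-contiguous
x2_c = rng.standard_normal((n, k))   # C-contiguous, NumPy's default layout

# Convert to Fortran order once, outside any timing loop.
x2_f = np.asfortranarray(x2_c)

# The flags show which inputs the wrapper can pass to GEMM without copying.
print("x2_c F-contiguous:", x2_c.flags["F_CONTIGUOUS"])
print("x2_f F-contiguous:", x2_f.flags["F_CONTIGUOUS"])

ref = np.dot(x1, x2_c.T)

# Same call as in the question, but with a Fortran-ordered b.
out_f = blas.dgemm(alpha=1.0, a=x1, b=x2_f, trans_b=True)

# Alternative: the transpose of a C-contiguous array is an F-contiguous view,
# so it can be passed as b directly, without trans_b and without a copy.
out_t = blas.dgemm(alpha=1.0, a=x1, b=x2_c.T)

# For sgemm, convert to float32 once up front rather than on every call.
x1_s = np.asfortranarray(x1, dtype=np.float32)
x2_s = np.asfortranarray(x2_c, dtype=np.float32)
out_s = blas.sgemm(alpha=1.0, a=x1_s, b=x2_s, trans_b=True)

print("dgemm matches np.dot:", np.allclose(ref, out_f), np.allclose(ref, out_t))
```

To compare timings fairly, the conversions (`np.asfortranarray` and the cast to `float32`) should stay outside the timed loop, so that only the GEMM call itself is measured.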