Python: Why are dgemm and sgemm much slower (200x) than numpy's dot?
Why are dgemm and sgemm so much slower (about 200x) than numpy's dot? Is this normal? Here is the code I used to test:
from scipy.linalg import blas
import numpy as np
import time

# Dummy float64 inputs: one row of width 512 against 1,000,000 rows of width 512
x2 = np.zeros((1000000, 512))
x1 = np.zeros((1, 512))

# Baseline: numpy's dot
t1 = time.time()
for i in range(10):
    np.dot(x1, x2.T)
t2 = time.time()
print("np.dot: ", t2-t1)

# Same product through scipy's double-precision GEMM wrapper
t1 = time.time()
for i in range(10):
    blas.dgemm(alpha=1.0, a=x1, b=x2, trans_b=True)
t2 = time.time()
print("dgemm: ", t2-t1)

# Same product through scipy's single-precision GEMM wrapper
t1 = time.time()
for i in range(10):
    blas.sgemm(alpha=1.0, a=x1, b=x2, trans_b=True)
t2 = time.time()
print("sgemm: ", t2-t1)
The results I get are:
np.dot: 0.1820526123046875
dgemm: 34.11782765388489
sgemm: 25.33052659034729
Here is my scipy configuration, which shows that it was compiled against OpenBLAS:
>>> import scipy
>>> scipy.__config__.show()
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_mkl_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
Here is my numpy configuration, which is essentially the same as scipy's:
>>> import numpy
>>> numpy.__config__.show()
blas_mkl_info:
  NOT AVAILABLE
blis_info:
  NOT AVAILABLE
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_mkl_info:
  NOT AVAILABLE
openblas_lapack_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/usr/local/lib']
    language = c
    define_macros = [('HAVE_CBLAS', None)]
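Am I using it wrong?
I think the overhead of calling the BLAS functions through f2py is the culprit.
First, make your arrays Fortran-contiguous; otherwise a copy of the array is created to pass to GEMM. Also make your datatype float, not int, for dgemm.
@percusse Why would I make it int? I'm using np.zeros as a dummy variable. I'm working with float matrices.
I meant "not int". If I add order='F' to the zeros call, dgemm starts to win. sgemm will be slower in any case, because the data has to be converted to single precision internally.
@percusse I can't find the source code for numpy.dot. Does it use the same scipy.blas.gemm function behind the scenes? The speeds are very close, which makes me suspect that the difference is function-call overhead.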
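The layout and dtype points from the comments can be sketched with small arrays, so the example runs quickly. The array sizes, the random data, and the names x2_f, x2_t, x1_32 and x2_32 are illustrative assumptions, not part of the original post. The sketch only checks memory layout and results; it does not measure timing.

```python
import numpy as np
from scipy.linalg import blas

rng = np.random.default_rng(0)
x1 = rng.standard_normal((1, 8))     # a single row is both C- and F-contiguous
x2 = rng.standard_normal((100, 8))   # C-contiguous by default

# Reference result from numpy
ref = np.dot(x1, x2.T)

# Option 1: convert to Fortran order once, outside any timing loop,
# and keep trans_b=True as in the question.
x2_f = np.asfortranarray(x2)
out_f = blas.dgemm(alpha=1.0, a=x1, b=x2_f, trans_b=True)

# Option 2: the transpose of a C-contiguous array is an F-contiguous view,
# so it can be passed directly without trans_b and without copying the data.
x2_t = x2.T
out_t = blas.dgemm(alpha=1.0, a=x1, b=x2_t)

# For sgemm, supply float32 inputs up front so that no float64 -> float32
# conversion is needed on each call.
x1_32 = x1.astype(np.float32)
x2_32 = np.asfortranarray(x2, dtype=np.float32)
out_s = blas.sgemm(alpha=1.0, a=x1_32, b=x2_32, trans_b=True)

print("x2 F-contiguous:  ", x2.flags.f_contiguous)
print("x2_f F-contiguous:", x2_f.flags.f_contiguous)
print("x2.T F-contiguous:", x2_t.flags.f_contiguous)
print("dgemm matches np.dot:", np.allclose(ref, out_f) and np.allclose(ref, out_t))
print("sgemm output dtype:", out_s.dtype)
```

Applied to the benchmark in the question, this would mean creating x2 with order='F' (as the comment suggests) or passing x2.T without trans_b, and building float32 arrays before timing sgemm.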