Numba in Python for faster code execution, but only one core is used
Tags: python, numpy, scipy, numba, eigenvector

I have been trying to use numba to make a particular Python script run faster. I have read the numba documentation on supported numpy functions and modified the code so that it is compatible. However, after watching top, it appears that only one core is being used. My code estimates the largest k eigenvectors of a matrix (never constructed explicitly) using scipy.sparse.linalg.LinearOperator and scipy.sparse.linalg.eigsh:
import numpy as np, scipy
from scipy.sparse.linalg import eigs,eigsh
from scipy.sparse.linalg import LinearOperator
from timeit import default_timer as timer
from numba import jit
from scipy import stats
def standardize(X):
    X = stats.zscore(X, axis=1, ddof=1)
    return X

@jit(nopython=True)
def Binline(x):
    # transpose() is supported by numba
    tmp1 = np.dot(X.transpose(), x)
    # np.dot() and np.divide() are supported by numba
    y = np.divide(np.dot(X, tmp1), (T - 1.0))
    return y
X = np.random.rand(10000,500)
X = standardize(X)
T = X.shape[1]
# Pass the above inline function into a Linear Operator
A = LinearOperator(shape=(X.shape[0], X.shape[0]) , matvec = Binline)
print("starting")
k = 400 # how many eigenvectors to estimate
vals1, vecs1 = eigsh(A, k=k, which='LA', ncv=2*k, maxiter=2500, tol=1e-14)
print("finished")
# sorting
idx = vals1.argsort()[::-1]
vals1, vecs1 = vals1[idx], vecs1[:,idx]
print("all done")
I expect all of the cores available on my system to be used. Any hints/help would be greatly appreciated.
System specs:
$ cat /proc/meminfo
MemTotal: 527842404 kB
MemFree: 359609816 kB
MemAvailable: 413678484 kB
Buffers: 106068 kB
Cached: 56403884 kB
SwapCached: 15596 kB
Active: 107761496 kB
Inactive: 57753936 kB
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 85
Model name: Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
Stepping: 4
CPU MHz: 3615.553
BogoMIPS: 4200.00
>>> np.show_config()
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/Open_Soft/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/Open_Soft/anaconda3/include']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/Open_Soft/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/Open_Soft/anaconda3/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/Open_Soft/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/Open_Soft/anaconda3/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/Open_Soft/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/Open_Soft/anaconda3/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/Open_Soft/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/Open_Soft/anaconda3/include']
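The show_config output above lists `mkl_rt`, i.e. this numpy is linked against Intel MKL. The number of BLAS threads can be capped via environment variables, which must be set before numpy is first imported; the three variable names below cover the common backends (MKL, OpenBLAS, generic OpenMP), and which one takes effect depends on the actual build. The thread count of 8 is an arbitrary example.

```python
import os

# Must be set before numpy is imported; which variable applies depends on
# the BLAS backend numpy was built against (MKL here, per show_config).
os.environ.setdefault("MKL_NUM_THREADS", "8")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "8")
os.environ.setdefault("OMP_NUM_THREADS", "8")

import numpy as np

X = np.random.rand(2000, 500)
x = np.random.rand(500)
y = X @ x  # dispatched to the BLAS gemv; multithreaded when the backend allows
print(y.shape)  # (2000,)
```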
Why numba? Assuming your computation is dominated by matrix-vector products, just use a sane numpy setup with a parallel BLAS backend (OpenBLAS, MKL); .dot then exploits it for free.

Thanks for the suggestion. numba is the most user-friendly module I have found, and in theory it fits exactly a case like mine; I was hoping it would seamlessly use all of my cores. Could you give an example of how to tailor the code for BLAS?

It will never compete with BLAS on a plain matrix-vector product. np.dot will do the job automatically (at least in pure numpy code; I am not sure what an external @jit does to it). Look for related questions on how to read numpy's configuration to identify the BLAS backend. Many ways of installing numpy ship a parallel BLAS out of the box (Anaconda with MKL, Ubuntu with OpenBLAS; at least that is how I remember the state of things).
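A minimal sketch of the suggestion in these comments: drop @jit entirely and let numpy's BLAS-backed dot carry the matvec. The function name `binline_numpy` is made up for the example; the expression is the same B(x) = X (X^T x) / (T - 1) as in the question, i.e. the action of the sample covariance matrix without ever forming X X^T.

```python
import numpy as np

# Both products go straight to BLAS, which runs multithreaded when numpy
# is linked against a parallel backend such as MKL or OpenBLAS.
def binline_numpy(X, x):
    T = X.shape[1]
    return X @ (X.T @ x) / (T - 1.0)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 20))
x = rng.standard_normal(50)

# Sanity check against the explicitly constructed matrix on a small problem.
B = X @ X.T / (X.shape[1] - 1.0)
assert np.allclose(binline_numpy(X, x), B @ x)
```

The same function can be passed as `matvec` to `LinearOperator` exactly as in the question's code, with no decorator involved.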