Numba in Python for faster code execution, but only one core is used

I have been trying to use numba in Python to make a particular Python script run faster.

I have read the numba documentation on supported numpy functions and have modified the code so that it is compatible.

However, after watching top, it appears that only one core is being used.

My code is shown below. It uses scipy.sparse.linalg.LinearOperator together with scipy.sparse.linalg.eigsh to estimate the largest k eigenvectors of a matrix that is never built explicitly; the operator only ever applies X.dot(X.T.dot(x)) / (T - 1) to a vector:

import numpy as np, scipy
from scipy.sparse.linalg import eigs,eigsh
from scipy.sparse.linalg import LinearOperator
from timeit import default_timer as timer
from numba import jit
from scipy import stats

def standardize(X):
    X = stats.zscore(X, axis=1, ddof=1)
    return X

@jit(nopython=True)
def Binline(x):
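    # NOTE: X and T below are module-level globals; in nopython mode numba
    # treats globals as compile-time constants, captured when this function
    # is first compiled (i.e. at the first call, after X and T exist)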
    # transpose() is supported by numba
    tmp1 = np.dot(X.transpose(), x)
    # np.dot() and np.divide() are supported by numba
    y = np.divide(np.dot(X, tmp1), (T - 1.0))
    return y

X = np.random.rand(10000,500)
X = standardize(X)
T = X.shape[1]

# Pass the above inline function into a Linear Operator 
A = LinearOperator(shape=(X.shape[0], X.shape[0]), matvec=Binline)

print("starting")
k = 400 # how many eigenvectors to estimate
vals1, vecs1 = eigsh(A, k=k, which='LA', ncv=2*k, maxiter=2500, tol=1e-14)
print("finished")

# sorting
idx = vals1.argsort()[::-1]
vals1, vecs1 = vals1[idx], vecs1[:,idx]
print("all done")
I would expect all of the available cores on my system to be used. Any hints/help would be greatly appreciated.

System specs:

$ cat /proc/meminfo
MemTotal:       527842404 kB
MemFree:        359609816 kB
MemAvailable:   413678484 kB
Buffers:          106068 kB
Cached:         56403884 kB
SwapCached:        15596 kB
Active:         107761496 kB
Inactive:       57753936 kB

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               85
Model name:          Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz
Stepping:            4
CPU MHz:             3615.553
BogoMIPS:            4200.00

>>> np.show_config()
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/Open_Soft/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/Open_Soft/anaconda3/include']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/Open_Soft/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/Open_Soft/anaconda3/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/Open_Soft/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/Open_Soft/anaconda3/include']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/Open_Soft/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/Open_Soft/anaconda3/include']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/Open_Soft/anaconda3/lib']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/opt/Open_Soft/anaconda3/include']

Why numba? Assuming your computation is dominated by matrix-vector products, just use a sane numpy setup with a parallel BLAS backend (OpenBLAS, MKL). .dot then exploits it for free.

Thanks for the suggestion. numba is the most user-friendly module I have found, and in theory it applies to exactly a case like mine; I had hoped it would use all of my cores seamlessly.

It will never compete with BLAS on a plain matrix-vector product.

Could you give an example of how to tailor the code for BLAS use?

np.dot will do it automatically (at least in pure numpy code; not sure what an outer @jit would do). Look for related questions on how to read numpy's configuration to identify the BLAS backend. Many ways of installing numpy ship a parallel BLAS out of the box (Anaconda with MKL, Ubuntu with OpenBLAS; at least that is how I remember it).
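
To make the comment concrete, here is a minimal sketch of the pure-NumPy variant being suggested: the @jit decorator is simply dropped, so np.dot dispatches to the BLAS library numpy was built against (MKL, per the np.show_config() output above), which can thread the products across cores on its own. The name matvec_blas is illustrative, the parameters mirror the question's values, and whether the work actually spreads across cores depends on the BLAS build and its thread settings.

import numpy as np
from scipy import stats
from scipy.sparse.linalg import LinearOperator, eigsh

X = stats.zscore(np.random.rand(10000, 500), axis=1, ddof=1)
T = X.shape[1]

def matvec_blas(x):
    # plain NumPy, no @jit: np.dot goes straight to the BLAS backend
    # (MKL here), which handles the threading itself
    return np.dot(X, np.dot(X.T, x)) / (T - 1.0)

A = LinearOperator(shape=(X.shape[0], X.shape[0]), matvec=matvec_blas)
vals, vecs = eigsh(A, k=400, which='LA', ncv=800, maxiter=2500, tol=1e-14)

The BLAS thread count can be pinned before Python starts, e.g. with MKL_NUM_THREADS for MKL or OPENBLAS_NUM_THREADS for OpenBLAS; watching top while eigsh runs shows whether the backend is actually using more than one core.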