How can I optimize this Cython function? (Python, Cython, Numba)


I have a Cython module:

#!python
#cython: language_level=3, boundscheck=False, nonecheck=False

import numpy as np
cimport numpy as np

def portfolio_s2( double[:,:] cv, double[:] weights ):    
    """ Calculate portfolio variance"""
    cdef double s0
    cdef double s1
    cdef double s2
    s0 = 0.0
    for i in range( weights.shape[0] ):
        s0 += weights[i]*weights[i]*cv[i,i]

    s1 = 0.0
    for i in range( weights.shape[0]-1 ):
        s2 = 0.0
        for j in range( i+1, weights.shape[0] ):
            s2 += weights[j]*cv[i,j]
        s1+= weights[i]*s2
    return s0+2.0*s1 
I have an equivalent function in Numba:

import numba as nb

@nb.jit( nopython=True )
def portfolio_s2( cv, weights ):
    """ Calculate portfolio variance using numba """
    s0 = 0.0
    for i in range( weights.shape[0] ):
        s0 += weights[i]*weights[i]*cv[i,i]

    s1 = 0.0
    for i in range( weights.shape[0]-1 ):
        s2 = 0.0
        for j in range( i+1, weights.shape[0] ):
            s2 += weights[j]*cv[i,j]
        s1+= weights[i]*s2
    return s0+2.0*s1 
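For a covariance matrix of size 10, the Numba version is 20 times faster than the Cython version. I assume this is because I am doing something wrong in Cython, but I am new to Cython and don't know what to change.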
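For reference, both versions appear to compute the usual quadratic form for portfolio variance, visiting only the diagonal and the upper triangle. Writing $C$ for cv, $w$ for weights and $n$ for the number of assets (symbols introduced here for illustration), and assuming $C$ is symmetric:

```latex
\sigma_p^2 = w^\top C\, w
  = \sum_{i=0}^{n-1} w_i^2\, C_{ii}
  + 2 \sum_{i=0}^{n-2} w_i \sum_{j=i+1}^{n-1} w_j\, C_{ij}
```

The first sum corresponds to s0 in the code and the nested sum to s1, with s2 holding the inner sum over j.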

Using Cel's optimization

I wrote a script to test Cel's code against the Numba version:

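    import numpy as np
    import matplotlib.pyplot as plt
    # pm (compiled Cython module), helpers (module with the Numba version)
    # and Timer are the author's own and are not shown here.

    sizes = [ 2, 3, 4, 6, 8, 12, 16, 32, 48, 64, 96, 128, 196, 256 ]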
    cython_timings = []
    numba_timings = []
    for size in sizes:
        X = np.random.randn(100,size)
        cv = np.cov( X, rowvar=0 )
        w  = np.ones( cv.shape[0] )

        num_tests=10

        pm.portfolio_s2( cv, w )
        with Timer( 'Cython' ) as cython_timer:
            for _ in range( num_tests ):
                s2_cython = pm.portfolio_s2_opt( cv, w )
        cython_timings.append( cython_timer.interval )

        helpers.portfolio_s2( cv, w )
        with Timer( 'Numba' ) as numba_timer:
            for _ in range( num_tests ):
                s2_numba = helpers.portfolio_s2( cv, w )
        numba_timings.append( numba_timer.interval )

    plt.plot( sizes, cython_timings, label='Cython' )
    plt.plot( sizes, numba_timings, label='Numba' )
    plt.title( 'Execution Time By Covariance Size' )
    plt.legend()
    plt.show()
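The resulting chart looks like this:

[Figure: "Execution Time By Covariance Size", Cython vs. Numba timings]

The chart shows that Numba performs better for small covariance matrices. As the covariance matrix grows, however, Cython scales better and eventually outperforms Numba.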


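Is there some kind of function call overhead that makes Cython perform so poorly on small matrices? My use case for this code involves computing the portfolio variance for many small covariance matrices, so I need better performance for small matrices rather than large ones.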
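The timing script above relies on a Timer helper that is not shown in the question. A minimal sketch of what such a helper might look like, assuming only that it is a context manager taking a label and exposing an interval attribute in seconds (everything else here is hypothetical):

```python
import time


class Timer:
    """Context manager that records elapsed wall-clock time in `interval`."""

    def __init__(self, label=""):
        self.label = label      # hypothetical: only used for reporting
        self.interval = None    # set when the with-block exits

    def __enter__(self):
        self._start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.interval = time.perf_counter() - self._start
        return False  # do not swallow exceptions raised in the block


with Timer("demo") as t:
    total = sum(range(100_000))
print(f"{t.label}: {t.interval:.6f} s")
```

Because interval covers the whole with-block, the script's numbers are totals over num_tests calls, not per-call times.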

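When using Cython, it is important to make sure everything is statically typed.

In your example, the loop variables i and j are not typed. Declaring cdef size_t i, j already gives you a huge speedup.

The Cython documentation has some good examples.

Here is my setup and evaluation: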

import numpy as np
n = 100
cv = np.random.rand(n,n)
weights= np.random.rand(n)
Original:

%timeit portfolio_s2(cv, weights)
10000 loops, best of 3: 147 µs per loop
Optimized version:

%timeit portfolio_s2_opt(cv, weights)
100000 loops, best of 3: 10 µs per loop
The code is as follows:

import numpy as np
cimport numpy as np


def portfolio_s2_opt(double[:,:] cv, double[:] weights):    
    """ Calculate portfolio variance"""
    cdef double s0
    cdef double s1
    cdef double s2
    cdef size_t i, j

    s0 = 0.0
    for i in range( weights.shape[0] ):
        s0 += weights[i]*weights[i]*cv[i,i]

    s1 = 0.0
    for i in range( weights.shape[0]-1 ):
        s2 = 0.0
        for j in range( i+1, weights.shape[0] ):
            s2 += weights[j]*cv[i,j]
        s1+= weights[i]*s2
    return s0+2.0*s1 
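One way to gain confidence that a typed rewrite did not change the result is to compare it against an independent reference. The following is a minimal pure-Python sketch, assuming a symmetric covariance matrix; the names portfolio_s2_ref and quadratic_form and the sample data are illustrative and do not come from the answer:

```python
def portfolio_s2_ref(cv, weights):
    """Diagonal plus upper-triangle formulation, mirroring the loops above."""
    n = len(weights)
    s0 = sum(weights[i] * weights[i] * cv[i][i] for i in range(n))
    s1 = 0.0
    for i in range(n - 1):
        s2 = 0.0
        for j in range(i + 1, n):
            s2 += weights[j] * cv[i][j]
        s1 += weights[i] * s2
    return s0 + 2.0 * s1


def quadratic_form(cv, weights):
    """Full double sum w^T C w, making no use of symmetry."""
    n = len(weights)
    return sum(weights[i] * cv[i][j] * weights[j]
               for i in range(n) for j in range(n))


# Small symmetric matrix chosen by hand for the check.
cv = [[4.0, 1.0, 0.5],
      [1.0, 9.0, 2.0],
      [0.5, 2.0, 16.0]]
w = [0.2, 0.3, 0.5]

# The two formulations should agree up to floating-point rounding.
assert abs(portfolio_s2_ref(cv, w) - quadratic_form(cv, w)) < 1e-12
```

With the compiled module available, the same comparison could be run against portfolio_s2_opt using NumPy arrays in place of the nested lists.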

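There is a good tutorial on this. Also remember to declare the type of every variable. Note that i is not statically typed.

If Cython's performance is "poor", why not use Numba?

Numba doesn't allow me to create arrays in nopython mode, so I am learning Cython.

Your test may be a bit misleading: you are measuring the python->Cython (or python->numba) call overhead. It might work better if you used a cpdef function and called it from inside Cython.

Does the call overhead matter for your problem? If so, you can avoid the function calls by cythonizing more of your code instead of going through functions. Note that you would be trading good program structure for speed. I would only do this if you really need the extra speed.

It may be worth pointing out that typing i and j (Cython even warns me when I don't) gives a huge gain. Using the numpy interface instead of the double[:] memoryview interface from the original code makes it about 2% faster for me.

@DavidW, using memoryview does not make it faster for me. However, memoryviews seem to be preferred nowadays. I'll revise my answer. Thanks for pointing this out.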
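To check whether fixed call overhead dominates for small inputs, as the comments above suggest, one rough approach is to time a single call at several sizes: at n = 1 the loops do almost no work, so the per-call time approximates the fixed cost. The sketch below uses a pure-Python stand-in (portfolio_s2_py, per_call_seconds and make_inputs are hypothetical names); with the compiled modules one would pass the Cython or Numba function and NumPy arrays instead.

```python
import timeit


def per_call_seconds(func, *args, number=2_000, repeat=3):
    """Best-of-`repeat` average time of one call to func(*args)."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def portfolio_s2_py(cv, weights):
    """Pure-Python stand-in for the compiled function under test."""
    n = len(weights)
    s0 = sum(weights[i] * weights[i] * cv[i][i] for i in range(n))
    s1 = 0.0
    for i in range(n - 1):
        s2 = 0.0
        for j in range(i + 1, n):
            s2 += weights[j] * cv[i][j]
        s1 += weights[i] * s2
    return s0 + 2.0 * s1


def make_inputs(n):
    """Synthetic symmetric matrix (1.0 on the diagonal, 0.1 elsewhere)."""
    cv = [[1.0 if i == j else 0.1 for j in range(n)] for i in range(n)]
    return cv, [1.0] * n


for n in (1, 4, 16):
    cv, w = make_inputs(n)
    print(n, per_call_seconds(portfolio_s2_py, cv, w))
```

If the time barely changes between the smallest sizes, the fixed per-call cost is probably what is being measured, and reducing the number of calls across the Python boundary is likely to matter more than tuning the loops.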