Python: norm of a memoryview in Cython


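I have a function that receives a vector as a memoryview, and I want to compute the norm of that vector. So far I have done this by converting the memoryview to a NumPy array and computing the norm via np.sqrt(V.dot(V)). For speed reasons I now want to get rid of that step, but with the following implementation the program fails at some point: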

cdef do_something(np.double_t[::1] M_mem):
    cdef:
        int i
        np.double_t norm_mv = 0
        np.double_t norm_np = 0
        np.ndarray[np.double_t, ndim=1] V = np.copy(np.asarray(M_mem))

    # Original implementation -- working
    norm_np = np.sqrt(V.dot(V))

    # My failed try with memoryview -- not working
    for i in range(M_mem.shape[0]):
        norm_mv += M_mem[i]**2
    norm_mv = np.sqrt(norm_mv)

    # norm_mv != norm_np
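I suspect this is due to rounding errors that add up for sufficiently large vectors. Is there a numerically stable way to compute the norm of a Cython memoryview?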

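Update

On inspection, the rounding errors are probably insignificant. Instead, something very strange is happening. My actual function looks like this: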

@cython.boundscheck(False)
@cython.cdivision(True)
@cython.wraparound(False)
cdef np.double_t[:,::1] GS_coefficients(np.double_t[:,::1] M_mem):
    cdef:
        int n, i, k
        int N_E = M_mem.shape[1]
        np.ndarray[np.double_t, ndim=2] W = np.asarray(M_mem)
        np.ndarray[np.double_t, ndim=2] V = np.copy(W)
        np.double_t[:,::1] G = np.eye(N_E, dtype=np.float64)
        np.longdouble_t norm  = 0 # np.sqrt(V[:,0].dot(V[:,0]))
    for i in range(M_mem.shape[0]):
        norm += M_mem[i,0]**2
    norm = sqrt(norm)
    print("npx: ", np.sqrt(V[:,0].dot(V[:,0]))) # line 1
    print("cp: ", norm) # line 2
    V[:,0] /= norm
    G[0,0] /= norm
    for n in range(1, N_E):
        for i in range(0, n):
            G[n,i] = - (V[:,i].dot(W[:,n]))
            V[:,n] += G[n,i] * V[:,i]
        norm = np.sqrt(V[:,n].dot(V[:,n]))
        V[:,n] /= norm
        for i in range(n+1):
            G[n,i] /= norm
    return G
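I inserted the print statements to check whether the two results for norm are equal. The thing is, with the code exactly as shown above, everything works. But when I comment out the first print statement (line 1), the code still runs through the function fine, yet the program fails shortly afterwards. What is going on there? Isn't that just a print statement that should not affect anything else operationally?

Update 2

Here is my attempt at a minimal, complete and verifiable example: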

DEF N_E_cpt = 4

cimport cython
cimport numpy as np
import numpy as np
from libc.math cimport sqrt

@cython.boundscheck(False)
@cython.cdivision(True)
@cython.wraparound(False)
cdef np.double_t[:,::1] GS_coefficients(np.double_t[:,::1] M_mem):
    """Writes the coefficients that the Gram-Schmidt procedure
    provides into a matrix and returns it."""
    cdef:
        int n, i, k
        int N_E = M_mem.shape[1]
        np.ndarray[np.double_t, ndim=2] W = np.asarray(M_mem)
        np.ndarray[np.double_t, ndim=2] V = np.copy(W)
        np.double_t[:,::1] G = np.eye(N_E, dtype=np.float64)
        np.longdouble_t norm  = 0 # np.sqrt(V[:,0].dot(V[:,0]))
    for i in range(M_mem.shape[0]):
        norm += M_mem[i,0]**2
    norm = sqrt(norm)
    print("npx: ", np.sqrt(V[:,0].dot(V[:,0]))) # line 1
    print("cp: ", norm) # line 2
    V[:,0] /= norm
    G[0,0] /= norm
    for n in range(1, N_E):
        for i in range(0, n):
            G[n,i] = - (V[:,i].dot(W[:,n]))
            V[:,n] += G[n,i] * V[:,i]
        norm = np.sqrt(V[:,n].dot(V[:,n]))
        V[:,n] /= norm
        for i in range(n+1):
            G[n,i] /= norm
    return G

@cython.boundscheck(False)
@cython.cdivision(True)
@cython.wraparound(False)
cdef np.double_t[:,::1] G_mat(np.double_t[:,::1] M_mem):
    """Calls GS_coefficients and uses the coefficients to calculate
    the entries of the transformation matrix G_ij"""
    cdef:
        np.double_t[:,::1] G_mem = GS_coefficients(M_mem)
        int N_E = G_mem.shape[1]
        np.double_t carr[N_E_cpt][N_E_cpt]
        np.double_t[:,::1] G = carr
        int n, i, j

    # delete lower triangle in G
    G[...] = G_mem
    for i in range(N_E_cpt):
        for j in range(0, i):
            G[i,j] = 0.

    for n in range(1, N_E):
        for i in range(0, n):
            for j in range(0, i+1):
                G[n,j] += G_mem[n,i] * G[i,j]
    return G


def run_test():
    cdef:
        np.double_t[:,::1] A_mem
        np.double_t[:,::1] G
        int N = 4  # must be set before it is used to size A
        np.ndarray[np.double_t, ndim=2] A = np.random.rand(400**2, N)

    A_mem = A
    G = G_mat(A_mem)
    X = np.zeros((400**2, N))
    for i in range(0, N):
        for j in range(0,i+1):
            X[:,i] += G[i,j] * A[:,j]
    print(X)
    print("\n", X.T.dot(X))

run_test()
I don't think it is necessary to understand what this code does. To me, the real mystery is why the print statement makes any difference.

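The purpose of this code is to take a set of non-orthogonal vectors, stored as the column vectors of the matrix A, and to return an orthogonalization matrix G that orthogonalizes this set of vectors: as computed in run_test, column i of the result is the sum over j <= i of G[i,j] * A[:,j].

So A_{orthogonal} is equivalent to the matrix X in the code. Multiplying the transpose of the orthogonal matrix by the orthogonal matrix itself should give the identity matrix, and that is what I get as long as the print statement (line 1) is in there. As soon as you remove it, you also get off-diagonal entries, which means the matrix is not even orthogonal. Why?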
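The check described above (X.T.dot(X) should be the identity) can be sketched in plain Python. This is only an illustrative stand-in for the Cython code, using the standard library instead of NumPy and the modified Gram-Schmidt variant; the names gram_schmidt and gram_matrix are hypothetical and do not appear in the code above.

```python
import math

def gram_schmidt(columns):
    """Orthonormalize a list of column vectors (modified Gram-Schmidt)."""
    basis = []
    for col in columns:
        v = list(col)
        # Remove the component along every basis vector found so far.
        for q in basis:
            proj = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - proj * qi for qi, vi in zip(q, v)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis

def gram_matrix(vectors):
    """Pairwise dot products, i.e. X^T X when the vectors are the columns of X."""
    return [[sum(a * b for a, b in zip(u, w)) for w in vectors] for u in vectors]

# Three linearly independent, non-orthogonal column vectors (made-up data).
columns = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
gram = gram_matrix(gram_schmidt(columns))

# Orthonormal columns give the identity, up to rounding error.
is_identity = all(
    math.isclose(gram[i][j], 1.0 if i == j else 0.0, abs_tol=1e-12)
    for i in range(3)
    for j in range(3)
)
print(is_identity)
```

Any off-diagonal entry well above rounding level in gram would indicate that the columns are not orthogonal.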

There is at least one typo:

for i in range(M_mem.shape[0]):
    norm += M_mem[i]**2
->
for i in range(M_mem.shape[0]):
    norm_mv += M_mem[i]**2

Apart from that, I recommend the following more idiomatic version:

import numpy as np
cimport numpy as np
from libc.math cimport sqrt

def do_something(double[::1] M_mem):
    cdef:
        int i
        double norm_mv = 0
        double norm_np = 0
        double[::1] V = np.copy(np.asarray(M_mem))

    # Original implementation -- working
    norm_np = np.sqrt(np.dot(V, V))

    # My failed try with memoryview -- not working
    for i in range(M_mem.shape[0]):
        norm_mv += M_mem[i]**2
    norm_mv = sqrt(norm_mv)

    # norm_mv != norm_np
    return norm_np, norm_mv

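Both import and cimport numpy, and use the scalar math functions from libc.math rather than the NumPy versions. You can speed up the code further by decorating the routine with @cython.boundscheck(False). In that case you also need to cimport cython.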

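There is a typo in there: norm vs. norm_mv. Can you confirm that this is not the problem?

You are right, but that is not the problem.

I would also test whether they are close rather than equal. There will most likely be tiny, meaningless rounding differences between the two answers.

Yes, you are right, they are very close. I initially thought that rounding errors, small as they are, were what caused the failure. But I think you are right that they are meaningless. I have now stumbled upon something very strange and will update the post, although I am not sure whether I should open a new thread for it...

numpy.isclose(norm_mv, norm_np) is a better way to compare floats.

You are right about the typo. I corrected it. Is libc.math generally faster than numpy, or only in this specific case?

Cython can compile libc.math functions on scalar values without Python overhead, which it cannot do for the numpy functions. So it is faster when you operate on scalars. For array operations, I guess it depends on the array size and on how well the NumPy function is optimized relative to the user-implemented algorithm.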
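To illustrate the point about comparing for closeness rather than equality, here is a minimal sketch in plain Python. It uses math.isclose from the standard library as a stand-in for numpy.isclose, and math.fsum as one possible way to reduce accumulated rounding error in the sum of squares. The function names norm_naive and norm_fsum and the test data are made up for this example.

```python
import math

def norm_naive(values):
    """Norm from a plain running sum of squares, like the memoryview loop."""
    total = 0.0
    for x in values:
        total += x * x
    return math.sqrt(total)

def norm_fsum(values):
    """Same norm, but the squares are added with math.fsum, which
    compensates for the rounding error of the individual additions."""
    return math.sqrt(math.fsum(x * x for x in values))

# Made-up data: a long vector, so that many additions are involved.
values = [0.1 * (i % 7) for i in range(100_000)]
a = norm_naive(values)
b = norm_fsum(values)

# The two results may differ in the last bits, so compare with a
# relative tolerance instead of ==.
print(math.isclose(a, b, rel_tol=1e-9))
```

The tolerance 1e-9 is an arbitrary choice for this sketch; a suitable value depends on the data and on how the result is used afterwards.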