Numpy Cython: optimizing a 1-D normalized sliding cross-correlation
I have the following code, which computes a normalized cross-correlation between two signals in Python in order to look for similarity:
import numpy as np

def normcorr(template, srchspace):
    # Normalize the template: zero mean, unit std, scaled by its length
    template = (template - np.mean(template)) / (np.std(template) * len(template))
    CCnorm = srchspace.copy()
    CCnorm = CCnorm[np.shape(template)[0]:]  # trim CC matrix
    for a in range(len(CCnorm)):
        s = srchspace[a:a + np.shape(template)[0]]
        sp = (s - np.mean(s)) / np.std(s)  # normalize the current window
        CCnorm[a] = np.sum(np.multiply(template, sp))
    return CCnorm
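But as you can imagine, it is far too slow. According to the cython documentation, large speedups are to be expected for loops written in plain Python. So I tried writing some cython code with declared data types for the variables, which looks like this: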
from __future__ import division
import numpy as np
import math as m
cimport numpy as np
cimport cython

def normcorr(np.ndarray[np.float32_t, ndim=1] template, np.ndarray[np.float32_t, ndim=1] srchspace):
    cdef int a
    cdef np.ndarray[np.float32_t, ndim=1] s
    cdef np.ndarray[np.float32_t, ndim=1] sp
    cdef np.ndarray[np.float32_t, ndim=1] CCnorm
    template = (template - np.mean(template)) / (np.std(template) * len(template))
    CCnorm = srchspace.copy()
    CCnorm = CCnorm[len(template):]
    for a in range(len(CCnorm)):
        s = srchspace[a:a + len(template)]
        sp = (s - np.mean(s)) / np.std(s)
        CCnorm[a] = np.sum(np.multiply(template, sp))
    return CCnorm
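But once I compiled it, the code actually ran slower than the pure Python version. I have read that calling numpy from cython can slow code down significantly. Is that the problem with my code, in which case I would have to define inline functions to replace every call to np, or am I missing some other mistake?

Because you are calling numpy functions inside the cython loop, there is no speedup. If you use pandas, you can use rolling_mean() and rolling_std(), together with numpy's convolve(), to do the calculation quickly. The code looks like this: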
import numpy as np
import pandas as pd

np.random.seed()

def normcorr(template, srchspace):
    template = (template - np.mean(template)) / (np.std(template) * len(template))  # Normalize template
    CCnorm = srchspace.copy()
    CCnorm = CCnorm[np.shape(template)[0]:]  # trim CC matrix
    for a in range(len(CCnorm)):
        s = srchspace[a:a + np.shape(template)[0]]
        sp = (s - np.mean(s)) / np.std(s)
        CCnorm[a] = np.sum(np.multiply(template, sp))
    return CCnorm

def fast_normcorr(t, s):
    # Note: pd.rolling_mean / pd.rolling_std are the old pandas API;
    # they were later removed in favour of Series.rolling(n).mean() / .std().
    n = len(t)
    nt = (t - np.mean(t)) / (np.std(t) * n)
    sum_nt = nt.sum()
    a = pd.rolling_mean(s, n)[n-1:-1]   # mean of every window
    b = pd.rolling_std(s, n)[n-1:-1]    # sample std (ddof=1) of every window
    b *= np.sqrt((n - 1.0) / n)         # convert to population std, as np.std does
    c = np.convolve(nt[::-1], s, mode="valid")[:-1]  # sliding dot product
    result = (c - sum_nt * a) / b
    return result

n = 100
m = 1000
t = np.random.rand(n)
s = np.random.rand(m)
r1 = normcorr(t, s)
r2 = fast_normcorr(t, s)
assert np.allclose(r1, r2)
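Why the vectorized version matches the loop can be written out explicitly. The notation here is introduced only for illustration: \hat{t}_i is the normalized template (nt in the code), and \mu_a and \sigma_a are the mean and population standard deviation of the window starting at offset a.

```latex
r_a \;=\; \sum_{i=0}^{n-1} \hat{t}_i \,\frac{s_{a+i} - \mu_a}{\sigma_a}
    \;=\; \frac{\displaystyle\sum_{i=0}^{n-1} \hat{t}_i\, s_{a+i} \;-\; \mu_a \sum_{i=0}^{n-1} \hat{t}_i}{\sigma_a}
```

The first sum is the sliding dot product c, \mu_a is the rolling mean a, the second sum is sum_nt, and \sigma_a is the rolling std b after the sqrt((n-1)/n) correction from sample to population standard deviation.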
You can check that the results r1 and r2 are the same. Here is the timeit test:
%timeit normcorr(t, s)
%timeit fast_normcorr(t, s)
Output:
10 loops, best of 3: 59 ms per loop
1000 loops, best of 3: 273 µs per loop
That is about 200 times faster.
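On pandas versions where the top-level rolling_mean() and rolling_std() functions are no longer available, the same idea can be sketched with Series.rolling. This is a minimal sketch, not the answer's original code: the names normcorr_loop and fast_normcorr_rolling are hypothetical, and it assumes a pandas version that provides Series.to_numpy().

```python
import numpy as np
import pandas as pd


def normcorr_loop(template, srchspace):
    """Reference loop equivalent to the question's code, used only as a check."""
    n = len(template)
    nt = (template - template.mean()) / (template.std() * n)
    out = np.empty(len(srchspace) - n)
    for a in range(len(out)):
        window = srchspace[a:a + n]
        out[a] = np.sum(nt * (window - window.mean()) / window.std())
    return out


def fast_normcorr_rolling(t, s):
    """Vectorized variant using the Series.rolling API (hypothetical helper)."""
    n = len(t)
    nt = (t - t.mean()) / (t.std() * n)
    rolling = pd.Series(s).rolling(n)
    # Drop the n-1 leading NaNs and the last window so the length matches the loop.
    mean = rolling.mean().to_numpy()[n - 1:-1]
    # ddof=0 yields the population std directly, matching np.std.
    std = rolling.std(ddof=0).to_numpy()[n - 1:-1]
    # Convolving with the reversed template is a sliding dot product.
    dot = np.convolve(nt[::-1], s, mode="valid")[:-1]
    return (dot - nt.sum() * mean) / std


rng = np.random.RandomState(0)
t = rng.rand(100)
s = rng.rand(1000)
assert np.allclose(normcorr_loop(t, s), fast_normcorr_rolling(t, s))
```

Passing ddof=0 to std() removes the need for the sqrt((n-1)/n) correction used with the older API.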
If you compile your code with cython -a and look at the HTML output, you will see that there is a lot of Python overhead.
import numpy as np
cimport numpy as np
cimport cython

@cython.boundscheck(False)
@cython.cdivision(True)  # Don't check for divisions by 0
def normcorr(np.ndarray[np.float32_t, ndim=1] template, np.ndarray[np.float32_t, ndim=1] srchspace):
    cdef int a
    cdef int N = template.shape[0]
    cdef int NCC = srchspace.shape[0] - N
    cdef np.ndarray[np.float32_t, ndim=1] s
    cdef np.ndarray[np.float32_t, ndim=1] sp
    cdef np.ndarray[np.float32_t, ndim=1] CCnorm
    template = (template - template.mean()) / (template.std() * N)
    CCnorm = srchspace[N:].copy()  # You don't need to copy the whole array
    for a in range(NCC):  # range over a cdef int compiles to a C loop (xrange also works on Python 2)
        s = srchspace[a:a + N]
        sp = (s - np.mean(s)) / np.std(s)
        CCnorm[a] = (template * sp).sum()
    return CCnorm
To improve performance further, you can optimize the last two lines of the loop:
@cython.boundscheck(False)
@cython.cdivision(True)
cdef multiply_by_normalised(np.ndarray[np.float32_t, ndim=1] template, np.ndarray[np.float32_t, ndim=1] s):
    cdef int i
    cdef int N = template.shape[0]
    cdef np.float32_t mean, std, out = 0
    mean = s.mean()
    std = s.std()
    # Normalize each sample and accumulate in one pass, without temporary arrays
    for i in range(N):
        out += (s[i] - mean) / std * template[i]
    return out
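To see that the fused loop computes the same value as the two vectorized lines it replaces, here is a plain-Python mirror of it. This is only an illustrative check, not Cython code, and the name multiply_by_normalised_py is hypothetical.

```python
import numpy as np


def multiply_by_normalised_py(template, s):
    """Plain-Python mirror of the fused loop (hypothetical helper name)."""
    mean = s.mean()
    std = s.std()
    out = 0.0
    for i in range(len(template)):
        # Normalize one sample and accumulate, without building a temporary array.
        out += (s[i] - mean) / std * template[i]
    return out


rng = np.random.RandomState(0)
template = rng.rand(16)
window = rng.rand(16)

# The two lines from the loop body: normalize the window, then multiply and sum.
sp = (window - window.mean()) / window.std()
vectorised = (template * sp).sum()

assert np.isclose(vectorised, multiply_by_normalised_py(template, window))
```

In Python this loop is slower than the vectorized form; the benefit only appears once Cython compiles the typed loop to C.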
If you still need to squeeze out more time, you can use Bottleneck's mean and std functions, which are faster than Numpy's.

This is exactly what I wanted, and the code is much simpler. I can't thank you enough.