Python 如何使用numpy.correlate进行自相关?
我需要做一组数字的自相关,据我所知,这只是一组数字与自身的相关性 我试过使用numpy的相关函数,但我不相信结果,因为它几乎总是给出一个向量,其中第一个数字不是最大的,它应该是最大的 所以,这个问题实际上是两个问题:Python 如何使用numpy.correlate进行自相关?,python,math,numpy,numerical-methods,Python,Math,Numpy,Numerical Methods,我需要做一组数字的自相关,据我所知,这只是一组数字与自身的相关性 我试过使用numpy的相关函数,但我不相信结果,因为它几乎总是给出一个向量,其中第一个数字不是最大的,它应该是最大的 所以,这个问题实际上是两个问题: 他到底在干什么 如何使用它(或其他东西)进行自相关 为了回答您的第一个问题,numpy.correlate(a,v,mode)正在执行a与v相反的卷积运算,并给出按指定模式剪裁的结果。C(t)=∑ -∞ < i
为了回答您的第一个问题,
numpy.correlate(a,v,mode)
正在执行a
与v
相反的卷积运算,并给出按指定模式剪裁的结果。C(t)=∑ -∞ < i<∞ aivt+i在哪里-∞ < t<∞, 允许来自的结果-∞ 到∞, 但是很明显,你不能存储无限长的数组。所以它必须被剪裁,这就是模式的作用。有三种不同的模式:完全模式、相同模式和有效模式:
- “full”模式返回每个
的结果,其中t
和a
有一些重叠李>v
- “相同”模式返回与最短向量长度相同的结果(
或a
)李>v
- “有效”模式仅在
和a
完全重叠时返回结果。forv
提供了有关模式的更多详细信息numpy.convolve
numpy.correlate
给了你自相关,它也给了你多一点。自相关用于发现信号或函数在某个时差下与自身的相似程度。当时差为0时,自相关应该是最高的,因为信号本身是相同的,所以您希望自相关结果数组中的第一个元素是最大的。但是,相关性不是在时间差为0时开始的。它从负时差开始,接近0,然后变为正。也就是说,您期望:
自相关(a)=∑ -∞ < i<∞ aivt+i,其中0自相关有两种版本:统计和卷积。除了一点细节外,它们的作用是相同的:统计版本标准化为区间[-1,1]。下面是一个如何进行统计分析的示例:
def acf(x, length=20):
return numpy.array([1]+[numpy.corrcoef(x[:-i], x[i:])[0,1] \
for i in range(1, length)])
由于我刚刚遇到了同样的问题,我想与您分享几行代码。事实上,到目前为止,关于stackoverflow中的自相关的文章已经有好几篇了。如果将自相关定义为
a(x,L)=sum(k=0,N-L-1)((xk-xbar)*(x(k+L)-xbar))/sum(k=0,N-1)((xk-xbar)**2)
[这是IDL的a_-correlate函数中给出的定义,它与我在问题的答案2中看到的结果一致],那么下面的结果似乎是正确的:
import numpy as np
import matplotlib.pyplot as plt
# generate some data
x = np.arange(0.,6.12,0.01)
y = np.sin(x)
# y = np.random.uniform(size=300)
yunbiased = y-np.mean(y)
ynorm = np.sum(yunbiased**2)
acor = np.correlate(yunbiased, yunbiased, "same")/ynorm
# use only second half
acor = acor[len(acor)/2:]
plt.plot(acor)
plt.show()
正如你们所见,我用正弦曲线和均匀随机分布测试了这一点,两个结果看起来都像我所期望的。请注意,我使用了mode=“same”
而不是其他人使用的mode=“full”
使用函数而不是计算t滞后的统计相关性:
def autocorr(x, t=1):
return numpy.corrcoef(numpy.array([x[:-t], x[t:]]))
我认为OP问题的真正答案简明地包含在Numpy.correlate文档的摘录中:
mode : {'valid', 'same', 'full'}, optional
Refer to the `convolve` docstring. Note that the default
is `valid`, unlike `convolve`, which uses `full`.
这意味着,当与无“模式”定义一起使用时,当为其两个输入参数提供相同的向量时(即-用于执行自相关时),Numpy.CORREL函数将返回标量。我使用talib.CORREL进行自相关,就像这样,我怀疑您可以对其他软件包执行相同的操作:
def autocorrelate(x, period):
# x is a deep indicator array
# period of sample and slices of comparison
# oldest data (period of input array) may be nan; remove it
x = x[-np.count_nonzero(~np.isnan(x)):]
# subtract mean to normalize indicator
x -= np.mean(x)
# isolate the recent sample to be autocorrelated
sample = x[-period:]
# create slices of indicator data
correls = []
for n in range((len(x)-1), period, -1):
alpha = period + n
slices = (x[-alpha:])[:period]
# compare each slice to the recent sample
correls.append(ta.CORREL(slices, sample, period)[-1])
# fill in zeros for sample overlap period of recent correlations
for n in range(period,0,-1):
correls.append(0)
# oldest data (autocorrelation period) will be nan; remove it
correls = np.array(correls[-np.count_nonzero(~np.isnan(correls)):])
return correls
# CORRELATION OF BEST FIT
# the highest value correlation
max_value = np.max(correls)
# index of the best correlation
max_index = np.argmax(correls)
您的问题1已经在这里的几个优秀答案中进行了广泛讨论 我想和大家分享几行代码,这些代码允许您仅根据自相关的数学特性计算信号的自相关。即,可通过以下方式计算自相关:
def autocorrelation (x) :
"""
Compute the autocorrelation of the signal, based on the properties of the
power spectral density of the signal.
"""
xp = x-np.mean(x)
f = np.fft.fft(xp)
p = np.array([np.real(v)**2+np.imag(v)**2 for v in f])
pi = np.fft.ifft(p)
return np.real(pi)[:x.size/2]/np.sum(xp**2)
我是一名计算生物学家,当我不得不计算随机过程时间序列对之间的自相关/互相关时,我意识到,
np.correlate
并没有完成我需要的工作
事实上,
np.correlate
中似乎缺少的是在距离处对所有可能的时间点进行平均,我认为有两件事会让这个话题更加混乱:
import numpy
import matplotlib.pyplot as plt
def autocorr1(x,lags):
'''numpy.corrcoef, partial'''
corr=[1. if l==0 else numpy.corrcoef(x[l:],x[:-l])[0][1] for l in lags]
return numpy.array(corr)
def autocorr2(x,lags):
'''manualy compute, non partial'''
mean=numpy.mean(x)
var=numpy.var(x)
xp=x-mean
corr=[1. if l==0 else numpy.sum(xp[l:]*xp[:-l])/len(x)/var for l in lags]
return numpy.array(corr)
def autocorr3(x,lags):
'''fft, pad 0s, non partial'''
n=len(x)
# pad 0s to 2n-1
ext_size=2*n-1
# nearest power of 2
fsize=2**numpy.ceil(numpy.log2(ext_size)).astype('int')
xp=x-numpy.mean(x)
var=numpy.var(x)
# do fft and ifft
cf=numpy.fft.fft(xp,fsize)
sf=cf.conjugate()*cf
corr=numpy.fft.ifft(sf).real
corr=corr/var/n
return corr[:len(lags)]
def autocorr4(x,lags):
'''fft, don't pad 0s, non partial'''
mean=x.mean()
var=numpy.var(x)
xp=x-mean
cf=numpy.fft.fft(xp)
sf=cf.conjugate()*cf
corr=numpy.fft.ifft(sf).real/var/len(x)
return corr[:len(lags)]
def autocorr5(x,lags):
'''numpy.correlate, non partial'''
mean=x.mean()
var=numpy.var(x)
xp=x-mean
corr=numpy.correlate(xp,xp,'full')[len(x)-1:]/var/len(x)
return corr[:len(lags)]
if __name__=='__main__':
y=[28,28,26,19,16,24,26,24,24,29,29,27,31,26,38,23,13,14,28,19,19,\
17,22,2,4,5,7,8,14,14,23]
y=numpy.array(y).astype('float')
lags=range(15)
fig,ax=plt.subplots()
for funcii, labelii in zip([autocorr1, autocorr2, autocorr3, autocorr4,
autocorr5], ['np.corrcoef, partial', 'manual, non-partial',
'fft, pad 0s, non-partial', 'fft, no padding, non-partial',
'np.correlate, non-partial']):
cii=funcii(y,lags)
print(labelii)
print(cii)
ax.plot(lags,cii,label=labelii)
ax.set_xlabel('lag')
ax.set_ylabel('correlation coefficient')
ax.legend()
plt.show()
这里是ou
import numpy
import matplotlib.pyplot as plt
def autocorr1(x,lags):
'''numpy.corrcoef, partial'''
corr=[1. if l==0 else numpy.corrcoef(x[l:],x[:-l])[0][1] for l in lags]
return numpy.array(corr)
def autocorr2(x,lags):
'''manualy compute, non partial'''
mean=numpy.mean(x)
var=numpy.var(x)
xp=x-mean
corr=[1. if l==0 else numpy.sum(xp[l:]*xp[:-l])/len(x)/var for l in lags]
return numpy.array(corr)
def autocorr3(x,lags):
'''fft, pad 0s, non partial'''
n=len(x)
# pad 0s to 2n-1
ext_size=2*n-1
# nearest power of 2
fsize=2**numpy.ceil(numpy.log2(ext_size)).astype('int')
xp=x-numpy.mean(x)
var=numpy.var(x)
# do fft and ifft
cf=numpy.fft.fft(xp,fsize)
sf=cf.conjugate()*cf
corr=numpy.fft.ifft(sf).real
corr=corr/var/n
return corr[:len(lags)]
def autocorr4(x,lags):
'''fft, don't pad 0s, non partial'''
mean=x.mean()
var=numpy.var(x)
xp=x-mean
cf=numpy.fft.fft(xp)
sf=cf.conjugate()*cf
corr=numpy.fft.ifft(sf).real/var/len(x)
return corr[:len(lags)]
def autocorr5(x,lags):
'''numpy.correlate, non partial'''
mean=x.mean()
var=numpy.var(x)
xp=x-mean
corr=numpy.correlate(xp,xp,'full')[len(x)-1:]/var/len(x)
return corr[:len(lags)]
if __name__=='__main__':
y=[28,28,26,19,16,24,26,24,24,29,29,27,31,26,38,23,13,14,28,19,19,\
17,22,2,4,5,7,8,14,14,23]
y=numpy.array(y).astype('float')
lags=range(15)
fig,ax=plt.subplots()
for funcii, labelii in zip([autocorr1, autocorr2, autocorr3, autocorr4,
autocorr5], ['np.corrcoef, partial', 'manual, non-partial',
'fft, pad 0s, non-partial', 'fft, no padding, non-partial',
'np.correlate, non-partial']):
cii=funcii(y,lags)
print(labelii)
print(cii)
ax.plot(lags,cii,label=labelii)
ax.set_xlabel('lag')
ax.set_ylabel('correlation coefficient')
ax.legend()
plt.show()
import numpy as np
def auto_corrcoef(x):
return np.corrcoef(x[1:-1], x[2:])[0,1]
def autocorr1(x):
r2=np.fft.ifft(np.abs(np.fft.fft(x))**2).real
return r2[:len(x)//2]
def autocorr2(x):
r2=np.fft.ifft(np.abs(np.fft.fft(x))**2).real
c=(r2/x.shape-np.mean(x)**2)/np.std(x)**2
return c[:len(x)//2]
def autocorr(x):
result = numpy.correlate(x, x, mode='full')
return result[result.size/2:]
import matplotlib.pyplot as plt
def plot_autocorr(returns, lags):
autocorrelation = []
for lag in range(lags+1):
corr_lag = returns.corr(returns.shift(-lag))
autocorrelation.append(corr_lag)
plt.plot(range(lags+1), autocorrelation, '--o')
plt.xticks(range(lags+1))
return np.array(autocorrelation)
from statsmodels.tsa import stattools
# x = 1-D array
# Yield normalized autocorrelation function of number lags
autocorr = stattools.acf( x )
# Get autocorrelation coefficient at lag = 1
autocorr_coeff = autocorr[1]