What techniques can efficiently find the periodicity of arbitrary data points?

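By "arbitrary" I mean that I don't have a grid of signal samples that would be suitable for taking an FFT. I only know the times at which events occurred, and I want an estimate of their rate. For example: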

p = [0, 1.1, 1.9, 3, 3.9, 6.1 ...]

…might be hits from a process with a nominal period (repetition interval) of 1.0, but with noise and some missed detections.

Are there well-known methods for handling this kind of data?

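It sounds like you need to decide what exactly you want to determine. If you want the average interval in a set of timestamps, that's easy: just take the mean or the median.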
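
A minimal sketch of that computation with NumPy, assuming the timestamps are given as a plain list (the helper name `interval_stats` is illustrative). It also returns the variance, which is useful as a measure of spread. On the example from the question, the single long gap of 2.2 (probably a missed detection) pulls the mean up to about 1.22, while the median stays at 1.1. This is one reason the median is often the safer choice when detections can be missed.

```python
import numpy as np


def interval_stats(times):
    """Return mean, median and variance of the gaps between consecutive events.

    `times` is any 1-D sequence of event timestamps; it is sorted first.
    """
    t = np.sort(np.asarray(times, dtype=float))
    intervals = np.diff(t)  # gaps between consecutive events
    return intervals.mean(), np.median(intervals), intervals.var()


mean, median, var = interval_stats([0, 1.1, 1.9, 3, 3.9, 6.1])
print("mean:", mean, "median:", median, "variance:", var)
```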

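If you think the interval may be changing, then you need to know how fast it changes. You can then compute a windowed moving average. Knowing the rate of change lets you choose the window size appropriately: a larger window gives a smoother result, while a smaller window responds better to faster changes.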
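
A short sketch of a windowed moving average over the inter-event intervals, assuming a simple uniform (boxcar) window applied with `np.convolve`. The function name and the window sizes are illustrative choices, not part of the original answer. With the question's data, a window of 2 follows the jump caused by the long last gap closely, while a window of 4 averages it out more.

```python
import numpy as np


def moving_average_intervals(times, window):
    """Boxcar moving average of inter-event intervals.

    Uses np.convolve in 'valid' mode, so the result has
    len(intervals) - window + 1 entries (no edge padding).
    """
    intervals = np.diff(np.sort(np.asarray(times, dtype=float)))
    if not 1 <= window <= len(intervals):
        raise ValueError("window must be between 1 and the number of intervals")
    kernel = np.ones(window) / window  # equal weights summing to 1
    return np.convolve(intervals, kernel, mode="valid")


times = [0, 1.1, 1.9, 3, 3.9, 6.1]
print(moving_average_intervals(times, 2))  # small window: reacts quickly
print(moving_average_intervals(times, 4))  # larger window: smoother
```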

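If you don't know whether the data follows any kind of pattern, then you are probably doing data exploration. In that case, I would start by plotting the intervals to see whether a pattern is visible by eye. If the data is very noisy, applying a moving average may also help.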

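Essentially, whether there is something in the data, and what it means, is up to you and your knowledge of the domain. That said, any set of timestamps will have a mean (and you can easily compute the variance as an indication of how variable the data is), but whether that mean is meaningful is up to you.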

A least-squares algorithm can do the trick if it is correctly initialized. A clustering method can be used for that initialization.

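When an FFT is performed, the signal is described as a sum of sine waves. The amplitudes of the frequencies can be viewed as the result of a least-squares fit. Hence, if the signal is unevenly sampled and you want to estimate its Fourier transform, it may make sense to solve the same least-squares problem. Applied to a uniformly sampled signal, this boils down to the same result.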
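
A minimal sketch of that idea, assuming you have sample values `y` at irregular times `t`. The question has only event times, so this covers the general uneven-sampling case the paragraph describes. For each trial frequency, the sketch fits a cosine, a sine and a constant by ordinary least squares (`numpy.linalg.lstsq`) and reports the fitted amplitude. This is closely related to the Lomb-Scargle periodogram. The function name and the synthetic test signal (a period-1.0 sinusoid plus noise) are hypothetical.

```python
import numpy as np


def ls_amplitude_spectrum(t, y, freqs):
    """Amplitude at each trial frequency from a least-squares sinusoid fit.

    For each f, fit y(t) ~ a*cos(2*pi*f*t) + b*sin(2*pi*f*t) + c by ordinary
    least squares and return sqrt(a**2 + b**2). Sample times may be uneven.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    amps = np.empty(len(freqs))
    for k, f in enumerate(freqs):
        w = 2 * np.pi * f
        # Design matrix: one column per basis function
        A = np.column_stack((np.cos(w * t), np.sin(w * t), np.ones_like(t)))
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        amps[k] = np.hypot(coef[0], coef[1])
    return amps


# Hypothetical test signal: period 1.0, sampled at 200 random times
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0, 20, 200))
y = np.sin(2 * np.pi * t / 1.0) + 0.3 * rng.standard_normal(t.size)
freqs = np.linspace(0.2, 2.0, 901)
amps = ls_amplitude_spectrum(t, y, freqs)
print("best frequency:", freqs[np.argmax(amps)])  # close to 1.0
```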

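Since your signal is discrete, you may want to fit it with a Dirac comb. Minimizing the sum of the squared distances from each event to the nearest tooth of the Dirac comb seems more reasonable. This is a nonlinear optimization problem in which the Dirac comb is described by its period and its offset. This nonlinear least-squares problem can be solved with the Levenberg-Marquardt algorithm. Below is a Python example that uses the scipy.optimize.leastsq function. Furthermore, the errors on the estimated period and offset can be estimated as described in the Stack Overflow answer linked in the code comments. It is also documented in … and ….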
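
For reference, the criterion described in the previous paragraph can be written as follows for a single comb. The notation is mine: event times $t_1, \dots, t_n$, period $T$, offset $\varphi$. The residual $r_i$ is the distance from $t_i$ to the nearest tooth $\varphi + kT$. This is what `disttoDiracComb` computes, using a floor followed by a "take the next tooth if it is closer" test.

```latex
(\hat{T}, \hat{\varphi}) = \arg\min_{T,\,\varphi} \sum_{i=1}^{n} r_i(T,\varphi)^2,
\qquad
r_i(T,\varphi) = \left| \, t_i - \varphi - T \operatorname{round}\!\left( \frac{t_i - \varphi}{T} \right) \right|
```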

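However, half the period, a third of the period, and so on will also fit, and multiples of the period are local minima. Both problems can be avoided by initializing the optimization properly. To this end, the differences between event times can be clustered, and the smallest cluster center is taken as the expected period. The clustering (scikit-learn's MeanShift) is applied as described in the Stack Overflow question linked in the code.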
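
To make the initialization step concrete, here is a dependency-free sketch. It deliberately swaps MeanShift for a much simpler 1-D clustering: it sorts all pairwise differences and starts a new cluster wherever two neighbouring values are more than `gap` apart. The `gap` threshold is an assumed tuning parameter and the function name is hypothetical. On the question's data, the first cluster is {0.8, 0.9, 1.1, 1.1}, which gives an initial period of 0.975.

```python
import numpy as np


def crude_period(times, gap=0.25):
    """Initial period guess: smallest center among clusters of pairwise differences.

    Simple gap-based 1-D clustering (not MeanShift); `gap` is an assumed
    threshold on the jump between consecutive sorted differences.
    """
    t = np.asarray(times, dtype=float)
    i, j = np.triu_indices(len(t), k=1)        # all pairs with i < j
    diffs = np.sort(np.abs(t[j] - t[i]))
    # Start a new cluster wherever the next difference jumps by more than gap
    breaks = np.flatnonzero(np.diff(diffs) > gap) + 1
    clusters = np.split(diffs, breaks)
    centers = [c.mean() for c in clusters]
    return min(centers), clusters


period0, clusters = crude_period([0, 1.1, 1.9, 3, 3.9, 6.1])
print("initial period guess:", period0)
```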

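Note that the procedure can be extended to multidimensional data, to look for periodic patterns with different fundamental periods, or for mixtures of periodic patterns.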
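
One way to sketch the "mixture of periodic patterns" case is a vectorized residual for several combs at once. Each event's residual is its distance to the nearest tooth of any comb. Parameters are passed as a flat vector `[T_0, phi_0, T_1, phi_1, ...]`, the same layout as the `x` argument of `disttoDiracComb` in the script. Fitting would then need an initial guess for all 2K parameters, which is not shown. The function name and the two example combs are hypothetical.

```python
import numpy as np


def dist_to_combs(params, times):
    """Distance from each event time to the nearest tooth of any Dirac comb.

    params: flat sequence [T_0, phi_0, T_1, phi_1, ...] (period, offset pairs).
    Returns an array with one non-negative residual per event.
    """
    p = np.asarray(params, dtype=float).reshape(-1, 2)
    period, offset = p[:, 0], p[:, 1]            # shape (n_combs,)
    t = np.asarray(times, dtype=float)[:, None]  # shape (n_events, 1)
    # Distance to the nearest tooth of each comb, shape (n_events, n_combs)
    d = np.abs(t - offset - period * np.round((t - offset) / period))
    return d.min(axis=1)                         # best comb per event


# Hypothetical example: two interleaved combs (period 1.0 and period 1.5)
params = [1.0, 0.0, 1.5, 0.2]
print(dist_to_combs(params, [0.0, 1.0, 2.0, 0.2, 1.7, 3.2]))  # all ~0
print(dist_to_combs(params, [0.5]))  # nearest tooth is 0.2, on the second comb
```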

```python
import numpy as np
from scipy.optimize import leastsq
from sklearn.cluster import MeanShift, estimate_bandwidth

# Event times from the question
ticks = np.array([0, 1.1, 1.9, 3, 3.9, 6.1])


def crudeEstimate():
    # Look for the period by examining all pairwise differences between events
    diffs = np.array([ticks[i] - ticks[j]
                      for i in range(len(ticks)) for j in range(i)])
    # see https://stackoverflow.com/questions/18364026/clustering-values-by-their-proximity-in-python-machine-learning
    # MeanShift expects 2-D input, so pad the differences with a column of zeros
    X = np.column_stack((diffs, np.zeros(len(diffs))))
    bandwidth = estimate_bandwidth(X, quantile=1.0 / len(ticks))
    ms = MeanShift(bandwidth=bandwidth, bin_seeding=True)
    ms.fit(X)
    labels = ms.labels_
    cluster_centers = ms.cluster_centers_
    print(cluster_centers)
    for k in np.unique(labels):
        print("cluster {0}: {1}".format(k, X[labels == k, 0]))
    # The smallest cluster center is the initial guess for the period
    return np.min(cluster_centers[:, 0])


def disttoDiracComb(x):
    # x = [period_0, offset_0, period_1, offset_1, ...] describes one or more
    # Dirac combs; return each event's distance to the nearest tooth of any comb
    residual = np.zeros(len(ticks))
    for i in range(len(ticks)):
        mindist = np.inf
        for j in range(len(x) // 2):
            period = x[2 * j]
            offset = x[2 * j + 1]
            # Index of the last tooth at or before the event
            index = np.floor((ticks[i] - offset) / period)
            currdist = ticks[i] - (index * period + offset)
            # If the next tooth is closer, use the distance to it instead
            if currdist > 0.5 * period:
                currdist = period - currdist
            mindist = min(mindist, currdist)
        residual[i] = mindist
    return residual


estimated_period = crudeEstimate()
print('crude estimate by clustering :', estimated_period)

# Refine (period, offset) by nonlinear least squares (MINPACK Levenberg-Marquardt)
xp = np.array([estimated_period, 0.0])
p, pcov, infodict, mesg, ier = leastsq(disttoDiracComb, x0=xp, ftol=1e-18,
                                       full_output=True)
if pcov is None:
    raise RuntimeError("singular problem: covariance could not be estimated")

# see https://stackoverflow.com/questions/14581358/getting-standard-errors-on-fitted-parameters-using-the-optimize-leastsq-method-i
# Scale the covariance by the residual variance to get parameter errors
s_sq = (disttoDiracComb(p) ** 2).sum() / (len(ticks) - len(p))
pcov = pcov * s_sq
perr = np.sqrt(np.diag(pcov))

# 1.96 * standard error gives an approximate 95% confidence interval
print('estimated period is ', p[0], ' +/- ', 1.96 * perr[0])
print('estimated offset is ', p[1], ' +/- ', 1.96 * perr[1])
```

The last lines printed by the script are:

```
crude estimate by clustering : 0.975
estimated period is  1.0042857141346768  +/-  0.04035792507868619
estimated offset is  -0.011428571139828817  +/-  0.13385206912205957
```