Smoothing in Python

python numpy

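I often use time-averaged views of my data so that it is less noisy when I plot it. For example, if my data is sampled every 1 minute, I have two arrays, ts and ys. I then create fs, which is the local average of the 60 nearest points in ys. I do the convolution myself by simply computing the average of the 60 nearest points, so I do not use anything from numpy or any other module.

I now have new data where ts is somewhat sparse. That is, some data points are occasionally missing, so I cannot simply take the average of the 60 nearest points. If the independent variable ts is in minutes, how do I calculate the hourly average of the dependent variable ys, to create an hourly average function fs in python?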

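This is a complicated question, and the possible answers differ a lot depending on what you mean by "hourly average".

One way to deal with irregularly spaced data is to resample it. The resampling can be done by interpolation, and the resulting resampled data can then be fed to whatever filtering method you like.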

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import savgol_filter
# %matplotlib inline  # uncomment in IPython/Jupyter only; it is not valid plain Python

def y(t):
    # a function to simulate data
    return np.sin(t/20.) + 0.05*np.random.randn(len(t))

four_hours = np.arange(240)
random_time_points = np.sort(np.random.choice(four_hours, size=30, replace=False))

simulated_data = y(random_time_points)
resampled_data = np.interp(four_hours, random_time_points, simulated_data)

# here I smooth with a Savitzky-Golay filter, 
#  but you could use a moving avg or anything else
#  the window-length=61 means smooth over a 1-hour (60 minute) window
smoothed_data = savgol_filter(resampled_data, window_length=61, polyorder=0)

# plot some results
plt.plot(random_time_points, simulated_data, '.k', 
         four_hours, smoothed_data, '--b',
         four_hours, y(four_hours), '-g')

# save plot
plt.savefig('SO35038933.png')

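The plot shows the original "sparse" data (black dots), the original "true" data (green curve), and the smoothed data (blue dashed curve).

If I understand correctly, I think something like this might work: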
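The comment in the code above mentions that a moving average would work as well. Here is a minimal sketch of that variant, assuming the same "interpolate onto a 1-minute grid, then filter" approach. The helper name moving_average and the sample values are made up for illustration. In the interior of the signal this should match the polyorder=0 Savitzky-Golay filter, since a degree-0 least-squares fit is just the mean; the two differ near the edges.

```python
import numpy as np

def moving_average(values, window=61):
    """Centered boxcar average; the window shrinks near the edges.

    Assumes window is odd and no longer than the input.
    """
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window)
    # Sum of the samples under the window at each position.
    sums = np.convolve(values, kernel, mode="same")
    # Number of real samples under the window (fewer near the edges).
    counts = np.convolve(np.ones_like(values), kernel, mode="same")
    return sums / counts

# Hypothetical sparse samples: times in minutes and their values.
t_sparse = np.array([0, 7, 30, 65, 110, 180, 239])
y_sparse = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.0, 1.3])

grid = np.arange(240)                            # one point per minute
resampled = np.interp(grid, t_sparse, y_sparse)  # linear interpolation
smoothed = moving_average(resampled, window=61)  # roughly a 1-hour window
```

Dividing by the per-position count rather than by the window length avoids pulling the ends of the curve toward zero.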

import threading

hours_worth_of_data = []

def timer():
    # re-arm the timer so this function runs again in one hour
    threading.Timer(3600, timer).start()  # 3600 seconds in an hour
    if hours_worth_of_data:  # skip the average when no data arrived this hour
        smooth = sum(hours_worth_of_data) / len(hours_worth_of_data)
        # do something with smooth here
    del hours_worth_of_data[:]  # start over with fresh data for next hour

timer()

Whenever you get data, also append it to hours_worth_of_data. Every hour the timer averages the data and then clears the list.
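Because the timer callback runs on a different thread from the code that appends data, it may be worth guarding the list with a lock. The sketch below is one possible way to do that; the class name HourlyAverager and its add and flush methods are hypothetical, not part of the answer above.

```python
import threading

class HourlyAverager:
    """Collects values and reports their mean each time flush() is called."""

    def __init__(self):
        self._values = []
        self._lock = threading.Lock()

    def add(self, value):
        # Called by whatever code receives new data points.
        with self._lock:
            self._values.append(value)

    def flush(self):
        """Return the mean of the values collected so far, then reset.

        Returns None when nothing was collected.
        """
        with self._lock:
            # Swap in a fresh list so add() is blocked only briefly.
            values, self._values = self._values, []
        if not values:
            return None
        return sum(values) / len(values)

averager = HourlyAverager()
for reading in (1.0, 2.0, 3.0):
    averager.add(reading)
first = averager.flush()    # mean of the three readings
second = averager.flush()   # nothing collected since the last flush
```

In the timer-based setup, flush() would be the call made inside the hourly callback.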

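I ended up creating an array that represents the data in the time unit I am interested in, and then computing statistics on that array. For example, turn the "minute" times into "hour" times and use the average of ys within each hour: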

# inputs: ts0 holds the sample times in minutes, ys0 the corresponding values
tHs0 = []
for i in range(len(ts0)):
    tM = ts0[i] # time in minutes
    tH = tM/60.0 # time in hours
    tHs0.append(int(tH)) # the hour this sample falls in; there are repeats here

tHs = sorted(set(tHs0)) # now we have a list of the hours with no repeats

ys = [0.0]*len(tHs) # running sum per hour, turned into the mean below
Cs = [0]*len(tHs) # how many samples were seen in each hour

for i in range(len(ts0)):
    y0 = ys0[i]
    t0 = ts0[i]
    tH = int(t0/60.0)
    ys[tHs.index(tH)] += y0
    Cs[tHs.index(tH)] += 1 # keep a record of how many times this was seen

for i in range(len(ys)):
    ys[i] = ys[i]/Cs[i]
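The same per-hour binning can probably be written without explicit loops. The following is a sketch using numpy, under the assumption that the times are non-negative minutes; the function name hourly_means is invented here for illustration.

```python
import numpy as np

def hourly_means(ts_minutes, ys):
    """Mean of ys within each clock hour that contains at least one sample."""
    ts_minutes = np.asarray(ts_minutes, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Hour index of every sample (0 for minutes 0..59, 1 for 60..119, ...).
    hours = np.floor(ts_minutes / 60.0).astype(int)
    # inverse[i] is the position of sample i's hour in unique_hours.
    unique_hours, inverse = np.unique(hours, return_inverse=True)
    sums = np.bincount(inverse, weights=ys)  # sum of ys per hour
    counts = np.bincount(inverse)            # number of samples per hour
    return unique_hours, sums / counts

# Made-up example: three samples in hour 0, one in hour 1, one in hour 3.
hrs, means = hourly_means([0, 30, 59, 60, 200], [1.0, 2.0, 3.0, 10.0, 7.0])
```

Hours with no samples simply do not appear in the output, which matches the behaviour of the loop version.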

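This might be a solution: take the points within +/-30 minutes, sum them, and divide by the count. What is wrong with that?

That is what I ended up doing. I will post my solution below.

I do not know why this was marked as a duplicate. The supposed earlier question involves pandas, which is more specialized than this question.

Unfortunately, my data is too large and this solution uses too much memory, so it is not feasible.

How many time points are in your data? If memory is limited, there is no reason you could not split the data into chunks and apply the process I described to each chunk.

True. By the same logic, there is no reason I could not write this in C with MPI and run it on a non-uniform memory architecture where memory is distributed across multiple hosts. Unfortunately, development time is a factor, and I am using python to get a quick and dirty solution with the least hassle...

If you are unsure how to chunk the data, it is worth looking up some ideas. It is a very common tactic.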
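The "+/-30 minutes" idea from the first comment can be sketched as follows. This is only one possible reading of it, assuming ts is sorted in ascending order; the function name windowed_mean and the half_width parameter are made up for this example.

```python
import numpy as np

def windowed_mean(ts, ys, half_width=30.0):
    """Mean of ys over all samples within +/- half_width of each sample time.

    Assumes ts is sorted in ascending order.
    """
    ts = np.asarray(ts, dtype=float)
    ys = np.asarray(ys, dtype=float)
    # Index range [lo, hi) of the samples that fall inside each window.
    lo = np.searchsorted(ts, ts - half_width, side="left")
    hi = np.searchsorted(ts, ts + half_width, side="right")
    # Prefix sums turn each window sum into a single subtraction.
    csum = np.concatenate(([0.0], np.cumsum(ys)))
    # Every window contains at least its own sample, so hi - lo >= 1.
    return (csum[hi] - csum[lo]) / (hi - lo)

# Made-up sparse data: times in minutes with irregular gaps.
ts_demo = [0, 10, 100, 120, 125]
ys_demo = [1.0, 3.0, 10.0, 20.0, 30.0]
fs_demo = windowed_mean(ts_demo, ys_demo)
```

This works directly on the irregular samples, so no resampled grid is built and the extra memory is a handful of arrays the same length as the input.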