Processing real-time data in Python with a rolling window

python numpy time

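I want to create a function that reads a series of time values from a file (the sampling has gaps, which is the problem), takes exactly 200 days of data at a time, and lets me step through the whole length of the data, e.g. 10000 days, like a rolling window.

I don't know how to code this. Can I add a statement that accumulates the difference between consecutive values of the time variable (x axis) until it reaches exactly 200 days?

Or can I write a function that finds a start value, say t0, and then finds the element in the array closest to t0 + (interval =) 200 days?

This is what I have so far: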

f = open(reading the file from directory)

lines = f.readlines()
print(len(lines))

tx = np.array([])  # times
y = np.array([])
interval = 200  # days

for li in lines:
    col = li.split()

    t0 = np.array([])
    t1 = np.array([])

    tx = np.append(tx, float(col[0]))
    y = np.append(y, float(col[1]))

t0 = np.append(t0, np.max(tx))
t1 = np.append(t1, tx[np.argmin(tx)])

print(t0, t1)

days = [t1 + dt.timedelta(days=float(x)) for x in days]
# y = np.random.randn(len(days))

# use pandas for convenient rolling function:
df = pd.DataFrame({"day": tx, "value": y}).set_index("day")

def closest_value(s):
    if s.shape[0] < 2:
        return np.nan
    X = np.empty((s.shape[0] - 1, 2))
    X[:, 0] = s[:-1]
    X[:, 1] = np.fabs(s[:-1] - s[-1])
    min_diff = np.min(X[:, 1])
    return X[X[:, 1] == min_diff, 0][0]

df['closest_value'] = df.rolling(window=dt.timedelta(days=200))['value'].apply(closest_value, raw=True)
print(df.tail(5))

Output error:

TypeError: float() argument must be a string or a number, not 'datetime.datetime'
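
The second idea in the question (pick a start value t0, then locate the element nearest t0 + 200 days) can be sketched with NumPy alone. This is only an illustration under assumptions: the time values are floats in days and already sorted ascending, and `window_bounds`, `step`, and the sample array `t` are hypothetical names that do not appear in the original post. `np.searchsorted` returns the first index at or after a given time, so gaps in the sampling do not matter.

```python
import numpy as np

def window_bounds(t, interval=200.0, step=None):
    """Yield (start, stop) index pairs such that t[start:stop] holds the
    samples in [t0, t0 + interval) for successive start times t0.
    Assumes t is sorted in ascending order."""
    t = np.asarray(t, dtype=float)
    step = interval if step is None else step  # step < interval gives overlapping windows
    t0 = t[0]
    while t0 <= t[-1]:
        start = int(np.searchsorted(t, t0, side="left"))
        stop = int(np.searchsorted(t, t0 + interval, side="left"))
        yield start, stop
        t0 += step

# Hypothetical, irregularly sampled times in days
t = np.array([0.0, 0.5, 1.0, 199.0, 200.0, 350.0, 401.0])
bounds = list(window_bounds(t, interval=200.0))
for start, stop in bounds:
    print(t[start:stop])
```

A window can come back empty (start == stop) when a gap in the data is longer than the interval, so the caller may want to skip those.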

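import numpy as np
import pandas as pd
import datetime as dt

# load data in days and y arrays
# ... or generate them:
N = 1000  # number of days
day_min = dt.datetime.strptime('2000-01-01', '%Y-%m-%d')
day_max = 2000
days = np.sort(np.unique(np.random.uniform(low=0, high=day_max, size=N).astype(int)))
days = [day_min + dt.timedelta(days=int(x)) for x in days]
y = np.random.randn(len(days))

# use pandas for convenient rolling function:
df = pd.DataFrame({"day": days, "value": y}).set_index("day")

def closest_value(s):
    # value within the window (excluding the last sample) closest to the last sample
    if s.shape[0] < 2:
        return np.nan
    X = np.empty((s.shape[0] - 1, 2))
    X[:, 0] = s[:-1]
    X[:, 1] = np.fabs(s[:-1] - s[-1])
    min_diff = np.min(X[:, 1])
    return X[X[:, 1] == min_diff, 0][0]

df['closest_value'] = df.rolling(window=dt.timedelta(days=200))['value'].apply(closest_value, raw=True)
print(df.tail(5))

You can use pandas: set a datetime range and create a while loop to process the data in batches.
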
import pandas as pd
from datetime import datetime, timedelta

# Load data into pandas dataframe
df = pd.read_csv(filepath)

# Name columns
df.columns = ['dates', 'num_value']

# Convert strings to datetime
df.dates = pd.to_datetime(df['dates'], format='%d/%m/%Y')

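# Print dates within a 200 day interval and move on to the next interval
# (assumes df.dates is sorted and df keeps its default 0..n-1 index)
i = 0
while i < len(df.dates):
    start = df.dates[i]
    end = start + timedelta(days=200)
    window = df.dates[(df.dates >= start) & (df.dates < end)]
    print(window)
    # Advance by the number of rows in this window rather than a fixed 200 rows,
    # so gaps in the sampling do not make consecutive windows overlap.
    i += len(window)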

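When I run this code, it shows ValueError: arrays must all be same length.

Thanks! I have changed y=np.random.randn(N) to y=np.random.randn(len(days)).

I tried substituting tx and y into days = [day_min + dt.timedelta(days=int(x)), but it says the argument to that function is invalid. If I use the variables tx and y in the code above, what changes in this function? Can I still use pandas? Sorry, I have never used pandas before, although something could probably be built with arrays and lists alone. Thanks.

Please write out the first 10 values of your tx and y.

X=0.003372722575018, 0.015239999629557 0.003366515509113, 0.04582999726266 0.003385171061055, 0.07536999974998 0.003385171061055, 0.993219999596477 0.003366509113, 1.0226996623 0.003378941085299, 1.0521799964952 0.003369617612836 1.08166999975219 0.003397665493594 3.002589996981 0.003378941085299 3.0412099993756 0.003394537568711, up to about 7000. Thanks.

Could you give an example of how to feed in the data from a .dat file (just two columns)? Same question as above: can I still use pandas? Also, the x variable is in days, not in a date format. Thanks.

You can still use pandas. Can you give me an example of the date format you use? Is it just a list of weekdays (Monday, Tuesday, ...)?

Hey, I edited my post to include the first 10 inputs of the x values (e.g. 0.015239999629557 is a fraction of a day (24*60*0.015239999629557 ~ 00:22)). My y values are just amplitudes. Thanks.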
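
Since the x values are plain floats in days rather than dates, one possible way to still use the time-based rolling window is to convert them to timestamps relative to a reference date. The sketch below is only an illustration: the reference date `origin` is arbitrary (only time differences matter), and the sample arrays are made-up values, not the asker's data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: times in days (floats, with gaps) and amplitudes
tx = np.array([0.0, 50.0, 150.0, 199.5, 200.0, 400.0])
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

# Arbitrary reference date (an assumption); it only anchors the float days
origin = pd.Timestamp("2000-01-01")
index = origin + pd.to_timedelta(tx, unit="D")  # DatetimeIndex

series = pd.Series(y, index=index)

# Each window covers the 200 days up to and including the current sample
rolling_sum = series.rolling(window=pd.Timedelta(days=200)).sum()
print(rolling_sum)
```

Any function can replace `.sum()` through `.apply(func, raw=True)`, as with `closest_value` above.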
Output of print(df.tail(5)) for the rolling closest_value code:

               value  closest_value
day                                
2005-06-15  1.668638       1.591505
2005-06-16  0.316645       0.304382
2005-06-17  0.458580       0.445592
2005-06-18 -0.846174      -0.847854
2005-06-22 -0.151687      -0.166404

dates     num_value
2004-7-1  1
2004-7-2  5
2004-7-4  8
2004-7-5  11
2004-7-6  17

df = pd.read_table(filepath, sep=r"\s+", skiprows=1)
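
For a two-column, whitespace-separated .dat file without a header line, a minimal sketch could look like the following. The in-memory text stands in for the real file, and the column names `day` and `value` as well as the numbers are assumptions made for illustration. If the file does start with a header line, `skiprows=1` as in the line above would skip it.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of a two-column .dat file:
# time in days, then amplitude
dat_text = """0.0152 0.0034
0.0458 0.0034
1.0227 0.0034
"""

# sep=r"\s+" splits on any run of whitespace; header=None because the
# assumed file has no header row, so the column names are supplied here
df = pd.read_csv(io.StringIO(dat_text), sep=r"\s+", header=None,
                 names=["day", "value"])
print(df)
```

With a real file, the path would be passed in place of the `io.StringIO` object.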