Pandas: iterating over a DataFrame

This is a beginner question, but here goes.

I have a DataFrame of time series values that looks like this:

Index                  Value A    Value B
2019-02-05 18:00:00    1.16       6.32
2019-02-05 17:00:00    1.1475     23.7825
2019-02-05 18:00:00    1.16       6.32
2019-02-05 17:00:00    1.1475     23.7825
2019-02-05 16:00:00    0.4125     23.7825
2019-02-05 15:00:00    0.0        31.71
2019-02-05 14:00:00    0.0        23.7825
2019-02-05 13:00:00    1.015      23.7825
2019-02-05 12:00:00    0.24       23.7825

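For each row in the DataFrame, I want to find the 24 most recent values up to and including that row, and write them to a new DataFrame keyed by the original index. The result would look like this: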

Index                  Time diff Value A    Value B
2019-02-05 18:00:00     0         1.16       6.32
2019-02-05 18:00:00     -1        1.147      23.7825
2019-02-05 18:00:00     -2        1.16       6.32
2019-02-05 18:00:00     -3        1.147      23.7825
2019-02-05 18:00:00     etc...    etc....    etc....
2019-02-05 18:00:00     -23       1.147      23.7825

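So, to summarize: for each row in the original DataFrame I would get 24 rows in the new DataFrame, with a new column indicating the time lag.

The reason for this operation is to prepare data for machine learning, where the index is our target and the 24 related historical values are the predictors.

Currently I am trying to use something like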

for i, row in be_hour.iterrows():
    if <something>:
        df.at[i, 'ifor'] = x
    else:
        df.at[i, 'ifor'] = y

But due to my lack of experience I am struggling to implement this.

I came up with a solution:

import pandas as pd

idx = ['2019-02-05 18:00:00',
       '2019-02-05 17:00:00', 
       '2019-02-05 16:00:00',
       '2019-02-05 15:00:00',
       '2019-02-05 14:00:00',
       '2019-02-05 13:00:00',
       '2019-02-05 12:00:00']

A = [1.16, 1.1475, 1.1475, 0.4125, 0.0, 1.015, 0.24]
B = [6.32, 23.7825, 23.7825, 23.7825, 23.7825, 23.7825, 23.7825]

idx = [pd.Timestamp(t) for t in idx]
idx = pd.Index(idx)
d = {'A': A, 'B': B}
df = pd.DataFrame(data = d)
df = df.set_index(idx)

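# Collect rows in a list and build the frame once at the end;
# DataFrame.append was removed in pandas 2.0 and was slow inside loops.
rows = []
for top in df.index:
    # 3-hour window for this small sample; use '23 hour' for 24 hourly values
    bot = top - pd.Timedelta('3 hour')
    # the index is sorted newest-first, so slice from top down to bot (inclusive)
    result = df.loc[top:bot]
    for j in result.index:
        rows.append({'timestamp': top, 'diff': top - j,
                     'A': df.at[j, 'A'], 'B': df.at[j, 'B']})
df1 = pd.DataFrame(rows).set_index('timestamp')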
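
For larger frames, the same long-format result can probably be built without a nested loop by stacking shifted copies of the data. The following is only a sketch under some assumptions: the helper name `lagged_long` and the parameter `n_lags` are made up for illustration, the lag is counted in rows rather than in elapsed time (so it matches the loop above only when the series is regularly spaced with no gaps or duplicate timestamps), and `diff` is stored as an integer lag (0, -1, -2, ...) as in the desired output of the question.

```python
import pandas as pd

def lagged_long(df: pd.DataFrame, n_lags: int) -> pd.DataFrame:
    """For every timestamp, stack the row itself and up to n_lags - 1 earlier rows."""
    ordered = df.sort_index()  # oldest first, so shift(k) looks k rows back
    frames = []
    for k in range(n_lags):
        # After shift(k), the row at position i holds the values from position i - k.
        # The first k rows have no such history, so they are dropped.
        lagged = ordered.shift(k).iloc[k:].assign(diff=-k)
        frames.append(lagged)
    out = pd.concat(frames).rename_axis('timestamp').reset_index()
    # newest target first, then lag 0, -1, -2, ... within each target
    out = out.sort_values(['timestamp', 'diff'], ascending=False)
    return out.set_index('timestamp')

# Sample data taken from the answer above
idx = pd.to_datetime(['2019-02-05 18:00:00', '2019-02-05 17:00:00',
                      '2019-02-05 16:00:00', '2019-02-05 15:00:00',
                      '2019-02-05 14:00:00', '2019-02-05 13:00:00',
                      '2019-02-05 12:00:00'])
df = pd.DataFrame({'A': [1.16, 1.1475, 1.1475, 0.4125, 0.0, 1.015, 0.24],
                   'B': [6.32, 23.7825, 23.7825, 23.7825, 23.7825, 23.7825, 23.7825]},
                  index=idx)

out = lagged_long(df, n_lags=4)  # 4 rows per target, like the 3-hour window above
print(out)
```

With `n_lags=24` this would give the 24 rows per timestamp asked for in the question, wherever that much history exists; the earliest timestamps simply get fewer rows.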

Thank you very much, this works great. I really appreciate your help.