Python pandas: include new timestamp rows in a DataFrame if a condition is met
I have a DataFrame that looks like this:
value timestamp
18.832939 2019-03-04 12:37:26 UTC
18.832939 2019-03-04 12:38:26 UTC
18.832939 2019-03-04 12:39:27 UTC
18.955200 2019-03-04 12:40:28 UTC
18.784912 2019-03-04 12:44:32 UTC
18.784912 2019-03-04 12:45:33 UTC
20.713936 2019-03-04 17:59:36 UTC
20.871742 2019-03-04 18:08:31 UTC
20.871742 2019-03-04 18:09:32 UTC
20.873871 2019-03-04 18:10:32 UTC
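I would like to obtain the following result, where I identify the gaps that are longer than 2 minutes but shorter than 15 minutes and fill them with new rows:
value timestamp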
18.832939 2019-03-04 12:37:26 UTC
18.832939 2019-03-04 12:38:26 UTC
18.832939 2019-03-04 12:39:27 UTC
18.955200 2019-03-04 12:40:28 UTC
NaN 2019-03-04 12:41:28 UTC
NaN 2019-03-04 12:42:28 UTC
NaN 2019-03-04 12:43:28 UTC
18.784912 2019-03-04 12:44:32 UTC
18.784912 2019-03-04 12:45:33 UTC
20.713936 2019-03-04 17:59:36 UTC
NaN 2019-03-04 18:00:36 UTC
NaN 2019-03-04 18:01:36 UTC
NaN 2019-03-04 18:02:36 UTC
NaN 2019-03-04 18:03:36 UTC
NaN 2019-03-04 18:04:36 UTC
NaN 2019-03-04 18:05:36 UTC
NaN 2019-03-04 18:06:36 UTC
NaN 2019-03-04 18:07:36 UTC
20.871742 2019-03-04 18:08:31 UTC
20.871742 2019-03-04 18:09:32 UTC
20.873871 2019-03-04 18:10:32 UTC
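This means I have to do two things to achieve it:
1. Identify where the gaps meet my criteria. There may also be gaps longer than 15 minutes, and I am not interested in those.
2. Once they are identified, create new rows with timestamps at 1-minute increments (or otherwise evenly spaced values).
I can do the first part with this: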
gap = df['timestamp'].diff()  # time elapsed since the previous row
df['aux_1'] = ((gap > pd.Timedelta(minutes=2)) & (gap < pd.Timedelta(minutes=15))).astype(int)  # 1 on the row that ends a gap
df['aux_2'] = df['aux_1'].shift(-1)  # 1 on the row that begins a gap (NaN on the last row)
df['intervals'] = df['aux_1'] + df['aux_2']  # flags both the beginning and the end of a gap in a single column
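However, I don't know how to do the second part, at least not in an idiomatic pandas way. Ideally I would somehow identify the start and end of each timestamp interval I want to fill, apply asfreq with a 1-minute frequency, and use the resulting vector to fill the gaps I care about. I just don't know how to do that.
Can anyone help me? Thanks in advance.
This is not very pandas-like, but I would do the following: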
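import pandas as pd
new_timestamp = []
for i, row in df.iterrows():
    if row['aux_2'] == 0:
        # No gap starts at this row: keep its timestamp as-is.
        new_timestamp.append(row['timestamp'])
    elif row['aux_2'] == 1:
        # A gap starts here: add 1-minute steps from this row up to the next one.
        # Assumes the default 0..n-1 index, so that i + 1 is the next position.
        new_timestamp += pd.date_range(row['timestamp'], df.iloc[i+1]['timestamp'], freq='min').to_list()
    # Rows whose aux_2 is NaN (the last row) match neither branch and are skipped.
new_df = df.set_index('timestamp')
# reindex adds NaN rows for the new timestamps (.loc raises KeyError on missing labels in pandas >= 1.0).
new_df = new_df.reindex(new_timestamp)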
This results in:
print(new_df)
timestamp value aux_1 aux_2 intervals
2019-03-04 12:37:26+00:00 18.832939 0.0 0.0 0.0
2019-03-04 12:38:26+00:00 18.832939 0.0 0.0 0.0
2019-03-04 12:39:27+00:00 18.832939 0.0 0.0 0.0
2019-03-04 12:40:28+00:00 18.955200 0.0 1.0 1.0
2019-03-04 12:41:28+00:00 NaN NaN NaN NaN
2019-03-04 12:42:28+00:00 NaN NaN NaN NaN
2019-03-04 12:43:28+00:00 NaN NaN NaN NaN
2019-03-04 12:44:28+00:00 NaN NaN NaN NaN
2019-03-04 12:44:32+00:00 18.784912 1.0 0.0 1.0
2019-03-04 12:45:33+00:00 18.784912 0.0 0.0 0.0
2019-03-04 17:59:36+00:00 20.713936 0.0 1.0 1.0
2019-03-04 18:00:36+00:00 NaN NaN NaN NaN
2019-03-04 18:01:36+00:00 NaN NaN NaN NaN
2019-03-04 18:02:36+00:00 NaN NaN NaN NaN
2019-03-04 18:03:36+00:00 NaN NaN NaN NaN
2019-03-04 18:04:36+00:00 NaN NaN NaN NaN
2019-03-04 18:05:36+00:00 NaN NaN NaN NaN
2019-03-04 18:06:36+00:00 NaN NaN NaN NaN
2019-03-04 18:07:36+00:00 NaN NaN NaN NaN
2019-03-04 18:08:31+00:00 20.871742 1.0 0.0 1.0
2019-03-04 18:09:32+00:00 20.871742 0.0 0.0 0.0
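For comparison, here is a minimal sketch of the same idea without iterrows and without setting the timestamp as the index. The helper name fill_gaps and its parameters are hypothetical, not part of pandas. It also makes one assumption that is inferred from the desired output in the question rather than stated there: a filler timestamp is skipped when it lands less than half a step before the next real row (which is why 12:44:28 is not added before 12:44:32).

```python
import pandas as pd

# Sample data from the question, parsed as timezone-aware (UTC) timestamps.
df = pd.DataFrame({
    "value": [18.832939, 18.832939, 18.832939, 18.955200, 18.784912,
              18.784912, 20.713936, 20.871742, 20.871742, 20.873871],
    "timestamp": pd.to_datetime([
        "2019-03-04 12:37:26", "2019-03-04 12:38:26", "2019-03-04 12:39:27",
        "2019-03-04 12:40:28", "2019-03-04 12:44:32", "2019-03-04 12:45:33",
        "2019-03-04 17:59:36", "2019-03-04 18:08:31", "2019-03-04 18:09:32",
        "2019-03-04 18:10:32",
    ], utc=True),
})


def fill_gaps(df, min_gap="2min", max_gap="15min", step="1min"):
    """Hypothetical helper: add NaN rows inside gaps longer than min_gap and shorter than max_gap."""
    min_gap, max_gap, step_td = pd.Timedelta(min_gap), pd.Timedelta(max_gap), pd.Timedelta(step)
    df = df.sort_values("timestamp").reset_index(drop=True)

    nxt = df["timestamp"].shift(-1)   # timestamp of the following row
    to_next = nxt - df["timestamp"]   # NaT on the last row, which compares as False
    starts_gap = (to_next > min_gap) & (to_next < max_gap)

    new_stamps = []
    for start, end in zip(df.loc[starts_gap, "timestamp"], nxt[starts_gap]):
        stamps = pd.date_range(start + step_td, end, freq=step)
        # Assumption: skip a filler that is less than half a step before the next real row.
        new_stamps.extend(stamps[(end - stamps) > step_td / 2])

    if not new_stamps:
        return df
    filler = pd.DataFrame({"timestamp": new_stamps})  # "value" becomes NaN after concat
    out = pd.concat([df, filler], ignore_index=True)
    return out.sort_values("timestamp").reset_index(drop=True)


result = fill_gaps(df)
print(result)
```

The loop only runs once per qualifying gap, not once per row, and the last row of the original data is kept because nothing is filtered by flag columns.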
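My suggestion: 1) generate a DataFrame with a timestamp column at 1-minute intervals; 2) join your data back onto the newly created DataFrame, using the timestamp truncated to the minute as the key.
This is fine. I mean, I was hoping there was a way to do this without having to set the timestamp as the index, but if I want to I can move the column back and set a new ascending numeric index. Thanks for your support!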
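The join suggested in the comment could look roughly like the sketch below. The column name minute and the variable names grid and merged are made up for illustration. Note that, as written, this fills every missing minute in the whole range, including the gap of more than 15 minutes, so it would still need the gap filter from the question if those rows are unwanted. The timestamp stays an ordinary column and the result keeps a plain ascending integer index.

```python
import pandas as pd

# Sample data from the question, parsed as timezone-aware (UTC) timestamps.
df = pd.DataFrame({
    "value": [18.832939, 18.832939, 18.832939, 18.955200, 18.784912,
              18.784912, 20.713936, 20.871742, 20.871742, 20.873871],
    "timestamp": pd.to_datetime([
        "2019-03-04 12:37:26", "2019-03-04 12:38:26", "2019-03-04 12:39:27",
        "2019-03-04 12:40:28", "2019-03-04 12:44:32", "2019-03-04 12:45:33",
        "2019-03-04 17:59:36", "2019-03-04 18:08:31", "2019-03-04 18:09:32",
        "2019-03-04 18:10:32",
    ], utc=True),
})

# Key every observation by its timestamp truncated to the minute.
df["minute"] = df["timestamp"].dt.floor("min")

# One row per minute between the first and last observation.
grid = pd.DataFrame({
    "minute": pd.date_range(df["minute"].min(), df["minute"].max(), freq="min")
})

# Left join: minutes without an observation get NaN in "value" and NaT in "timestamp".
merged = grid.merge(df, on="minute", how="left")
print(merged.head(10))
```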