Python rolling window: cannot reindex from a duplicate axis
I'm trying to use a rolling window to get the time difference between rows based on a condition. My dataset looks like this:
Time Type ConditionA default index
00:00 A True 0
00:00 A False 1
00:00 A True 2
00:01 B True 3
00:01 A True 4
00:01 B True 5
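My goal is to get, within a 10-second rolling window, the time difference between rows of the same Type when ConditionA is True for both rows.
For example, the time difference for row 5 would be 0s, because rows 5 and 3 have the same Type and ConditionA is True for both.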
My expected final dataset looks like this:
Time Type ConditionA default index Time difference
00:00 A True 0 N/A (or -1)
00:00 A False 1 N/A (or -1)
00:00 A True 2 0s
00:01 B True 3 N/A (or -1)
00:01 A True 4 1s
00:01 B True 5 0s
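A minimal vectorized sketch of the expected output, not the rolling approach itself. It assumes "00:00"/"00:01" are minutes:seconds anchored to an arbitrary date (2020-01-01, chosen only for illustration) and that "within 10 seconds" means strictly less than 10 s, which matches the default right-closed offset window. Grouping by both Type and ConditionA makes each row's predecessor the previous row with the same Type and the same ConditionA value, so groupby(...).diff() replaces the per-window search.

```python
import pandas as pd

# Example data from the question; the date 2020-01-01 is an arbitrary assumption.
df = pd.DataFrame({
    "Time": pd.Timestamp("2020-01-01") + pd.to_timedelta([0, 0, 0, 1, 1, 1], unit="s"),
    "Type": ["A", "A", "A", "B", "A", "B"],
    "ConditionA": [True, False, True, True, True, True],
})

# Time since the previous row with the same Type and the same ConditionA value.
diff = df.groupby(["Type", "ConditionA"])["Time"].diff()

# Keep the difference only when ConditionA is True and the earlier row is < 10 s back.
keep = df["ConditionA"] & (diff < pd.Timedelta(seconds=10))
df["Time difference"] = diff.where(keep)

print(df)
```

Rows without a qualifying earlier row are left as NaT, which corresponds to the "N/A" entries in the expected table.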
I tried the following:
def test_func(win):
    target_value = win['ConditionA'].values[-1]
    if len(win) > 1:
        qualified_rows = win.loc[win['ConditionA'].values == target_value]
        target_row = qualified_rows.iloc[[-2]]
        current_row = win.iloc[[-1]]
        time_difference = current_row.index - target_row.index
        return pd.Series(time_difference, index=win.iloc[[-1]].index)
    else:
        return pd.Series(-1, index=win.iloc[[-1]].index)

df.groupby('Type', sort=False).apply(lambda win: win.rolling('10s').apply(test_func))
However, it raised:
ValueError: cannot reindex from a duplicate axis
This is because I set Time as the index, and Time contains duplicate values.
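Independently of the duplicate-axis error, Rolling.apply calls the function once per column with a one-dimensional window (a Series when raw=False) and expects a single number back. So win['ConditionA'] inside test_func cannot see the other columns, and returning a pd.Series does not fit that contract. The sketch below, on a small hypothetical numeric frame, records what the function actually receives.

```python
import pandas as pd

# Hypothetical numeric frame with a DatetimeIndex (timestamps 0 s, 1 s, 2 s).
idx = pd.Timestamp("2020-01-01") + pd.to_timedelta([0, 1, 2], unit="s")
frame = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [10.0, 20.0, 30.0]}, index=idx)

seen = []

def record(window):
    # With raw=False each window is a 1-D Series holding a single column.
    seen.append(window.ndim)
    return float(len(window))  # Rolling.apply expects a scalar result

counts = frame.rolling("10s").apply(record, raw=False)
print(counts)
```

Both columns get window sizes 1, 2, 3, and every recorded ndim is 1, showing that the columns are processed separately.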
I also tried the same groupby/rolling/apply code shown above, which raised a different error:
ValueError: window must be an integer
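This error typically means an offset window such as '10s' was applied to a frame whose index is not datetime-like (for example, the default RangeIndex). DataFrame.rolling accepts an on= argument naming the datetime column to use instead, which avoids moving Time into the index. A minimal sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical data: "Time" is an ordinary column, not the index.
df = pd.DataFrame({
    "Time": pd.Timestamp("2020-01-01") + pd.to_timedelta([0, 5, 20], unit="s"),
    "value": [1, 1, 1],
})

# on="Time" tells pandas which datetime column defines the 10-second window.
counts = df.rolling("10s", on="Time").sum()["value"].tolist()
print(counts)  # [1.0, 2.0, 1.0]
```

The row at 5 s sees itself and the row at 0 s, while the row at 20 s sees only itself, because the default window is (t - 10 s, t].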
Answer:
import pandas as pd
from datetime import datetime, timedelta

# Six rows created in quick succession, so the timestamps differ only by microseconds
df = pd.DataFrame({'Time': [datetime.now() for _ in range(6)],
                   'Type': ['A', 'A', 'A', 'B', 'A', 'B'],
                   'ConditionA': [True, False, True, True, True, True]})
df['Time shift'] = pd.Series(pd.NaT, index=df.index)
# The differences are timedeltas, so give this column a timedelta dtype up front
df['Time diff'] = pd.Series(pd.NaT, index=df.index, dtype='timedelta64[ns]')
for name, group in df.groupby(['Type', 'ConditionA']):
    df.loc[group.index, 'Time shift'] = group['Time'].shift(periods=1)  # previous time for each group
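The loop above can also be written without iterating over groups, because GroupBy.shift returns a result aligned to the original index. A sketch using fixed timestamps (an assumption, so the output is reproducible) instead of datetime.now():

```python
import pandas as pd

# Fixed timestamps replace datetime.now() so the result is deterministic.
df = pd.DataFrame({
    "Time": pd.Timestamp("2020-01-01") + pd.to_timedelta([0, 0, 0, 1, 1, 1], unit="s"),
    "Type": ["A", "A", "A", "B", "A", "B"],
    "ConditionA": [True, False, True, True, True, True],
})

# Previous Time within each (Type, ConditionA) group; aligned back to df's rows.
df["Time shift"] = df.groupby(["Type", "ConditionA"])["Time"].shift(periods=1)
print(df)
```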
The result is similar to your example DataFrame:
Time Type ConditionA Time shift Time diff
0 2020-03-07 22:38:47.710763 A True NaT NaT
1 2020-03-07 22:38:47.710768 A False NaT NaT
2 2020-03-07 22:38:47.710769 A True 2020-03-07 22:38:47.710763008 NaT
3 2020-03-07 22:38:47.710769 B True NaT NaT
4 2020-03-07 22:38:47.710770 A True 2020-03-07 22:38:47.710768896 NaT
5 2020-03-07 22:38:47.710771 B True 2020-03-07 22:38:47.710768896 NaT
Then, to compute the difference itself, use groupby again:
for name, group in df.groupby(['Type', 'ConditionA']):
    if name[1]:  # If ConditionA is True
        # Rows within 10 s of the previous row in their group
        mask = group[(group['Time'] - group['Time shift']) < timedelta(seconds=10)].index
        df.loc[mask, 'Time diff'] = df.loc[mask, 'Time'] - df.loc[mask, 'Time shift']
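The question asks for "N/A (or -1)" where there is no match. Assuming the Time diff column has timedelta64 dtype, it can be converted to seconds with -1 as the fill value. The Series below is a hypothetical stand-in for that column:

```python
import pandas as pd

# Hypothetical stand-in for df['Time diff']: NaT where no earlier row qualified.
time_diff = pd.Series(
    [pd.NaT, pd.NaT, pd.Timedelta(0), pd.NaT, pd.Timedelta(seconds=1), pd.Timedelta(0)],
    dtype="timedelta64[ns]",
)

# total_seconds() turns NaT into NaN, which fillna then replaces with -1.
seconds = time_diff.dt.total_seconds().fillna(-1)
print(seconds.tolist())  # [-1.0, -1.0, 0.0, -1.0, 1.0, 0.0]
```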