Python: How to merge groups of rows in a dataframe based on the difference between datetimes?
Tags: python, pandas, date, pandas-groupby

I have a dataframe in which each row holds an event with a Start and an End datetime.
import pandas as pd
import datetime

df = pd.DataFrame({'Value': [1., 2., 3.],
                   'Start': [datetime.datetime(2017, 1, 1, 0, 0, 0),
                             datetime.datetime(2017, 1, 1, 0, 1, 0),
                             datetime.datetime(2017, 1, 1, 0, 4, 0)],
                   'End': [datetime.datetime(2017, 1, 1, 0, 0, 59),
                           datetime.datetime(2017, 1, 1, 0, 5, 0),
                           datetime.datetime(2017, 1, 1, 0, 6, 0)]},
                  index=[0, 1, 2])
df
Out[7]:
                  End               Start  Value
0 2017-01-01 00:00:59 2017-01-01 00:00:00    1.0
1 2017-01-01 00:05:00 2017-01-01 00:01:00    2.0
2 2017-01-01 00:07:00 2017-01-01 00:06:00    3.0
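I want to group consecutive rows where the difference between the End of one row and the Start of the next row is smaller than a given timedelta.
E.g. here, for a timedelta of 5 seconds I would like the rows with index 0, 1 to be grouped; with a timedelta of 2 minutes, it should yield rows 0, 1, 2.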
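One solution would be to use .shift() to compare consecutive rows with their shifted version. However, if groups of more than two rows need to be merged, the comparison would have to be iterated several times.
Since my df is very large, this is not an option.

I assume you are trying to aggregate based on the time difference: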
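marker = 60  # threshold in seconds

# Flag each row by whether its own duration (End - Start) is within the threshold.
df = df.assign(diff=df.apply(lambda row: (row.End - row.Start).total_seconds() <= marker, axis=1))

# Print each group; the groupby key is the boolean flag.
for key, group in df.groupby('diff'):
    print(group)

                  End               Start  Value   diff
1 2017-01-01 00:05:00 2017-01-01 00:01:00    2.0  False
2 2017-01-01 00:06:00 2017-01-01 00:04:00    3.0  False
                  End      Start  Value  diff
0 2017-01-01 00:00:59 2017-01-01    1.0  True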
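
Note that the flag above compares each event's own duration, whereas the question asks about the gap between one row's End and the next row's Start. A minimal sketch of that gap-based grouping follows. It assumes the rows are already sorted by Start, it uses the values from the printed table in the question (third event from 00:06:00 to 00:07:00), and the helper name assign_gap_groups is hypothetical. A single .shift() plus .cumsum() is enough here, so groups of more than two rows need no repeated comparison.

```python
import datetime

import pandas as pd

# Sample events, taken from the printed table in the question.
df = pd.DataFrame({
    'Value': [1., 2., 3.],
    'Start': pd.to_datetime(['2017-01-01 00:00:00',
                             '2017-01-01 00:01:00',
                             '2017-01-01 00:06:00']),
    'End': pd.to_datetime(['2017-01-01 00:00:59',
                           '2017-01-01 00:05:00',
                           '2017-01-01 00:07:00']),
})


def assign_gap_groups(frame, threshold):
    """Return a group label per row (hypothetical helper).

    Consecutive rows share a label when the gap between the previous
    row's End and this row's Start is smaller than `threshold`.
    Assumes `frame` is sorted by Start.
    """
    # Gap to the previous row's End; NaT for the first row.
    gap = frame['Start'] - frame['End'].shift()
    # A new group starts at the first row and wherever the gap is too large.
    starts_new_group = gap.isna() | (gap >= threshold)
    # The running count of group starts labels every row in one pass.
    return starts_new_group.cumsum()


print(assign_gap_groups(df, datetime.timedelta(seconds=5)).tolist())  # [1, 1, 2]
print(assign_gap_groups(df, datetime.timedelta(minutes=2)).tolist())  # [1, 1, 1]
```

Because the label is a running count of "group starts", a chain of any length ends up under one label without iterating.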
threshold = datetime.timedelta(minutes=5)
df['delta'] = df['End'] - df['Start']
df['group'] = (df['delta'] - df['delta'].shift(-1) …

It may just be me, but I can't tell what kind of output you are looking for. Could you show your expected output?
Sorry guys, I'm late :( I am adding the desired output.
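
Since the desired output is not shown in the thread, the following is only a guess at what "merging" a group could mean: one row per group, spanning from the earliest Start to the latest End. Summing Value is an assumption, as are the helper name merge_events and the sample values (taken from the printed table in the question). The sketch relies on named aggregation, which needs pandas 0.25 or later.

```python
import datetime

import pandas as pd

# Sample events, taken from the printed table in the question.
events = pd.DataFrame({
    'Value': [1., 2., 3.],
    'Start': pd.to_datetime(['2017-01-01 00:00:00',
                             '2017-01-01 00:01:00',
                             '2017-01-01 00:06:00']),
    'End': pd.to_datetime(['2017-01-01 00:00:59',
                           '2017-01-01 00:05:00',
                           '2017-01-01 00:07:00']),
})


def merge_events(frame, threshold):
    """Collapse runs of close-together events into one row each.

    Hypothetical helper; assumes `frame` is sorted by Start.
    """
    # Label runs: a new run starts when the gap to the previous End
    # is at least `threshold` (or at the very first row).
    gap = frame['Start'] - frame['End'].shift()
    group_id = (gap.isna() | (gap >= threshold)).cumsum().rename('group')
    # One output row per run. Summing Value is an assumption.
    return frame.groupby(group_id).agg(
        Start=('Start', 'min'),
        End=('End', 'max'),
        Value=('Value', 'sum'),
    )


print(merge_events(events, datetime.timedelta(seconds=5)))
print(merge_events(events, datetime.timedelta(minutes=2)))
```

With a 5 second threshold this yields two rows (events 0 and 1 merged, event 2 alone); with 2 minutes it yields a single row covering all three events. Swap 'sum' for 'mean', 'first', or another aggregation if that fits the data better.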