Python: summarizing differences between groups at different levels of the data
Tags: python, python-3.x, pandas, pandas-groupby

This is my DataFrame:
import pandas as pd

data = [[1, 'A', 'a', '2020-01-01'],
        [1, 'A', 'b', '2020-01-02'],
        [1, 'B', 'a', '2020-01-03'],
        [2, 'A', 'a', '2020-01-04'],
        [2, 'A', 'b', '2020-01-05'],
        [2, 'A', 'b', '2020-01-06']]
df_1 = pd.DataFrame(data=data, columns=['id', 'main', 'sub_steps', 'date'])
# Parse the date strings so that differences come out as Timedelta values
df_1['date'] = pd.to_datetime(df_1['date'])
I want to group by the id column and calculate the time difference whenever main or sub_steps changes.
Expected result:
   id main sub_steps       date  sub_steps date_main_diff date_subStep_diff
0   1    A         a 2020-01-01     [a, b]         0 days            0 days
1   1    A         b 2020-01-02     [a, b]         1 days            0 days
2   1    B         a 2020-01-03        [a]         0 days            0 days
3   2    A         a 2020-01-04  [a, b, b]         0 days            0 days
4   2    A         b 2020-01-05  [a, b, b]         1 days            0 days
5   2    A         b 2020-01-06  [a, b, b]         2 days            1 days
All I could come up with is:
(df_1.merge(df_1.groupby(['id', 'main'])
                .agg({'sub_steps': list,
                      'date': df_1.date - df_1.date.shift(1)}),
            on=['id', 'main']))
It raises the error TypeError: 'NaTType' object is not callable
The only problem is with the date-difference columns; apart from those, I get the result I want.

We can simply use transform and diff:
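df = df_1.copy()
# Attach the full list of sub_steps of each (id, main) group to every row of that group
df['sub_steps1'] = df.groupby(['id', 'main'])['sub_steps'].transform(lambda x: [x.tolist()] * len(x))
# Row-to-row date difference within each (id, main) group, accumulated from the group's first row
df['date_main_diff'] = df.groupby(['id', 'main'])['date'].diff().fillna(pd.Timedelta('0 days'))
df['date_main_diff'] = df.groupby(['id', 'main'])['date_main_diff'].cumsum()
# The same calculation at the finer (id, main, sub_steps) level
df['date_subStep_diff'] = df.groupby(['id', 'main', 'sub_steps'])['date'].diff().fillna(pd.Timedelta('0 days'))
df['date_subStep_diff'] = df.groupby(['id', 'main', 'sub_steps'])['date_subStep_diff'].cumsum()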
df
   id main sub_steps       date sub_steps1 date_main_diff date_subStep_diff
0   1    A         a 2020-01-01     [a, b]         0 days            0 days
1   1    A         b 2020-01-02     [a, b]         1 days            0 days
2   1    B         a 2020-01-03        [a]         0 days            0 days
3   2    A         a 2020-01-04  [a, b, b]         0 days            0 days
4   2    A         b 2020-01-05  [a, b, b]         1 days            0 days
5   2    A         b 2020-01-06  [a, b, b]         2 days            1 days
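
Since the cumulative sum of consecutive differences is just the time elapsed since the first row of each group, the two date columns could probably also be computed in one step per level. The following is only a sketch of that idea, not the method from the answer above: the helper name elapsed_since_group_start is hypothetical, the list column is left out, and it assumes the rows are already sorted by date within each group, as they are in the sample data.

```python
import pandas as pd

data = [[1, 'A', 'a', '2020-01-01'],
        [1, 'A', 'b', '2020-01-02'],
        [1, 'B', 'a', '2020-01-03'],
        [2, 'A', 'a', '2020-01-04'],
        [2, 'A', 'b', '2020-01-05'],
        [2, 'A', 'b', '2020-01-06']]
df = pd.DataFrame(data, columns=['id', 'main', 'sub_steps', 'date'])
df['date'] = pd.to_datetime(df['date'])


def elapsed_since_group_start(frame, keys):
    """Hypothetical helper: time elapsed since the first date of each group.

    Assumes the rows of `frame` are sorted by date within each group.
    """
    # transform('first') broadcasts each group's first date back to its rows
    first_date = frame.groupby(keys)['date'].transform('first')
    return frame['date'] - first_date


df['date_main_diff'] = elapsed_since_group_start(df, ['id', 'main'])
df['date_subStep_diff'] = elapsed_since_group_start(df, ['id', 'main', 'sub_steps'])
print(df)
```

If the rows are not sorted by date, sorting first (for example with df.sort_values('date')) or using transform('min') instead of transform('first') would be a reasonable adjustment.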