Python: calculate date differences by group, measuring the first row from the first day of the year
I have a DataFrame that I need to group by type and year in order to calculate the date differences within each group.
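I tried the solution below. It gives me the differences between consecutive rows, but for the first row of each group I want the difference from the first day of that year instead.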
df1['date'] = pd.to_datetime(df1['date'])
df1['DateDiff'] = df1.groupby(['type','year']).date.diff().fillna(0)
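This produces the following output, which does not meet my requirement: the first row of each group should hold the difference between its date and the first day of the year, not 0 days.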
    type        date  year  DateDiff
0  type1  2017-03-30  2017    0 days
1  type1  2017-05-10  2017   41 days
2  type1  2017-12-15  2017  219 days
3  type1  2018-01-15  2018    0 days
4  type1  2018-05-01  2018  106 days
5  type3  2018-01-30  2018    0 days
6  type3  2018-06-27  2018  148 days
7  type3  2019-03-20  2019    0 days
8  type3  2019-05-21  2019   62 days
Expected output:
    type        date  year  DateDiff
0  type1  2017-03-30  2017   88 days  ---- (2017-03-30) - (2017-01-01)
1  type1  2017-05-10  2017   41 days  ---- (2017-05-10) - (2017-03-30)
2  type1  2017-12-15  2017  219 days
3  type1  2018-01-15  2018   14 days  ---- (2018-01-15) - (2018-01-01)
4  type1  2018-05-01  2018  106 days
5  type3  2018-01-30  2018   29 days
6  type3  2018-06-27  2018  148 days
7  type3  2019-03-20  2019   78 days
8  type3  2019-05-21  2019   62 days
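As a quick sanity check on the first-row values in the expected output, the arithmetic can be reproduced directly. This is only a small sketch; the names `first_dates` and `days` are made up for illustration and are not part of the original question.

```python
import pandas as pd

# First date of each (type, year) group, taken from the table above.
first_dates = pd.to_datetime(
    ['2017-03-30', '2018-01-15', '2018-01-30', '2019-03-20']
)

# Days elapsed since January 1 of the same year.
days = [
    (d - pd.Timestamp(year=d.year, month=1, day=1)).days
    for d in first_dates
]
print(days)
```

These are the values the first row of each group should carry in place of 0 days.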
There may be a more elegant way, but here is my attempt:
import pandas as pd

df = pd.DataFrame({
    'type': ['type1', 'type1', 'type1', 'type1', 'type1', 'type3', 'type3', 'type3', 'type3'],
    'date': ['2017-3-30', '2017-5-10', '2017-12-15', '2018-01-15', '2018-05-01',
             '2018-01-30', '2018-06-27', '2019-03-20', '2019-05-21'],
    'year': [2017, 2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019]})
df['date'] = pd.to_datetime(df['date'])
df = df.sort_values(by='date')

result_lst = []
for (year, _), sub_df in df.groupby(['year', 'type']):
    # Work on a copy so the assignments below do not trigger SettingWithCopyWarning.
    sub_df = sub_df.copy()
    # Previous date within the group; NaT for the first row.
    sub_df['shift'] = sub_df['date'].shift(1)
    # For the first row, use January 1 of the group's year as the previous date.
    sub_df.loc[sub_df.index[0], 'shift'] = pd.to_datetime(str(year), format='%Y')
    sub_df['DateDiff'] = sub_df['date'] - sub_df['shift']
    sub_df = sub_df.drop(columns=['shift'])
    result_lst.append(sub_df)

df = pd.concat(result_lst, axis=0)
print(df)
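The result matches your expected output. If this answers your question, please mark it as accepted, and leave a comment if I missed anything.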
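For what it's worth, the same result can probably be obtained without an explicit loop. The sketch below is an alternative to the code above, not the approach the answer uses: it keeps the `groupby(...).diff()` from the question and fills the `NaT` on each group's first row with the distance from January 1 of that row's year. The helper names `diff_in_group` and `year_start` are made up for illustration, and the dates are written zero-padded.

```python
import pandas as pd

df = pd.DataFrame({
    'type': ['type1'] * 5 + ['type3'] * 4,
    'date': pd.to_datetime([
        '2017-03-30', '2017-05-10', '2017-12-15', '2018-01-15', '2018-05-01',
        '2018-01-30', '2018-06-27', '2019-03-20', '2019-05-21',
    ]),
    'year': [2017, 2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019],
})
df = df.sort_values(['type', 'year', 'date'])

# Difference to the previous row within each (type, year) group.
# The first row of every group has no predecessor, so it is NaT.
diff_in_group = df.groupby(['type', 'year'])['date'].diff()

# January 1 of each row's year, built from the 'year' column.
year_start = pd.to_datetime(df['year'].astype(str), format='%Y')

# Fill only the NaT rows with the distance from the start of the year.
df['DateDiff'] = diff_in_group.fillna(df['date'] - year_start)
print(df)
```

Because `fillna` aligns on the index, only the first row of each group is affected and the other differences stay as `diff()` computed them.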