Python pandas: create a new row within each group based on a condition
I have a DataFrame (df) that looks like this:
ID From_num To_num Date
0 James 78 96 2020-05-12
1 James 420 78 2020-02-02
2 James Started 420 2019-06-18
3 Max 298 36 2019-06-20
4 Max 36 78 2019-01-30
5 Max 298 36 2018-10-23
6 Max Started 298 2018-08-29
7 Park Started 311 2020-05-21
8 Tom 60 150 2019-11-22
9 Tom 520 520 2019-08-26
10 Tom 99 78 2018-12-11
11 Tom Started 99 2018-10-09
12 Wong Started 39 2019-02-01
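For anyone who wants to reproduce this, the table above can be rebuilt roughly as follows. This is only a sketch: the question does not show the column dtypes, so storing From_num as strings (because it mixes numbers with the marker 'Started') and Date as plain strings is an assumption.

```python
import pandas as pd

# Sample data copied from the table above. From_num is kept as strings
# because it mixes numbers with 'Started' (an assumption about the dtypes).
df = pd.DataFrame({
    'ID': ['James', 'James', 'James', 'Max', 'Max', 'Max', 'Max',
           'Park', 'Tom', 'Tom', 'Tom', 'Tom', 'Wong'],
    'From_num': ['78', '420', 'Started', '298', '36', '298', 'Started',
                 'Started', '60', '520', '99', 'Started', 'Started'],
    'To_num': [96, 78, 420, 36, 78, 36, 298, 311, 150, 520, 78, 99, 39],
    'Date': ['2020-05-12', '2020-02-02', '2019-06-18', '2019-06-20',
             '2019-01-30', '2018-10-23', '2018-08-29', '2020-05-21',
             '2019-11-22', '2019-08-26', '2018-12-11', '2018-10-09',
             '2019-02-01'],
})
print(df.shape)  # (13, 4)
```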
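For each person (group), I want to create a new duplicate row above the first row of each 'ID' group. In the created row, the values of the 'ID', 'From_num' and 'To_num' columns should be the same as in the old first row, but the 'Date' value should be the old first row's date plus one day. For example, for James the newly created row is: 'James' '78' '96' '2020-05-13'. The same applies to the rest of the data, so my expected result is: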
ID From_num To_num Date
0 James 78 96 2020-05-13 # row added, Date + 1
1 James 78 96 2020-05-12
2 James 420 78 2020-02-02
3 James Started 420 2019-06-18
4 Max 298 36 2019-06-21 # row added, Date + 1
5 Max 298 36 2019-06-20
6 Max 36 78 2019-01-30
7 Max 298 36 2018-10-23
8 Max Started 298 2018-08-29
9 Park Started 311 2020-05-22 # Row added, Date + 1
10 Park Started 311 2020-05-21
11 Tom 60 150 2019-11-23 # Row added, Date + 1
12 Tom 60 150 2019-11-22
13 Tom 520 520 2019-08-26
14 Tom 99 78 2018-12-11
15 Tom Started 99 2018-10-09
16 Wong Started 39 2019-02-02 # Row added Date + 1
17 Wong Started 39 2019-02-01
I want the row order to be the same as in my expected result. If you have any good ideas, please help. Thank you very much.
Use:
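import pandas as pd

# Parse the dates so that one day can be added with a Timedelta
df['Date'] = pd.to_datetime(df['Date'])
# Number the rows inside each ID group: 1, 2, 3, ...
df['order'] = df.groupby('ID').cumcount().add(1)
# First row of each group, dated one day later, with order=0 so it sorts first
df1 = (
    df.groupby('ID', as_index=False).first()
    .assign(Date=lambda x: x['Date'] + pd.Timedelta(days=1), order=0)
)
# Stack both frames, sort by ID then order, and drop the helper column
df1 = pd.concat([df, df1]).sort_values(['ID', 'order'], ignore_index=True).drop(columns='order')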
Details:
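Convert the Date column to a pandas datetime series with pd.to_datetime, then use DataFrame.groupby on the ID column together with cumcount to impose a total ordering on the rows within each group of the DataFrame: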
print(df)
ID From_num To_num Date order
0 James 78 96 2020-05-12 1
1 James 420 78 2020-02-02 2
2 James Started 420 2019-06-18 3
3 Max 298 36 2019-06-20 1
4 Max 36 78 2019-01-30 2
5 Max 298 36 2018-10-23 3
6 Max Started 298 2018-08-29 4
7 Park Started 311 2020-05-21 1
8 Tom 60 150 2019-11-22 1
9 Tom 520 520 2019-08-26 2
10 Tom 99 78 2018-12-11 3
11 Tom Started 99 2018-10-09 4
12 Wong Started 39 2019-02-01 1
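To see the per-group numbering in isolation, here is a minimal sketch on a made-up three-row frame (the name toy and its values are hypothetical, not part of the question's data). cumcount counts from 0 within each group, so add(1) shifts the numbering to start at 1 and leaves 0 free for the row that is added later.

```python
import pandas as pd

# Hypothetical three-row frame, used only to illustrate cumcount
toy = pd.DataFrame({'ID': ['a', 'a', 'b'], 'val': [10, 20, 30]})

# cumcount numbers the rows of each group 0, 1, 2, ...; add(1) makes it 1-based
toy['order'] = toy.groupby('ID').cumcount().add(1)
print(toy['order'].tolist())  # [1, 2, 1]
```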
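Create a new DataFrame df1 by using DataFrame.groupby on the ID column and aggregating with first, then use assign to set order=0 and to increment Date by one day: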
print(df1)
ID From_num To_num Date order
0 James 78 96 2020-05-13 0 # Date incremented by 1 day
1 Max 298 36 2019-06-21 0 # and ordering added
2 Park Started 311 2020-05-22 0
3 Tom 60 150 2019-11-23 0
4 Wong Started 39 2019-02-02 0
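The same step on a small made-up frame may make the mechanics clearer. This is a sketch only: toy and new_rows are hypothetical names, and the dates are picked for illustration.

```python
import pandas as pd

# Hypothetical frame with two groups; Date is already a datetime column
toy = pd.DataFrame({
    'ID': ['a', 'a', 'b'],
    'Date': pd.to_datetime(['2020-05-12', '2020-02-02', '2019-02-01']),
})

# One row per group (the first one), shifted by a day and tagged with order=0
new_rows = (
    toy.groupby('ID', as_index=False).first()
    .assign(Date=lambda x: x['Date'] + pd.Timedelta(days=1), order=0)
)
print(new_rows)
```

One thing to keep in mind: GroupBy.first takes the first non-null value of each column, so if the real data has missing values in the first row of a group, groupby('ID').head(1) might be the safer way to pick that row.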
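Use pd.concat to concatenate the DataFrames df and df1, then use sort_values to sort the combined DataFrame on the ID and order columns, and finally drop the helper order column: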
print(df1)
ID From_num To_num Date
0 James 78 96 2020-05-13
1 James 78 96 2020-05-12
2 James 420 78 2020-02-02
3 James Started 420 2019-06-18
4 Max 298 36 2019-06-21
5 Max 298 36 2019-06-20
6 Max 36 78 2019-01-30
7 Max 298 36 2018-10-23
8 Max Started 298 2018-08-29
9 Park Started 311 2020-05-22
10 Park Started 311 2020-05-21
11 Tom 60 150 2019-11-23
12 Tom 60 150 2019-11-22
13 Tom 520 520 2019-08-26
14 Tom 99 78 2018-12-11
15 Tom Started 99 2018-10-09
16 Wong Started 39 2019-02-02
17 Wong Started 39 2019-02-01
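If the steps need to be run more than once, it may help to wrap them in a function that works on a copy, so the helper order column never ends up in the original frame and a second call does not stack extra rows onto an already extended frame by accident. The sketch below is one possible way to do that; add_leading_rows is a hypothetical name, and it assumes the frame has 'ID' and 'Date' columns and that the rows of each group are already in the desired order.

```python
import pandas as pd

def add_leading_rows(df):
    """Return a copy of df with one extra row on top of each ID group.

    Hypothetical helper: assumes columns 'ID' and 'Date' exist and that
    the rows of each group are already in the desired order.
    """
    out = df.copy()  # leave the caller's frame untouched
    out['Date'] = pd.to_datetime(out['Date'])
    out['order'] = out.groupby('ID').cumcount().add(1)
    first = (
        out.groupby('ID', as_index=False).first()
        .assign(Date=lambda x: x['Date'] + pd.Timedelta(days=1), order=0)
    )
    return (
        pd.concat([out, first])
        .sort_values(['ID', 'order'], ignore_index=True)
        .drop(columns='order')
    )

# Small subset of the question's data, used as a quick check
sample = pd.DataFrame({
    'ID': ['James', 'James', 'Park'],
    'From_num': ['78', '420', 'Started'],
    'To_num': [96, 78, 311],
    'Date': ['2020-05-12', '2020-02-02', '2020-05-21'],
})
result = add_leading_rows(sample)
print(result)
```

Because the function returns a new frame, sample keeps its original three rows and columns, and the result has five rows: one added row per group followed by that group's original rows.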