Python pandas: adding aggregated features
I have this DataFrame in pandas:
   day customer  amount
0    1    cust1     500
1    2    cust2     100
2    1    cust1      50
3    2    cust1     100
4    2    cust2     250
5    6    cust1      20
I want to create a new column "amount2days" that sums each customer's amounts over the past two days, to get the following DataFrame:
   day customer  amount  amount2days
0    1    cust1     500          500  (no past transactions)
1    2    cust2     100          100  (no past transactions)
2    1    cust1      50          550  (500 + 50, rows 0,2)
3    2    cust1     100          650  (500 + 50 + 100, rows 0,2,3)
4    2    cust2     250          350  (100 + 250, rows 1,4)
5    6    cust1      20           20  (notice day is 6, and no day=5 for cust1)
I.e., I want to perform the following (pseudo)code:
for each row. What is the most convenient way to do so?
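The computation described above can be spelled out as a naive row-by-row loop. This is only a reference sketch, not the asker's pseudocode: it assumes "past two days" means the current day and the day before it, and that only rows up to and including the current row count (both inferred from the expected table). The helper name `amount_last_two_days` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "day": [1, 2, 1, 2, 2, 6],
    "customer": ["cust1", "cust2", "cust1", "cust1", "cust2", "cust1"],
    "amount": [500, 100, 50, 100, 250, 20],
})

def amount_last_two_days(row_pos):
    """Sum amounts for the same customer over days (day - 2, day], rows 0..row_pos."""
    row = df.iloc[row_pos]
    earlier = df.iloc[: row_pos + 1]  # rows up to and including the current one
    mask = (
        (earlier["customer"] == row["customer"])
        & (earlier["day"] > row["day"] - 2)
        & (earlier["day"] <= row["day"])
    )
    return earlier.loc[mask, "amount"].sum()

df["amount2days"] = [amount_last_two_days(i) for i in range(len(df))]
print(df)
```

The loop is quadratic in the number of rows, so it is mainly useful as a correctness check for faster approaches.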
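The summation I want is over the day, but day does not necessarily increase with each row, as the example shows. I still want to sum the amounts over the past 2 days.
I think this is just a rolling sum over days: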
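import pandas as pd

def get_roll(x):
    # Index the amounts by a synthetic date derived from the integer day
    s = pd.Series(x['amount'].values,
                  index=pd.to_datetime('1900-01-01') + pd.to_timedelta(x['day'], unit='D')
                  )
    # Time-based 2-day window, then restore the original row index
    return pd.Series(s.rolling('2D').sum().values, index=x.index)

# Each customer's rows must be in non-decreasing day order for rolling('2D')
df['amount2days'] = (df.groupby('customer').apply(get_roll)
                       .reset_index(level=0, drop=True)
                    )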
Output:
   day customer  amount  amount2days
1    1    cust1     500        500.0
2    1    cust2     100        100.0
3    1    cust1      50        550.0
4    2    cust1     100        650.0
5    2    cust2     250        350.0
6    3    cust1      20        120.0
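Option 2: Since you want the cumulative amount over two days, today's amount only needs to be added to the previous day's amount. So we can make use of shift: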
# running total within each (customer, day) group
df['amount2days'] = df.groupby(['customer','day'])['amount'].cumsum()
# shift the last item of the previous day and add
# (this adds the customer's previous recorded day, so it assumes consecutive days)
df['amount2days'] += (df.drop_duplicates(['day','customer'],keep='last')
                        .groupby(['customer'])['amount2days'].shift()
                        .reindex(df.index)
                        .ffill()
                        .fillna(0)
                     )
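The shift-based version adds whatever day the customer was last seen on, even when that day is not adjacent (day 6 after day 2), and its `ffill` is not grouped by customer. A possible variant, which is not the answerer's code, looks up the total for exactly `day - 1` instead. It assumes "two days" means the current day and the day before; the names `within_day`, `daily_total`, and `prev_keys` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "day": [1, 2, 1, 2, 2, 6],
    "customer": ["cust1", "cust2", "cust1", "cust1", "cust2", "cust1"],
    "amount": [500, 100, 50, 100, 250, 20],
})

# Running total within each (customer, day) group, in row order.
within_day = df.groupby(["customer", "day"])["amount"].cumsum()

# Total per (customer, day), used to look up the previous day's total.
daily_total = df.groupby(["customer", "day"])["amount"].sum()

# For every row, look up (customer, day - 1); days with no rows contribute 0.
prev_keys = pd.MultiIndex.from_arrays([df["customer"], df["day"] - 1])
prev_day_total = daily_total.reindex(prev_keys).fillna(0).to_numpy()

df["amount2days"] = within_day + prev_day_total
print(df)
```

This stays vectorised and does not depend on the row order across customers, at the cost of hard-coding a two-day window.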
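Unfortunately, note that the sum in row 4 needs to be 650 (the sum over days 1 and 2 is 500+50+100).
Is day just a number or a datetime type? You can use rolling('2D') on a datetime type.
day is just a number.
@user112112 See the updated answer for a modified version that uses rolling by day.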