Python: how to add a condition in split-apply-combine and repeat the solution on every row?
I have the following pandas dataframe df:
cluster tag amount name
1 0 200 Michael
2 1 1200 John
2 1 900 Daniel
2 0 3000 David
2 0 600 Jonny
3 0 900 Denisse
3 1 900 Mike
3 1 3000 Kely
3 0 2000 Devon
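What I need to do is add another column to df that holds, for every row, the name (from the name column) with the highest amount among the rows of that row's cluster where tag is 1. In other words, the solution looks like this: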
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David John
2 0 600 Jonny John
3 0 900 Denisse Kely
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon Kely
I tried something like this:
df.group('clusters')['name','amount'].transform('max')[df['tag']==1]
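But the problem is that the name is not repeated on every row; it only appears on the rows where tag is 1. The result looks like this: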
cluster tag amount name highest_amount
1 0 200 Michael NaN
2 1 1200 John John
2 1 900 Daniel John
2 0 3000 David NaN
2 0 600 Jonny NaN
3 0 900 Denisse NaN
3 1 900 Mike Kely
3 1 3000 Kely Kely
3 0 2000 Devon NaN
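Can anyone tell me how to add a condition with split-apply-combine and repeat the solution on every row?

You can do this in two stages. First compute a mapping series, then map it by cluster: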
s = df.query('tag == 1')\
      .sort_values('amount', ascending=False)\
      .drop_duplicates('cluster')\
      .set_index('cluster')['name']
df['highest_name'] = df['cluster'].map(s)
print(df)
cluster tag amount name highest_name
0 1 0 200 Michael NaN
1 2 1 1200 John John
2 2 1 900 Daniel John
3 2 0 3000 David John
4 2 0 600 Jonny John
5 3 0 900 Denisse Kely
6 3 1 900 Mike Kely
7 3 1 3000 Kely Kely
8 3 0 2000 Devon Kely
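To make the two-stage approach easy to try, here is a minimal self-contained sketch. It rebuilds df from the table in the question (the constructor call is an assumption about how the data was created) and then applies the same query / sort / drop_duplicates / map chain, with comments on what each stage does.

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction)
df = pd.DataFrame({
    'cluster': [1, 2, 2, 2, 2, 3, 3, 3, 3],
    'tag':     [0, 1, 1, 0, 0, 0, 1, 1, 0],
    'amount':  [200, 1200, 900, 3000, 600, 900, 900, 3000, 2000],
    'name':    ['Michael', 'John', 'Daniel', 'David', 'Jonny',
                'Denisse', 'Mike', 'Kely', 'Devon'],
})

# Stage 1: keep only tag == 1 rows, sort so the largest amount comes first,
# then keep the first row per cluster. The result maps cluster -> name.
s = (
    df.query('tag == 1')
      .sort_values('amount', ascending=False)
      .drop_duplicates('cluster')
      .set_index('cluster')['name']
)

# Stage 2: look up every row's cluster in the mapping.
# Clusters with no tag == 1 rows (cluster 1 here) are absent from s and get NaN.
df['highest_name'] = df['cluster'].map(s)
print(df)
```

Because the condition is applied before the per-cluster reduction, rows with tag 0 still receive the name: the mapping is keyed on cluster only, not on tag.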
If you want to use groupby, here is one way:
import numpy as np

def func(x):
    # Within one cluster: names of the tag == 1 rows, largest amount first
    names = x.query('tag == 1').sort_values('amount', ascending=False)['name']
    return names.iloc[0] if not names.empty else np.nan

df['highest_name'] = df['cluster'].map(df.groupby('cluster').apply(func))
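I'm not sure this works? I can't test it. Does it work as shown in the post?

@roganjosh Yes, it would work.

As in, df.group('clusters').[name','amount'] is valid? If so, I've learned something new.

@roganjosh Oops, my mistake, there should be no '.'. I've edited my question, and that works. However, I wonder whether there is a way to do this with groupby and transform. Do you know if that is possible?

@callmeGuy, I added a groupby solution, but it doesn't use transform.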
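On the follow-up about groupby with transform: the sketch below is one possible way, not something taken from the answer above. It assumes the same df as in the question (rebuilt here so the block runs on its own); the helper names masked_amount, group_max, is_top and top_name are made up for illustration. The idea is to blank out amounts where tag is not 1, broadcast the per-cluster maximum with transform('max'), and then broadcast the matching name with transform('first').

```python
import pandas as pd

# Rebuild the example frame from the question (assumed construction)
df = pd.DataFrame({
    'cluster': [1, 2, 2, 2, 2, 3, 3, 3, 3],
    'tag':     [0, 1, 1, 0, 0, 0, 1, 1, 0],
    'amount':  [200, 1200, 900, 3000, 600, 900, 900, 3000, 2000],
    'name':    ['Michael', 'John', 'Daniel', 'David', 'Jonny',
                'Denisse', 'Mike', 'Kely', 'Devon'],
})

# Amounts count only where tag == 1; every other row becomes NaN
masked_amount = df['amount'].where(df['tag'] == 1)

# Per-cluster maximum of the masked amounts, repeated on every row of the cluster
group_max = masked_amount.groupby(df['cluster']).transform('max')

# True only on the tag == 1 row(s) that hold that maximum (NaN never equals NaN)
is_top = masked_amount.eq(group_max)

# Keep the name on those rows, then repeat the first non-null name per cluster
top_name = df['name'].where(is_top)
df['highest_name'] = top_name.groupby(df['cluster']).transform('first')
print(df)
```

If two tag == 1 rows in a cluster tie for the highest amount, this sketch keeps whichever comes first in row order. Clusters with no tag == 1 rows end up with a missing value.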