Python pandas: show the total only once per group
I have the following data:
Description  Card Member  Cost
"apple"      "adam"       2
"apple"      "adam"       2
"pear"       "bob"        7
"orange"     "alice"      8
"orange"     "alice"      8
"orange"     "alice"      8
I am trying to add a total column like this:
Description  Card Member  Cost  **Total**
"apple"      "adam"       2
"apple"      "adam"       2     4
"pear"       "bob"        7
"orange"     "alice"      8
"orange"     "alice"      8
"orange"     "alice"      8     24
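I tried df["Total"] = df.groupby('Card Member')['Cost'].transform('sum')
Although this produces the total on every row, I want the total to appear only once, on the last row for each member.
This is what it produces: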
Description  Card Member  Cost  **Total**
"apple"      "adam"       2     4
"apple"      "adam"       2     4
"pear"       "bob"        7
"orange"     "alice"      8     24
"orange"     "alice"      8     24
"orange"     "alice"      8     24
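The repetition can be reproduced with a small frame. This is a minimal sketch that assumes the column names Description, Card Member and Cost from the table header; the data is copied from the question.

```python
import pandas as pd

# Column names are assumed from the question's table header.
df = pd.DataFrame({
    "Description": ["apple", "apple", "pear", "orange", "orange", "orange"],
    "Card Member": ["adam", "adam", "bob", "alice", "alice", "alice"],
    "Cost": [2, 2, 7, 8, 8, 8],
})

# transform('sum') returns one value per input row, so every row of a
# group receives that group's sum.
df["Total"] = df.groupby("Card Member")["Cost"].transform("sum")
print(df)
```

Because transform keeps the original shape, a single-row group such as bob also receives its own sum (7).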
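As you can see, the total is repeated over and over, which makes my data less readable. I want the total to appear only once, at the end of each member's rows, rather than on every row.
I was thinking of looping and deleting values based on whether they equal the one in the next iteration, but that would cause problems if different members have the same total.
You can use duplicated to pick out the last row of each group (the frame in this answer names the column CardMember, without a space):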
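# True on the last row of each (Description, CardMember) group
s = ~df.duplicated(['Description', 'CardMember'], keep='last')
# Assign the group sum on those rows only; all other rows stay NaN
df.loc[s, 'total'] = df.groupby(['Description', 'CardMember'], sort=False)['Cost'].transform('sum')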
Output:
   Description CardMember  Cost  total
0      "apple"     "adam"     2    NaN
1      "apple"     "adam"     2    4.0
2       "pear"      "bob"     7    7.0
3     "orange"    "alice"     8    NaN
4     "orange"    "alice"     8    NaN
5     "orange"    "alice"     8   24.0
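For anyone who wants to run this end to end, here is a self-contained sketch. It assumes the question's column name Card Member (with a space) rather than CardMember, and the final to_string call is an optional display step added for illustration, not part of the answer.

```python
import pandas as pd

# Data from the question; column names are an assumption.
df = pd.DataFrame({
    "Description": ["apple", "apple", "pear", "orange", "orange", "orange"],
    "Card Member": ["adam", "adam", "bob", "alice", "alice", "alice"],
    "Cost": [2, 2, 7, 8, 8, 8],
})

keys = ["Description", "Card Member"]

# duplicated(keep='last') is False only on the last row of each key
# combination, so the negation selects exactly those rows.
is_last = ~df.duplicated(keys, keep="last")

# The right-hand side has one sum per row; .loc keeps only the selected rows.
df.loc[is_last, "total"] = df.groupby(keys, sort=False)["Cost"].transform("sum")

# Optional: render the missing totals as blanks instead of NaN.
print(df.to_string(na_rep=""))
```

The mask is built from the key columns rather than from the totals, so two members with the same total do not interfere with each other.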
This should work:
df["total"] = 0
for name in df["Card Member"].unique():
    # Row labels belonging to this member; the last one receives the sum
    rows = df.index[df["Card Member"] == name]
    df.loc[rows[-1], "total"] = df.loc[rows, "Cost"].sum()
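A variation on the loop, sketched under the assumption that the columns are named Description, Card Member and Cost: iterating over the groups directly avoids filtering the whole frame once per name. This is an illustration rather than the answerer's code.

```python
import pandas as pd

# Data from the question; column names are an assumption.
df = pd.DataFrame({
    "Description": ["apple", "apple", "pear", "orange", "orange", "orange"],
    "Card Member": ["adam", "adam", "bob", "alice", "alice", "alice"],
    "Cost": [2, 2, 7, 8, 8, 8],
})

# Same default as the answer: rows without a total show 0.
df["total"] = 0

# Each iteration yields the member name and that member's rows.
for name, group in df.groupby("Card Member", sort=False):
    last_label = group.index[-1]  # label of the member's last row
    df.loc[last_label, "total"] = group["Cost"].sum()

print(df)
```

Writing through df.loc with a row label and a column name avoids depending on the position of the total column.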
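You can set keep='last' here, so that only the last row of each member keeps its total:
s = df.groupby('Card Member')['Cost'].transform('sum')
# Mask on the member key, not on the sums, so members with equal totals do not collide
df.assign(total=s.mask(df['Card Member'].duplicated(keep='last')))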
   Description Card Member  Cost  total
0      "apple"      "adam"     2    NaN
1      "apple"      "adam"     2    4.0
2       "pear"       "bob"     7    7.0
3     "orange"     "alice"     8    NaN
4     "orange"     "alice"     8    NaN
5     "orange"     "alice"     8   24.0
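Whether the mask is built from the sums or from the member key matters when two members share a total, which is the case the question worries about. The sketch below uses hypothetical data (adam and bob both total 4) to show the difference.

```python
import pandas as pd

# Hypothetical data: adam and bob end up with the same total.
df = pd.DataFrame({
    "Card Member": ["adam", "adam", "bob", "alice"],
    "Cost": [2, 2, 4, 8],
})

s = df.groupby("Card Member")["Cost"].transform("sum")

# Masking on the sums treats adam's 4 and bob's 4 as duplicates of each
# other, so adam loses his total entirely.
by_value = s.mask(s.duplicated(keep="last"))

# Masking on the member key keeps one total per member.
by_key = s.mask(df["Card Member"].duplicated(keep="last"))

print(pd.DataFrame({"by_value": by_value, "by_key": by_key}))
```

With by_key, each member's last row carries the total regardless of what the other members sum to.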
An np.where version:
import numpy as np
df["Total"] = np.where(~df['Card Member'].duplicated(keep='last'),
                       df.groupby('Card Member')['Cost'].transform('sum'),
                       np.nan)  # np.nan instead of None keeps the column numeric
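df['Card Member'].duplicated(keep='last') marks the last row of each group of duplicates as False and every earlier row as True, so ~df['Card Member'].duplicated(keep='last') is True only on those last rows, and np.where fills in your groupby result only there.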
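A runnable sketch of the np.where approach, with the mask and the sums pulled into named variables so each step can be inspected. The column names are assumed from the question, and np.nan is used as the filler so the column stays numeric.

```python
import numpy as np
import pandas as pd

# Data from the question; column names are an assumption.
df = pd.DataFrame({
    "Description": ["apple", "apple", "pear", "orange", "orange", "orange"],
    "Card Member": ["adam", "adam", "bob", "alice", "alice", "alice"],
    "Cost": [2, 2, 7, 8, 8, 8],
})

# True on the last row of each member, False on earlier rows.
is_last = ~df["Card Member"].duplicated(keep="last")

# One sum per row, repeated within each member's rows.
group_sum = df.groupby("Card Member")["Cost"].transform("sum")

# Keep the sum where is_last is True, otherwise fill with NaN.
df["Total"] = np.where(is_last, group_sum, np.nan)
print(df)
```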
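Let's use apply (in this frame the columns were read in as Description, Card and MemberCost):
# Per group: a one-element Series holding the sum, indexed at the group's last row
s = (df.groupby(['Description', 'Card'], as_index=False).MemberCost
       .apply(lambda x: pd.Series(x.sum(), index=[x.index[-1]]))
       .reset_index(level=0, drop=True))
df['New'] = s  # index alignment leaves NaN on all other rows
df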
Out[103]:
   Description     Card  MemberCost   New
0      "apple"   "adam"           2   NaN
1      "apple"   "adam"           2   4.0
2       "pear"    "bob"           7   7.0
3     "orange"  "alice"           8   NaN
4     "orange"  "alice"           8   NaN
5     "orange"  "alice"           8  24.0
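The apply call relies on index alignment: it produces one sum per group, labelled with the index of that group's last row, and assignment fills the remaining rows with NaN. The same idea can be sketched without apply by combining groupby(...).tail(1) with map. This is a swapped-in technique, not the answerer's code, and it assumes the question's column names.

```python
import pandas as pd

# Data from the question; column names are an assumption.
df = pd.DataFrame({
    "Description": ["apple", "apple", "pear", "orange", "orange", "orange"],
    "Card Member": ["adam", "adam", "bob", "alice", "alice", "alice"],
    "Cost": [2, 2, 7, 8, 8, 8],
})

# One total per member, indexed by member name.
totals = df.groupby("Card Member")["Cost"].sum()

# The last row of each member, with the original row labels preserved.
last_rows = df.groupby("Card Member").tail(1)

# Look up each last row's member in totals; assigning the result aligns on
# the row labels, so every other row becomes NaN.
df["New"] = last_rows["Card Member"].map(totals)
print(df)
```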