Aggregating each column of a DataFrame with a custom function in Python

python pandas dataframe

This is my DataFrame:

df = [{'id': 1, 'name': 'bob', 'apple': 45, 'grape': 10, 'rate':0}, 
      {'id': 1, 'name': 'bob', 'apple': 45, 'grape': 20, 'rate':0},
      {'id': 2, 'name': 'smith', 'apple': 5, 'grape': 30, 'rate':0},
      {'id': 2, 'name': 'smith', 'apple': 10, 'grape': 40, 'rate':0}]

I want one row per id and name, where apple = apple.sum(), grape = grape.sum(), and rate = apple / grape * 100:

       id           name     apple    grape   rate
0       1            bob      90       30      300 
1       2           smith     15       70      21.4

I tried the following:

df = pd.DataFrame(df)
def cal_rate(rate):
    return df['apple'] / df['grape']*100
agg_funcs = {'apple':'sum',
             'grape':'sum',
             'rate' : cal_rate}
df=df.groupby(['id','name']).agg(agg_funcs).reset_index()

But I got this result:

       id           name     apple    grape   rate
0       1            bob      90       30      105 
1       2           smith     15       70      105

Can you help me? Thanks in advance.

Here you go:

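import pandas as pd
df = [{'id': 1, 'name': 'bob', 'apple': 45, 'grape': 10, 'rate': 0},
      {'id': 1, 'name': 'bob', 'apple': 45, 'grape': 20, 'rate': 0},
      {'id': 2, 'name': 'smith', 'apple': 5, 'grape': 30, 'rate': 0},
      {'id': 2, 'name': 'smith', 'apple': 10, 'grape': 40, 'rate': 0}]
df = pd.DataFrame(df)
def cal_rate(group):
    # group is the 'rate' column of one (id, name) group; use its index
    # to look up the matching rows in the original DataFrame
    frame = df.loc[group.index]
    return frame['apple'].sum() / frame['grape'].sum() * 100
agg_funcs = {'apple': 'sum',
             'grape': 'sum',
             'rate': cal_rate}
df = df.groupby(['id', 'name']).agg(agg_funcs).reset_index()
print(df)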

Output (rate rounded to one decimal place):

   id   name  apple  grape   rate
0   1    bob     90     30  300.0
1   2  smith     15     70   21.4
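
For anyone wondering why cal_rate goes through group.index: when a dict is passed to agg, each function is handed only its own column for one group, as a Series, so the apple and grape values are not directly visible inside it. A minimal sketch to check this, where show_group and seen are hypothetical names used only for illustration:

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'name': 'bob', 'apple': 45, 'grape': 10, 'rate': 0},
    {'id': 1, 'name': 'bob', 'apple': 45, 'grape': 20, 'rate': 0},
    {'id': 2, 'name': 'smith', 'apple': 5, 'grape': 30, 'rate': 0},
    {'id': 2, 'name': 'smith', 'apple': 10, 'grape': 40, 'rate': 0},
])

seen = []  # records what the aggregation function receives

def show_group(group):
    # Store the type of the argument and the original row labels it carries
    seen.append((type(group).__name__, tuple(group.index)))
    return 0  # any scalar works; only the recorded information matters here

df.groupby(['id', 'name']).agg({'rate': show_group})
print(seen)
```

Each recorded entry should be a Series carrying the original row labels of its group, which is what lets df.loc[group.index] recover the full rows.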

You can also do it like this:

df = df.groupby(['id', 'name']).agg({'apple':'sum', 'grape':'sum'}).reset_index()
df['rate'] = (df['apple'] / df['grape']) *100

This is just another way to do it:

import pandas as pd
df = [{'id': 1, 'name': 'bob', 'apple': 45, 'grape': 10, 'rate':0},
      {'id': 1, 'name': 'bob', 'apple': 45, 'grape': 20, 'rate':0},
      {'id': 2, 'name': 'smith', 'apple': 5, 'grape': 30, 'rate':0},
      {'id': 2, 'name': 'smith', 'apple': 10, 'grape': 40, 'rate':0}]
df = pd.DataFrame(df)
df=df.groupby(['id','name']).sum().reset_index()
df['rate']=round((df['apple'] / df['grape'])*100,1)
print(df)

Output

   id   name  apple  grape   rate
0   1    bob     90     30  300.0
1   2  smith     15     70   21.4

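Why sum the id column? Did you forget reset_index()? Just do df.groupby(['id', 'name'], as_index=False).sum() instead of agg.

But why pass group as an argument in the cal_rate function? Can I do this in one line or something similar? 'apple': 'sum', 'grape': 'sum', 'rate': frame['apple'].sum() / frame['grape'].sum() * 100 @balaji

@ahmad The others have already shown you that. I was demonstrating how to use a custom function in the agg context.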
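
On the one-line question in the comments: a dict entry such as 'rate': frame['apple'].sum() / frame['grape'].sum() * 100 cannot work as written, because frame does not exist when the dict is built. One possible sketch, assuming it is acceptable to compute the rate after the sums, chains assign onto the aggregation. The name result and the rounding to one decimal place are choices made for this example:

```python
import pandas as pd

df = pd.DataFrame([
    {'id': 1, 'name': 'bob', 'apple': 45, 'grape': 10, 'rate': 0},
    {'id': 1, 'name': 'bob', 'apple': 45, 'grape': 20, 'rate': 0},
    {'id': 2, 'name': 'smith', 'apple': 5, 'grape': 30, 'rate': 0},
    {'id': 2, 'name': 'smith', 'apple': 10, 'grape': 40, 'rate': 0},
])

result = (
    # Sum only the columns that should be summed; keep id and name as columns
    df.groupby(['id', 'name'], as_index=False)[['apple', 'grape']]
      .sum()
      # Derive rate from the summed columns of the aggregated frame
      .assign(rate=lambda d: (d['apple'] / d['grape'] * 100).round(1))
)
print(result)
```

This avoids a custom aggregation function entirely, since rate depends only on the two group sums.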