
How can I add a function to the aggregations in a groupby in Python?

Tags: python, pandas, group-by, aggregation

I am trying to get groupby statistics with an additional mathematical operation performed between the aggregations.

I tried:

...agg({
'id':"count",
'repair':"count",
('repair':"count")/('id':"count")
})
After grouping, I can get it with

gr['repair']/gr['id']*100

How can I get this type of calculation inside the groupby?
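For context, the two-step workaround described above might look like the following (a minimal sketch; df and the grouping column 'group' are placeholders, while the id and repair column names come from the question):

gr = df.groupby('group').agg({'id': 'count', 'repair': 'count'})  # aggregate first
gr['repair_per_id'] = gr['repair'] / gr['id'] * 100               # then derive the ratio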

Consider a custom function that returns the aggregated data set:

def agg_func(g):
    # Overwrite each column with its group-level count (broadcast to every row)
    g['id'] = g['id'].count()          # number of non-NaN id values
    g['repair'] = g['repair'].count()  # number of non-NaN repair values
    # Derive the extra statistic from the two aggregates
    g['repair_per_id'] = (g['repair'] / g['id']) * 100

    return g.aggregate('max')   # every column is now constant within the
                                # group, so 'min' works equally well

agg_df = (df.groupby(['group'])
            .apply(agg_func)
            .reset_index(drop=True)
         )
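As an alternative to apply, the same numbers can be produced in a single chain by aggregating first and then deriving the ratio with assign. A minimal sketch, assuming pandas 0.25+ for the named-aggregation syntax:

agg_df = (df.groupby('group')
            .agg(id=('id', 'count'),           # non-NaN count of id
                 repair=('repair', 'count'))   # non-NaN count of repair
            .assign(repair_per_id=lambda d: d['repair'] / d['id'] * 100)
            .reset_index())

This avoids mutating the group inside apply and keeps the heavy lifting in vectorized aggregations, which is generally faster on large frames.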

To demonstrate with seeded random data:

import numpy as np
import pandas as pd

data_tools = ['sas', 'stata', 'spss', 'python', 'r', 'julia']

np.random.seed(8192019)
random_df = pd.DataFrame({'group': np.random.choice(data_tools, 500),
                          'id': np.random.randint(1, 10, 500),
                          'repair': np.random.uniform(0, 100, 500)
                         })

# RANDOMLY ASSIGN NANs (75 draws with replacement, so up to 75 distinct rows)
random_df.loc[np.random.choice(random_df.index, 75), 'repair'] = np.nan

# RUN AGGREGATIONS
agg_df = (random_df.groupby(['group'])
                   .apply(agg_func)
                   .reset_index(drop=True)
         )

print(agg_df)

#     group  id  repair  repair_per_id
# 0   julia  79      70      88.607595
# 1  python  89      74      83.146067
# 2       r  82      69      84.146341
# 3     sas  74      66      89.189189
# 4    spss  77      69      89.610390
# 5   stata  99      84      84.848485
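As a sanity check, the repair_per_id column should match the post-hoc computation mentioned in the question (gr here is a hypothetical intermediate):

gr = random_df.groupby('group').count()          # non-NaN counts per column
print((gr['repair'] / gr['id'] * 100).round(6))
# e.g. julia: 70 / 79 * 100 = 88.607595, matching the table above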

Comments:

- Can you produce a self-contained dataset, along with your starting point and expected output? Having 2015 show up randomly in the output while 2016 is added to other rows only adds confusion.
- Please follow @ALollz's instructions.
- Not sure if this is what you want, but I think apply can help you to some extent. Check this for more information:
- 2015 was a leftover typo (now edited). I checked the link and it looks close. To address @ALollz's point: in that link you can see the operation sample_data_group = sample_data.groupby(['date','part','receive']); what I need is how to perform an operation inside the aggregation, but using the aggregation's results...
- Thank you very much, sir. I upvoted, but apparently forgot to write a comment :) Thanks again.