Python: How to group by in Pandas and keep all columns

I have a data frame like this:

   year       drug_name  avg_number_of_ingredients
0  2019     NEXIUM I.V.                          8
1  2016         ZOLADEX                         10
2  2017        PRILOSEC                         59
3  2017  BYDUREON BCise                         24
4  2019        Lynparza                         28

I need to group the drug names and the average number of ingredients by year, like this:

   year     drug_name avg_number_of_ingredients
0  2019  drug a,b,c..     mean value for column
1  2018  drug a,b,c..     mean value for column
2  2017  drug a,b,c..     mean value for column

If I do df.groupby('year'), I lose the drug names. How can I do this?

Let me show you a solution for this simple example. First, I build the same data frame as yours:

>>> import pandas as pd
>>> df = pd.DataFrame(
...     [
...         {'year': 2019, 'drug_name': 'NEXIUM I.V.', 'avg_number_of_ingredients': 8},
...         {'year': 2016, 'drug_name': 'ZOLADEX', 'avg_number_of_ingredients': 10},
...         {'year': 2017, 'drug_name': 'PRILOSEC', 'avg_number_of_ingredients': 59},
...         {'year': 2017, 'drug_name': 'BYDUREON BCise', 'avg_number_of_ingredients': 24},
...         {'year': 2019, 'drug_name': 'Lynparza', 'avg_number_of_ingredients': 28},
...     ]
... )
>>> print(df)
   year       drug_name  avg_number_of_ingredients
0  2019     NEXIUM I.V.                          8
1  2016         ZOLADEX                         10
2  2017        PRILOSEC                         59
3  2017  BYDUREON BCise                         24
4  2019        Lynparza                         28
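
Now I build df_grouped, which still contains the drug names. The dictionary passed to agg maps each column to its own aggregation: the names are joined into one string per year, and the ingredient counts are averaged: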

>>> df_grouped = df.groupby('year', as_index=False).agg({'drug_name': ', '.join, 'avg_number_of_ingredients': 'mean'})
>>> print(df_grouped)
   year                 drug_name  avg_number_of_ingredients
0  2016                   ZOLADEX                       10.0
1  2017  PRILOSEC, BYDUREON BCise                       41.5
2  2019     NEXIUM I.V., Lynparza                       18.0
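
The same result can also be written with pandas named aggregation, which lets you choose the output column names. This is only a sketch of an alternative spelling, not something the answer above uses; the output names drug_names and mean_ingredients are made up for illustration.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    {
        "year": [2019, 2016, 2017, 2017, 2019],
        "drug_name": ["NEXIUM I.V.", "ZOLADEX", "PRILOSEC", "BYDUREON BCise", "Lynparza"],
        "avg_number_of_ingredients": [8, 10, 59, 24, 28],
    }
)

# Named aggregation: each keyword is an output column (names are hypothetical),
# each value is a (source column, aggregation function) pair.
df_named = df.groupby("year", as_index=False).agg(
    drug_names=("drug_name", ", ".join),
    mean_ingredients=("avg_number_of_ingredients", "mean"),
)
print(df_named)
```

Every column you want to keep needs its own aggregation rule; a column with no rule does not appear in the result.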
df.groupby('year', as_index=False).agg({'drug_name': ', '.join, 'avg_number_of_ingredients': 'mean'})
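
If "keep all columns" instead means keeping every original row and simply attaching the per-year mean, groupby(...).transform may fit better. This is a sketch under that assumption, which the question does not state; the column name year_mean is hypothetical.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame(
    {
        "year": [2019, 2016, 2017, 2017, 2019],
        "drug_name": ["NEXIUM I.V.", "ZOLADEX", "PRILOSEC", "BYDUREON BCise", "Lynparza"],
        "avg_number_of_ingredients": [8, 10, 59, 24, 28],
    }
)

# transform returns one value per original row, aligned with df's index,
# so no rows or columns are lost. "year_mean" is a made-up column name.
df_with_mean = df.copy()
df_with_mean["year_mean"] = (
    df.groupby("year")["avg_number_of_ingredients"].transform("mean")
)
print(df_with_mean)
```

Here each drug stays on its own row, and rows that share a year share the same year_mean value.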