Python: How to use groupby to find the highest value in a DataFrame
I have the following dataset. I want to find out which quarter generated the highest number of installs for each app used in the study.
       Installs          CR    Month  Year             Category
0         10000    Everyone  January  2018       ART_AND_DESIGN
1        500000    Everyone  January  2018       ART_AND_DESIGN
2       5000000    Everyone   August  2018       ART_AND_DESIGN
3      50000000        Teen     June  2018       ART_AND_DESIGN
4        100000    Everyone     June  2018       ART_AND_DESIGN
...         ...         ...      ...   ...                  ...
10836      5000    Everyone     July  2017               FAMILY
10837       100    Everyone     July  2018               FAMILY
10838      1000    Everyone  January  2017              MEDICAL
10839      1000  Mature 17+  January  2015  BOOKS_AND_REFERENCE
10840  10000000    Everyone     July  2018            LIFESTYLE
If you need the maximum Installs value for each quarter and Category, use:
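import pandas as pd

# Build a quarterly Period from the Month name and Year columns.
q = (pd.to_datetime(df['Month'] + df['Year'].astype(str), format='%B%Y')
       .dt.to_period('Q').rename('Quarter'))
# Use a new name so the original df stays available for the second approach.
df_max = df.groupby([q, 'Category'])['Installs'].max().reset_index()
print(df_max)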
   Quarter             Category  Installs
0   2015Q1  BOOKS_AND_REFERENCE      1000
1   2017Q1              MEDICAL      1000
2   2017Q3               FAMILY      5000
3   2018Q1       ART_AND_DESIGN    500000
4   2018Q2       ART_AND_DESIGN  50000000
5   2018Q3       ART_AND_DESIGN   5000000
6   2018Q3               FAMILY       100
7   2018Q3            LIFESTYLE  10000000
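The quarter key used above comes from joining the month name and the year into one string, parsing it as a date, and converting to a quarterly period. A minimal sketch of just that step, assuming an English locale for the `%B` month names; the three-row `sample` frame is hypothetical and only mirrors the Month and Year columns of the question:

```python
import pandas as pd

# Hypothetical frame that mirrors only the Month/Year columns of the question.
sample = pd.DataFrame({"Month": ["January", "June", "August"],
                       "Year": [2018, 2018, 2018]})

# Join into strings such as "January2018", then parse:
# %B is the full month name, %Y the four-digit year.
dates = pd.to_datetime(sample["Month"] + sample["Year"].astype(str), format="%B%Y")

# Map each timestamp to its calendar quarter and name the resulting Series.
quarters = dates.dt.to_period("Q").rename("Quarter")
print(quarters.astype(str).tolist())  # ['2018Q1', '2018Q2', '2018Q3']
```

Because the Series is named `Quarter`, passing it to `groupby` alongside `'Category'` yields a `Quarter` column after `reset_index()`.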
Or, if you need to aggregate Installs by quarter and Category and then get the quarter with the most installs for each Category, use:
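# Build a quarterly Period from the Month name and Year columns.
q = (pd.to_datetime(df['Month'] + df['Year'].astype(str), format='%B%Y')
       .dt.to_period('Q').rename('Quarter'))
# Total installs for each (quarter, category) pair.
df1 = df.groupby([q, 'Category'])['Installs'].sum().reset_index()
print(df1)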
   Quarter             Category  Installs
0   2015Q1  BOOKS_AND_REFERENCE      1000
1   2017Q1              MEDICAL      1000
2   2017Q3               FAMILY      5000
3   2018Q1       ART_AND_DESIGN    510000
4   2018Q2       ART_AND_DESIGN  50100000
5   2018Q3       ART_AND_DESIGN   5000000
6   2018Q3               FAMILY       100
7   2018Q3            LIFESTYLE  10000000
# For each Category, keep the row whose quarterly total is largest.
df2 = df1.loc[df1.groupby('Category')['Installs'].idxmax()]
print(df2)
   Quarter             Category  Installs
4   2018Q2       ART_AND_DESIGN  50100000
0   2015Q1  BOOKS_AND_REFERENCE      1000
2   2017Q3               FAMILY      5000
7   2018Q3            LIFESTYLE  10000000
1   2017Q1              MEDICAL      1000
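For anyone who wants to run the second approach end to end, here is a self-contained sketch. It rebuilds only the ten rows visible in the question (the elided rows are not reproduced), and the helper name `best_quarter_per_category` is made up for illustration. Note that `idxmax` returns the first match, so if two quarters tie for a category only the earlier one is kept.

```python
import pandas as pd

def best_quarter_per_category(df):
    """For each Category, return the quarter with the highest total Installs."""
    quarter = (pd.to_datetime(df["Month"] + df["Year"].astype(str), format="%B%Y")
                 .dt.to_period("Q").rename("Quarter"))
    # Total installs for each (quarter, category) pair.
    totals = df.groupby([quarter, "Category"])["Installs"].sum().reset_index()
    # Row label of the largest total within each category (first one on ties).
    best = totals.loc[totals.groupby("Category")["Installs"].idxmax()]
    return best.reset_index(drop=True)

# Only the ten rows shown in the question; the full dataset has more.
df = pd.DataFrame({
    "Installs": [10000, 500000, 5000000, 50000000, 100000,
                 5000, 100, 1000, 1000, 10000000],
    "Month": ["January", "January", "August", "June", "June",
              "July", "July", "January", "January", "July"],
    "Year": [2018, 2018, 2018, 2018, 2018, 2017, 2018, 2017, 2015, 2018],
    "Category": ["ART_AND_DESIGN"] * 5
                + ["FAMILY", "FAMILY", "MEDICAL", "BOOKS_AND_REFERENCE", "LIFESTYLE"],
})

result = best_quarter_per_category(df)
print(result)
```

On these ten rows the result matches the `df2` output above, apart from the index being reset.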
Could you add the expected output for the sample data?