Python: how to group and sort rows based on column values passed to a function
Tags: python, pandas, dataframe, pandas-groupby
I have a data frame as shown below, and I need to write a function that gives the result described here.
Input parameters:
Country, e.g. "INDIA"
Age, e.g. "Student"
My input data frame looks like this:
Card Name Country Age Code Amount
0 AAA INDIA Young House 100
1 AAA Australia Old Hardware 200
2 AAA INDIA Student House 300
3 AAA US Young Hardware 600
4 AAA INDIA Student Electricity 200
5 BBB Australia Young Electricity 100
6 BBB INDIA Student Electricity 200
7 BBB Australia Young House 450
8 BBB INDIA Old House 150
9 CCC Australia Old Hardware 200
10 CCC Australia Young House 350
11 CCC INDIA Old Electricity 400
12 CCC US Young House 200
The expected output is:
Code Total Amount Frequency Average
0 Electricity 400 2 200
1 House 300 1 300
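That is, the top 10 codes for the given country (= INDIA) and age (= Student), ranked by the sum of Amount (in this example only 2 codes are available). In addition, the result should have a new column "Frequency" that counts the number of records in each group, and an "Average" column equal to total amount / frequency.
I tried: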
df.groupby(['Country','Age','Code']).agg({'Amount': sum})['Amount'].groupby(level=0, group_keys=False).nlargest(10)
which produces:
Country Age Code
Australia Young House 800
Old Hardware 400
Young Electricity 100
INDIA Old Electricity 400
Student Electricity 400
House 300
Old House 150
Young House 100
US Young Hardware 600
House 200
Name: Amount, dtype: int64
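Unfortunately, this is not the expected output: it is not restricted to the requested country and age, and it has no Frequency or Average columns.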
>>> df
Card Name Country Age Code Amount
0 AAA INDIA Young House 100
1 AAA Australia Old Hardware 200
2 AAA INDIA Student House 300
3 AAA US Young Hardware 600
4 AAA INDIA Student Electricity 200
5 BBB Australia Young Electricity 100
6 BBB INDIA Student Electricity 200
7 BBB Australia Young House 450
8 BBB INDIA Old House 150
9 CCC Australia Old Hardware 200
10 CCC Australia Young House 350
11 CCC INDIA Old Electricity 400
12 CCC US Young House 200
You can filter the data frame first:
>>> country = 'INDIA'
>>> age = 'Student'
>>> tmp = df[df.Country.eq(country) & df.Age.eq(age)].loc[:, ['Code', 'Amount']]
>>> tmp
Code Amount
2 House 300
4 Electricity 200
6 Electricity 200
... and then group:
>>> result = tmp.groupby('Code')['Amount'].agg([['Total Amount', 'sum'], ['Frequency', 'size'], ['Average', 'mean']]).reset_index()
>>> result
Code Total Amount Frequency Average
0 Electricity 400 2 200
1 House 300 1 300
If I understood the top-10 criterion for the total amount correctly, you can then call:
result.nlargest(10, 'Total Amount')
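Since the question asks for a function, the filter, group, and top-N steps above could be wrapped up roughly as follows. This is only a sketch: the name `top_codes` and its parameter `n` are made up for illustration, the sample frame is retyped from the question, and the keyword-style `agg` call assumes pandas 0.25 or newer.

```python
import pandas as pd

# Sample data retyped from the question.
rows = [
    ("AAA", "INDIA", "Young", "House", 100),
    ("AAA", "Australia", "Old", "Hardware", 200),
    ("AAA", "INDIA", "Student", "House", 300),
    ("AAA", "US", "Young", "Hardware", 600),
    ("AAA", "INDIA", "Student", "Electricity", 200),
    ("BBB", "Australia", "Young", "Electricity", 100),
    ("BBB", "INDIA", "Student", "Electricity", 200),
    ("BBB", "Australia", "Young", "House", 450),
    ("BBB", "INDIA", "Old", "House", 150),
    ("CCC", "Australia", "Old", "Hardware", 200),
    ("CCC", "Australia", "Young", "House", 350),
    ("CCC", "INDIA", "Old", "Electricity", 400),
    ("CCC", "US", "Young", "House", 200),
]
df = pd.DataFrame(rows, columns=["Card Name", "Country", "Age", "Code", "Amount"])


def top_codes(df, country, age, n=10):
    """Hypothetical helper: top-n codes by total Amount for one country and age."""
    # Keep only the rows for the requested country and age group.
    subset = df[df["Country"].eq(country) & df["Age"].eq(age)]
    # One output row per code; the dict is unpacked because the column
    # names contain spaces (named aggregation, pandas >= 0.25).
    result = (
        subset.groupby("Code")["Amount"]
        .agg(**{"Total Amount": "sum", "Frequency": "size", "Average": "mean"})
        .reset_index()
    )
    # Rank by total amount and keep at most n rows.
    return result.nlargest(n, "Total Amount").reset_index(drop=True)


result = top_codes(df, "INDIA", "Student")
print(result)
```

Filtering before grouping keeps the aggregation small and avoids a multi-level index; `n` defaults to 10 to match the "top 10" in the question.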
Can you share some of what you have already tried?
@timegeb, thanks again for walking me through the process. I have got it working now.