Python: How to group and sort a column based on column values passed to a function (Python, Pandas, Dataframe, Pandas Groupby) - Fatal编程技术网


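I have a data frame as shown below, and I need to write a function that produces the result described here.

Input parameters:

  • Country, e.g. "INDIA"
  • Age, e.g. "Student"

My input data frame looks like this: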

       Card Name    Country      Age         Code  Amount
    0        AAA      INDIA    Young        House     100
    1        AAA  Australia      Old     Hardware     200
    2        AAA      INDIA  Student        House     300
    3        AAA         US    Young     Hardware     600
    4        AAA      INDIA  Student  Electricity     200
    5        BBB  Australia    Young  Electricity     100
    6        BBB      INDIA  Student  Electricity     200
    7        BBB  Australia    Young        House     450
    8        BBB      INDIA      Old        House     150
    9        CCC  Australia      Old     Hardware     200
    10       CCC  Australia    Young        House     350
    11       CCC      INDIA      Old  Electricity     400
    12       CCC         US    Young        House     200
    
The expected output would be:

              Code  Total Amount  Frequency  Average
    0  Electricity           400          2      200
    1        House           300          1      300
    
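That is, the top 10 codes for the given country (= INDIA) and age (= Student), ranked by the sum of Amount (in this example only 2 codes are available). In addition, the result should contain a new column "Frequency" that counts the number of records in each group, and an "Average" column equal to Total Amount / Frequency.

I tried: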

    df.groupby(['Country','Age','Code']).agg({'Amount': sum})['Amount'].groupby(level=0, group_keys=False).nlargest(10)
    
which produces:

    Country    Age      Code       
    Australia  Young    House          800
               Old      Hardware       400
               Young    Electricity    100
    INDIA      Old      Electricity    400
               Student  Electricity    400
                        House          300
               Old      House          150
               Young    House          100
    US         Young    Hardware       600
                        House          200
    Name: Amount, dtype: int64
    
Unfortunately, this differs from the expected output.

    >>> df                                                                                                                 
       Card Name    Country      Age         Code  Amount
    0        AAA      INDIA    Young        House     100
    1        AAA  Australia      Old     Hardware     200
    2        AAA      INDIA  Student        House     300
    3        AAA         US    Young     Hardware     600
    4        AAA      INDIA  Student  Electricity     200
    5        BBB  Australia    Young  Electricity     100
    6        BBB      INDIA  Student  Electricity     200
    7        BBB  Australia    Young        House     450
    8        BBB      INDIA      Old        House     150
    9        CCC  Australia      Old     Hardware     200
    10       CCC  Australia    Young        House     350
    11       CCC      INDIA      Old  Electricity     400
    12       CCC         US    Young        House     200
    
You can filter the data frame first:

    >>> country = 'INDIA'                                                                                                  
    >>> age = 'Student'                                                                                                    
    >>> tmp = df[df.Country.eq(country) & df.Age.eq(age)].loc[:, ['Code', 'Amount']]                                       
    >>> tmp                                                                                                                
              Code  Amount
    2        House     300
    4  Electricity     200
    6  Electricity     200
    
... and then group:

    >>> result = tmp.groupby('Code')['Amount'].agg([['Total Amount', 'sum'], ['Frequency', 'size'], ['Average', 'mean']]).reset_index() 
    >>> result                             
              Code  Total Amount  Frequency  Average
    0  Electricity           400          2      200
    1        House           300          1      300
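
As a hedged alternative to passing name/function pairs, the same three columns can be requested with pandas named aggregation (available since pandas 0.25). The sketch below rebuilds only the three filtered rows so that it runs on its own. The dict is unpacked with `**` because "Total Amount" contains a space and cannot be written as a bare keyword.

```python
import pandas as pd

# The three rows left after filtering for Country == 'INDIA' and Age == 'Student'.
tmp = pd.DataFrame({
    "Code": ["House", "Electricity", "Electricity"],
    "Amount": [300, 200, 200],
})

# Named aggregation: each key is an output column, each value the function to apply.
result = (
    tmp.groupby("Code")["Amount"]
    .agg(**{"Total Amount": "sum", "Frequency": "size", "Average": "mean"})
    .reset_index()
)

# 'Average' is the group mean, i.e. Total Amount / Frequency.
print(result)
```

Depending on the pandas version, the "Average" column may be displayed as a float rather than an integer.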
    
If I understand the filtering criterion on the total amount correctly, you can then call:

    result.nlargest(10, 'Total Amount')
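
Since the question asks for a function, the filter, group and `nlargest` steps can be wrapped together. This is a minimal sketch, not a definitive implementation: the name `top_codes` and the parameter `n` are hypothetical, and the data frame is rebuilt from the table in the question so the example is self-contained.

```python
import pandas as pd

def top_codes(df, country, age, n=10):
    """Hypothetical helper: top-n codes by total Amount for one country and age."""
    # Step 1: keep only the rows for the requested country and age group.
    mask = df["Country"].eq(country) & df["Age"].eq(age)
    # Step 2: aggregate Amount per code into total, record count and mean.
    summary = (
        df.loc[mask]
        .groupby("Code")["Amount"]
        .agg([("Total Amount", "sum"), ("Frequency", "size"), ("Average", "mean")])
        .reset_index()
    )
    # Step 3: keep the n codes with the largest totals, largest first.
    return summary.nlargest(n, "Total Amount").reset_index(drop=True)

# Data copied from the question.
df = pd.DataFrame({
    "Card Name": ["AAA"] * 5 + ["BBB"] * 4 + ["CCC"] * 4,
    "Country": ["INDIA", "Australia", "INDIA", "US", "INDIA", "Australia", "INDIA",
                "Australia", "INDIA", "Australia", "Australia", "INDIA", "US"],
    "Age": ["Young", "Old", "Student", "Young", "Student", "Young", "Student",
            "Young", "Old", "Old", "Young", "Old", "Young"],
    "Code": ["House", "Hardware", "House", "Hardware", "Electricity", "Electricity",
             "Electricity", "House", "House", "Hardware", "House", "Electricity", "House"],
    "Amount": [100, 200, 300, 600, 200, 100, 200, 450, 150, 200, 350, 400, 200],
})

print(top_codes(df, "INDIA", "Student"))
```

Passing a smaller `n`, for example `top_codes(df, "INDIA", "Student", n=1)`, would return only the code with the highest total.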
    

Could you share some of what you have already tried?

@timegeb, thanks again for walking me through the process. I have it working now.