Python: sorting by group aggregate

Tags: python, sorting, numpy, pandas

I have seen a related question, but the expected result there differs slightly from mine.

Imagine a DataFrame grouped as follows:

df.groupby(['product_name', 'usage_type']).total_cost.sum()

product_name   usage_type
Lorem          A               30.694665
               B                0.000634
               C                1.659360
               D                0.000031
               E             3339.140042
               F                0.074340
Ipsum          G                9.627360
               A               19.053377
               D               14.492155
Dolor          B                9.698245
               H             6993.792163
               C            31947.955679
               D             2150.400001
               E               26.337789
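Name: total_cost, dtype: float64

The output I want has the same structure, but with two properties:

  • product_name ordered by the sum of its costs
  • usage_type ordered lexicographically (alternative: by cost, descending)
  • This way the most expensive products appear first, while the breakdown is still preserved


    If it makes things much simpler, I can drop the secondary sort by usage_type.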

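    Starting from the grouped DataFrame:

    import pandas as pd
    # Read the whitespace-separated file 'data' and index it by the two grouping columns
    df2 = pd.read_table('data', sep=r'\s+').set_index(['product_name', 'usage_type'])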
    #                                   val
    # product_name usage_type              
    # Lorem        A              30.694665
    #              B               0.000634
    #              C               1.659360
    #              D               0.000031
    #              E            3339.140042
    #              F               0.074340
    # Ipsum        G               9.627360
    #              A              19.053377
    #              D              14.492155
    # Dolor        B               9.698245
    #              H            6993.792163
    #              C           31947.955679
    #              D            2150.400001
    #              E              26.337789
    
    You can store the sort keys in new columns:

    df2['key1'] = df2.groupby(level='product_name')['val'].transform('sum')
    df2['key2'] = df2.index.get_level_values('usage_type')
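
The reason `transform('sum')` is used for `key1`, rather than a plain `sum()`, is that it returns one value per original row instead of one per group, so the result can be assigned straight back as a column. A minimal sketch of the difference, using a small made-up Series (the names `s`, `per_group`, `per_row` and the values are hypothetical, not from the question):

```python
import pandas as pd

# Hypothetical toy data: two products, three rows in total.
s = pd.Series(
    [1.0, 2.0, 10.0],
    index=pd.MultiIndex.from_tuples(
        [("x", "A"), ("x", "B"), ("y", "A")],
        names=["product_name", "usage_type"],
    ),
    name="val",
)

# Plain aggregation: one row per product (index is product_name only).
per_group = s.groupby(level="product_name").sum()

# transform: the group total repeated on every row, same index as `s`.
per_row = s.groupby(level="product_name").transform("sum")

print(per_group)
print(per_row)
```

Because `per_row` shares the index of `s`, it lines up row by row with the original data, which is what makes it usable as a sort key.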
    
    Then sort by these key columns:

    # >>> df2.sort_values(['key1', 'key2'], ascending=[False, True])
    #                                   val          key1 key2
    # product_name usage_type                                 
    # Dolor        B               9.698245  41128.183877    B
    #              C           31947.955679  41128.183877    C
    #              D            2150.400001  41128.183877    D
    #              E              26.337789  41128.183877    E
    #              H            6993.792163  41128.183877    H
    # Lorem        A              30.694665   3371.569072    A
    #              B               0.000634   3371.569072    B
    #              C               1.659360   3371.569072    C
    #              D               0.000031   3371.569072    D
    #              E            3339.140042   3371.569072    E
    #              F               0.074340   3371.569072    F
    # Ipsum        A              19.053377     43.172892    A
    #              D              14.492155     43.172892    D
    #              G               9.627360     43.172892    G
    
    result = df2.sort_values(['key1', 'key2'], ascending=[False, True])['val']
    print(result)
    
    which yields

    product_name  usage_type
    Dolor         B                 9.698245
                  C             31947.955679
                  D              2150.400001
                  E                26.337789
                  H              6993.792163
    Lorem         A                30.694665
                  B                 0.000634
                  C                 1.659360
                  D                 0.000031
                  E              3339.140042
                  F                 0.074340
    Ipsum         A                19.053377
                  D                14.492155
                  G                 9.627360
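
For the alternative mentioned in the question (usage types ordered by cost, descending, within each product), the same idea should work with `val` itself as the secondary key, so no `key2` column is needed. A sketch under that assumption, with the question's numbers typed in by hand instead of read from the `data` file:

```python
import pandas as pd

# Values copied from the question; the inline list stands in for the 'data' file.
rows = [
    ("Lorem", "A", 30.694665), ("Lorem", "B", 0.000634), ("Lorem", "C", 1.659360),
    ("Lorem", "D", 0.000031), ("Lorem", "E", 3339.140042), ("Lorem", "F", 0.074340),
    ("Ipsum", "G", 9.627360), ("Ipsum", "A", 19.053377), ("Ipsum", "D", 14.492155),
    ("Dolor", "B", 9.698245), ("Dolor", "H", 6993.792163), ("Dolor", "C", 31947.955679),
    ("Dolor", "D", 2150.400001), ("Dolor", "E", 26.337789),
]
df2 = pd.DataFrame(rows, columns=["product_name", "usage_type", "val"])
df2 = df2.set_index(["product_name", "usage_type"])

# Per-product total, repeated on every row of that product.
df2["key1"] = df2.groupby(level="product_name")["val"].transform("sum")

# Products by total cost (descending), then rows within a product by cost (descending).
result = df2.sort_values(["key1", "val"], ascending=[False, False])["val"]
print(result)
```

The products come out in the same order as above (Dolor, Lorem, Ipsum); only the order of the rows inside each product changes.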