Python: sort a DataFrame by group while keeping a desired group order

I have a DataFrame like the one below:

import pandas as pd

df = pd.DataFrame({
    "Junk":list("aaaaaabbbcccc"),
    "Region":['West','West','West','West','East','East','East','South','South','South','North','North','North'],
    "Sales":[1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5]
})

+------+--------+-------+
| Junk | Region | Sales |
+------+--------+-------+
| a    | West   |     1 |
| a    | West   |     3 |
| a    | West   |     4 |
| a    | West   |     2 |
| a    | East   |     4 |
| a    | East   |     2 |
| b    | East   |     5 |
| b    | South  |     7 |
| b    | South  |     9 |
| c    | South  |     7 |
| c    | North  |     5 |
| c    | North  |     9 |
| c    | North  |     5 |
+------+--------+-------+
I am trying to do two things:

  • Sort the DataFrame by Sales within each region
  • I can achieve this with the code below:

    df.sort_values(by = ['Region','Sales'])
    
    
    +------+--------+-------+
    | Junk | Region | Sales |
    +------+--------+-------+
    | a    | East   |     2 |
    | a    | East   |     4 |
    | b    | East   |     5 |
    | c    | North  |     5 |
    | c    | North  |     5 |
    | c    | North  |     9 |
    | b    | South  |     7 |
    | c    | South  |     7 |
    | b    | South  |     9 |
    | a    | West   |     1 |
    | a    | West   |     2 |
    | a    | West   |     3 |
    | a    | West   |     4 |
    +------+--------+-------+
    
    But I want to keep the original order of the Region column: West should come first, then East, then South, then North.

    Expected output:

    +--------+----------+---------+
    |  Junk  |  Region  |  Sales  |
    +--------+----------+---------+
    |  a     | West     |       1 |
    |  a     | West     |       2 |
    |  a     | West     |       3 |
    |  a     | West     |       4 |
    |  a     | East     |       2 |
    |  a     | East     |       4 |
    |  b     | East     |       5 |
    |  b     | South    |       7 |
    |  c     | South    |       7 |
    |  b     | South    |       9 |
    |  c     | North    |       5 |
    |  c     | North    |       5 |
    |  c     | North    |       9 |
    +--------+----------+---------+
    
  • Sort only the rows where Region=East or Region=North; the other regions should stay as they are
  • Expected output:

    +--------+----------+---------+
    |  Junk  |  Region  |  Sales  |
    +--------+----------+---------+
    |  a     | West     |       1 |
    |  a     | West     |       3 |
    |  a     | West     |       4 |
    |  a     | West     |       2 |
    |  a     | East     |       2 |
    |  a     | East     |       4 |
    |  b     | East     |       5 |
    |  b     | South    |       7 |
    |  b     | South    |       9 |
    |  c     | South    |       7 |
    |  c     | North    |       5 |
    |  c     | North    |       5 |
    |  c     | North    |       9 |
    +--------+----------+---------+
    

    Map West, East, South, North to 0, 1, 2, 3:

    >>> my_order = ['West','East','South','North']
    >>> order = {key: i for i, key in enumerate(my_order)}
    >>> order
    {'West': 0, 'East': 1, 'South': 2, 'North': 3}
    
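    and use that mapping as the sort key. Note that this orders the rows by Region only (Sales within a region are not sorted), and passing index labels to iloc is only correct with the default RangeIndex: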

    >>> df.iloc[df['Region'].map(order).sort_values().index]
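To sort by Sales inside each region as well, the same mapping can be passed through the `key` argument of `sort_values`. This is a minimal sketch, assuming pandas 1.1 or later (where `key` was introduced); the helper name `region_rank` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ["West"] * 4 + ["East"] * 3 + ["South"] * 3 + ["North"] * 3,
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

my_order = ["West", "East", "South", "North"]
order = {key: i for i, key in enumerate(my_order)}


def region_rank(col):
    # sort_values calls `key` once per sort column;
    # only Region is replaced by its rank, Sales is passed through unchanged.
    return col.map(order) if col.name == "Region" else col


by_region_then_sales = df.sort_values(by=["Region", "Sales"], key=region_rank)
print(by_region_then_sales)
```

Because the key only affects the comparison, the Region column itself keeps its original string values.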
    
    First convert the column to an ordered categorical, then sort:

    order = ['West', 'East', 'South', 'North']
    df['Region'] = pd.CategoricalIndex(df['Region'], ordered=True, categories=order)
    
    df = df.sort_values(by = ['Region','Sales'])
    print (df)
       Junk Region  Sales
    0     a   West      1
    3     a   West      2
    1     a   West      3
    2     a   West      4
    5     a   East      2
    4     a   East      4
    6     b   East      5
    7     b  South      7
    9     c  South      7
    8     b  South      9
    10    c  North      5
    12    c  North      5
    11    c  North      9
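If the desired order is simply the order in which the regions first appear, the category list does not have to be typed by hand. The sketch below assumes that is the intent; `appearance_order` and `sorted_df` are illustrative names, and `pd.Categorical` is used here in place of `pd.CategoricalIndex`.

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ["West"] * 4 + ["East"] * 3 + ["South"] * 3 + ["North"] * 3,
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

# Series.unique() returns the distinct values in order of first appearance.
appearance_order = list(df["Region"].unique())

# assign() returns a new frame, so the original df keeps its string column.
sorted_df = df.assign(
    Region=pd.Categorical(df["Region"], categories=appearance_order, ordered=True)
).sort_values(by=["Region", "Sales"])
print(sorted_df)
```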
    
    Solution using map with a dictionary: create a helper column, sort by it, then remove the helper column:

    order = {'West':1, 'East':2, 'South':3, 'North':4}
    
    # drop() needs axis passed by keyword; the positional form was removed in pandas 2.0
    df = df.assign(tmp=df['Region'].map(order)).sort_values(by = ['tmp','Sales']).drop('tmp', axis=1)
    print (df)
       Junk Region  Sales
    0     a   West      1
    3     a   West      2
    1     a   West      3
    2     a   West      4
    5     a   East      2
    4     a   East      4
    6     b   East      5
    7     b  South      7
    9     c  South      7
    8     b  South      9
    10    c  North      5
    12    c  North      5
    11    c  North      9
    
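    For the second case, starting again from the original unsorted df, sort only the filtered rows. To prevent pandas from aligning the sorted rows back to their original index labels on assignment, assign a NumPy array: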

    order = ['West', 'East', 'South', 'North']
    df['Region'] = pd.CategoricalIndex(df['Region'], ordered=True, categories=order)
    
    mask = df['Region'].isin(['North', 'East'])
    df[mask] = df[mask].sort_values(['Region','Sales']).values
    print (df)
       Junk Region  Sales
    0     a   West      1
    1     a   West      3
    2     a   West      4
    3     a   West      2
    4     a   East      2
    5     a   East      4
    6     b   East      5
    7     b  South      7
    8     b  South      9
    9     c  South      7
    10    c  North      5
    11    c  North      5
    12    c  North      9
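The role of `.values` is easier to see on a small Series. The sketch below uses made-up data (`sales`, `mask`) and `.loc` assignment: when the right-hand side is a Series, pandas matches values by index label, so the sorted values go straight back to the rows they came from; a plain NumPy array has no index, so the values are written in order.

```python
import pandas as pd

sales = pd.Series([4, 2, 5, 7], index=[4, 5, 6, 7])
mask = pd.Series([True, True, True, False], index=sales.index)

aligned = sales.copy()
# Right-hand side is a Series: values are aligned on index labels,
# so each value lands on its original row and nothing moves.
aligned.loc[mask] = sales[mask].sort_values()

positional = sales.copy()
# Right-hand side is a NumPy array: values are written positionally,
# so the selected rows now hold the sorted values.
positional.loc[mask] = sales[mask].sort_values().to_numpy()

print(aligned.tolist())
print(positional.tolist())
```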
    
    Alternative using map:

    order = {'East':1, 'North':2}
    df = df.assign(tmp=df['Region'].map(order))
    
    mask = df['Region'].isin(['North', 'East'])
    df[mask] = df[mask].sort_values(['tmp','Sales']).values
    df = df.drop('tmp', axis=1)
    

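    You can use groupby and take advantage of its sort parameter (sort=False keeps the groups in order of appearance). Then use apply with sort_values, applied conditionally: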

    sort_regions = ['North', 'East']
    df.groupby('Region', sort=False).apply(
        lambda x: x.sort_values('Sales')
        if x['Region'].iloc[0] in sort_regions
        else x
    ).reset_index(drop=True)
    
    Output:

       Junk Region  Sales
    0     a   West      1
    1     a   West      3
    2     a   West      4
    3     a   West      2
    4     a   East      2
    5     a   East      4
    6     b   East      5
    7     b  South      7
    8     b  South      9
    9     c  South      7
    10    c  North      5
    11    c  North      5
    12    c  North      9
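The same result can also be reached without `groupby.apply`, using three helper sort keys. This is only a sketch of an alternative, not the method from the answer above; the column names `_rank`, `_key` and `_pos` are invented for the example. Regions are ranked by first appearance, rows in the listed regions are keyed by Sales, and all other rows get a constant key so that the position key keeps them in their original order.

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ["West"] * 4 + ["East"] * 3 + ["South"] * 3 + ["North"] * 3,
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

sort_regions = ["North", "East"]

# factorize() numbers the regions in order of first appearance: West=0, East=1, ...
region_rank = pd.factorize(df["Region"])[0]

# Sales is the key only for the listed regions; other rows share a constant key.
in_scope = df["Region"].isin(sort_regions)
sales_key = df["Sales"].where(in_scope, 0)

result = (
    df.assign(_rank=region_rank, _key=sales_key, _pos=range(len(df)))
      # _pos breaks ties, so rows with equal keys keep their original order.
      .sort_values(["_rank", "_key", "_pos"])
      .drop(columns=["_rank", "_key", "_pos"])
      .reset_index(drop=True)
)
print(result)
```

Unlike the mask-based assignment, this version also gathers the rows of each region together if they were not contiguous to begin with.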
    

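    So what is the final required output?
    df.sort_values(by=['Region','Sales']).loc[['West','East','South','North'],:] will give you the order you want. For your second point, I think you can do df[(df.Region=='East')|(df.Region=='North')]…sort_values(by=['Region','Sales']), but if you want to keep the whole DataFrame, split it apart and then append the two DataFrames. Please let me know if it works for you, and then I will post the complete solution as an answer.
    @PuneetSinha It gives me a "None of [['West', 'East', 'South', 'North']] are in the [index]" error.
    Try this: df.set_index(["Region"])
    @Rookie_123 I get the same DataFrame as my input DataFrame.
    @Rookie_123 my_order = ['West', 'East', 'South', 'North']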
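Putting the suggestions from these comments together: `.loc` selects by index label, so Region has to become the index before a list of region names can be used to reorder the rows. A minimal sketch of that idea, assuming the default column layout is not important (after `reset_index()` Region becomes the first column):

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ["West"] * 4 + ["East"] * 3 + ["South"] * 3 + ["North"] * 3,
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

my_order = ["West", "East", "South", "North"]

result = (
    df.sort_values(by=["Region", "Sales"])  # alphabetical regions, Sales ascending
      .set_index("Region")                  # make Region the row labels
      .loc[my_order]                        # pull the label groups in the desired order
      .reset_index()                        # turn Region back into a column
)
print(result)
```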