Python: sort a DataFrame by group while keeping a desired order


python pandas sorting dataframe


I have a DataFrame like the one below:

import pandas as pd

df = pd.DataFrame({
    "Junk":list("aaaaaabbbcccc"),
    "Region":['West','West','West','West','East','East','East','South','South','South','North','North','North'],
    "Sales":[1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5]
})

+------+--------+-------+
| Junk | Region | Sales |
+------+--------+-------+
| a    | West   |     1 |
| a    | West   |     3 |
| a    | West   |     4 |
| a    | West   |     2 |
| a    | East   |     4 |
| a    | East   |     2 |
| b    | East   |     5 |
| b    | South  |     7 |
| b    | South  |     9 |
| c    | South  |     7 |
| c    | North  |     5 |
| c    | North  |     9 |
| c    | North  |     5 |
+------+--------+-------+

I am trying to do two things.

First, sort the DataFrame by Sales within each region.

I can achieve that with the code below:

df.sort_values(by = ['Region','Sales'])


+------+--------+-------+
| Junk | Region | Sales |
+------+--------+-------+
| a    | East   |     2 |
| a    | East   |     4 |
| b    | East   |     5 |
| c    | North  |     5 |
| c    | North  |     5 |
| c    | North  |     9 |
| b    | South  |     7 |
| c    | South  |     7 |
| b    | South  |     9 |
| a    | West   |     1 |
| a    | West   |     2 |
| a    | West   |     3 |
| a    | West   |     4 |
+------+--------+-------+

But I want to keep the order of the Region column: West should come first, then East, then South, then North.

Desired output:

+--------+----------+---------+
|  Junk  |  Region  |  Sales  |
+--------+----------+---------+
|  a     | West     |       1 |
|  a     | West     |       2 |
|  a     | West     |       3 |
|  a     | West     |       4 |
|  a     | East     |       2 |
|  a     | East     |       4 |
|  b     | East     |       5 |
|  b     | South    |       7 |
|  c     | South    |       7 |
|  b     | South    |       9 |
|  c     | North    |       5 |
|  c     | North    |       5 |
|  c     | North    |       9 |
+--------+----------+---------+

Second, I want to sort only Region=East and Region=North; the remaining regions should stay as they are.

Desired output:

+--------+----------+---------+
|  Junk  |  Region  |  Sales  |
+--------+----------+---------+
|  a     | West     |       1 |
|  a     | West     |       3 |
|  a     | West     |       4 |
|  a     | West     |       2 |
|  a     | East     |       2 |
|  a     | East     |       4 |
|  b     | East     |       5 |
|  b     | South    |       7 |
|  b     | South    |       9 |
|  c     | South    |       7 |
|  c     | North    |       5 |
|  c     | North    |       5 |
|  c     | North    |       9 |
+--------+----------+---------+

Map West, East, South and North to 0, 1, 2, 3:

>>> my_order = ['West','East','South','North']
>>> order = {key: i for i, key in enumerate(my_order)}
>>> order
{'West': 0, 'East': 1, 'South': 2, 'North': 3}

and use the mapping as the sort key:

>>> df.iloc[df['Region'].map(order).sort_values().index]
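The same mapping can also drive a two-column sort in a single call. This is a sketch rather than part of the original answer: it assumes pandas 1.1 or later, where `sort_values` accepts a `key` callable that is applied to each `by` column separately. The name `result` is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East',
               'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

my_order = ['West', 'East', 'South', 'North']
order = {key: i for i, key in enumerate(my_order)}

# Replace Region by its rank in my_order for comparison purposes only;
# Sales is compared as-is. The stored values are not modified.
result = df.sort_values(
    by=["Region", "Sales"],
    key=lambda col: col.map(order) if col.name == "Region" else col,
)
print(result)
```

Unlike the helper-column approach, this leaves no temporary column to drop afterwards.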

First convert the column to an ordered categorical, then sort:

order = ['West', 'East', 'South', 'North']
df['Region'] = pd.CategoricalIndex(df['Region'], ordered=True, categories=order)

df = df.sort_values(by = ['Region','Sales'])
print (df)
   Junk Region  Sales
0     a   West      1
3     a   West      2
1     a   West      3
2     a   West      4
5     a   East      2
4     a   East      4
6     b   East      5
7     b  South      7
9     c  South      7
8     b  South      9
10    c  North      5
12    c  North      5
11    c  North      9
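If the desired order is simply the order in which the regions first appear in the data, the category list does not have to be typed by hand. A minimal sketch, assuming first-appearance order is what is wanted; `appearance_order` and `result` are illustrative names, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East',
               'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

# unique() returns the distinct values in order of first appearance.
appearance_order = df["Region"].unique()

# An ordered categorical sorts by category position, not alphabetically.
df["Region"] = pd.Categorical(df["Region"], categories=appearance_order, ordered=True)

result = df.sort_values(by=["Region", "Sales"])
print(result)
```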

A solution using map with a dictionary: create a helper column, sort by it, then remove the helper column:

order = {'West':1, 'East':2, 'South':3, 'North':4}

df = df.assign(tmp=df['Region'].map(order)).sort_values(by = ['tmp','Sales']).drop('tmp', axis=1)
print (df)
   Junk Region  Sales
0     a   West      1
3     a   West      2
1     a   West      3
2     a   West      4
5     a   East      2
4     a   East      4
6     b   East      5
7     b  South      7
9     c  South      7
8     b  South      9
10    c  North      5
12    c  North      5
11    c  North      9

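For the second case you need to sort only the filtered rows. To prevent pandas from aligning the sorted rows back on their index, assign a NumPy array: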

order = ['West', 'East', 'South', 'North']
df['Region'] = pd.CategoricalIndex(df['Region'], ordered=True, categories=order)

mask = df['Region'].isin(['North', 'East'])
df[mask] = df[mask].sort_values(['Region','Sales']).values
print (df)
   Junk Region  Sales
0     a   West      1
1     a   West      3
2     a   West      4
3     a   West      2
4     a   East      2
5     a   East      4
6     b   East      5
7     b  South      7
8     b  South      9
9     c  South      7
10    c  North      5
11    c  North      5
12    c  North      9
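Why the `.values` matters can be seen on a single column. The sketch below uses a small made-up Series (the names `sales`, `aligned` and `positional` are illustrative): assigning a sorted Series re-aligns it on the index labels, so nothing moves, while assigning the underlying array fills the masked positions in order.

```python
import pandas as pd

sales = pd.Series([4, 2, 5, 7, 9], name="Sales")
mask = pd.Series([True, True, True, False, False])

# Assigning a Series: pandas matches values by index label,
# so each value lands back on its original row.
aligned = sales.copy()
aligned.loc[mask] = sales[mask].sort_values()

# Assigning a NumPy array: no labels, so values are written
# into the masked rows in sorted order.
positional = sales.copy()
positional.loc[mask] = sales[mask].sort_values().to_numpy()

print(aligned.tolist())
print(positional.tolist())
```

`to_numpy()` is used here in place of `.values`; for this purpose the two behave the same.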

Alternative using map:

order = {'East':1, 'North':2}
df = df.assign(tmp=df['Region'].map(order))

mask = df['Region'].isin(['North', 'East'])
df[mask] = df[mask].sort_values(['tmp','Sales']).values
df = df.drop('tmp', axis=1)

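You can use groupby and take advantage of its sort parameter (sort=False keeps the groups in order of first appearance). Then use apply and sort_values with a condition: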

sort_regions = ['North', 'East']
df.groupby('Region', sort=False).apply(
    lambda x: x.sort_values('Sales')
    if x['Region'].iloc[0] in sort_regions
    else x
).reset_index(drop=True)

Output:

   Junk Region  Sales
0     a   West      1
1     a   West      3
2     a   West      4
3     a   West      2
4     a   East      2
5     a   East      4
6     b   East      5
7     b  South      7
8     b  South      9
9     c  South      7
10    c  North      5
11    c  North      5
12    c  North      9
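Recent pandas versions have changed how `groupby(...).apply` handles the grouping column, so the lambda above may warn or behave differently depending on the installed version. As a hedged alternative, not taken from the original answer, the same idea can be written by iterating over the groups and concatenating the pieces; `pieces` and `result` are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East',
               'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

sort_regions = {"North", "East"}

# sort=False yields the groups in order of first appearance
# (West, East, South, North); only the selected regions are sorted.
pieces = [
    group.sort_values("Sales") if region in sort_regions else group
    for region, group in df.groupby("Region", sort=False)
]

result = pd.concat(pieces, ignore_index=True)
print(result)
```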

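So, what is the final desired output?

df.sort_values(by=['Region','Sales']).loc[['West','East','South','North'],:]

will give you the order you want. For your second point, I think you can do

df[(df.Region=='East')|(df.Region=='North')].sort_values(by=['Region','Sales'])

But if you want to keep the rest of the DataFrame, split it up and then append the two DataFrames. Please let me know whether it works for you, and then I will post the complete solution as an answer.

@PuneetSinha It gives me the error "None of [['West', 'East', 'South', 'North']] are in the [index]".

Try this: df.set_index(["Region"])

@Rookie_123 I get a DataFrame identical to my input DataFrame.

@Rookie_123 my_order = ['West', 'East', 'South', 'North']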