Python: sort a DataFrame by group while keeping a desired order
I have a DataFrame like the one below:
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West','West','West','West','East','East','East','South','South','South','North','North','North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5]
})
+------+--------+-------+
| Junk | Region | Sales |
+------+--------+-------+
| a | West | 1 |
| a | West | 3 |
| a | West | 4 |
| a | West | 2 |
| a | East | 4 |
| a | East | 2 |
| b | East | 5 |
| b | South | 7 |
| b | South | 9 |
| c | South | 7 |
| c | North | 5 |
| c | North | 9 |
| c | North | 5 |
+------+--------+-------+
I am trying to do two things. First, I sorted by Region and Sales:
df.sort_values(by = ['Region','Sales'])
+------+--------+-------+
| Junk | Region | Sales |
+------+--------+-------+
| a | East | 2 |
| a | East | 4 |
| b | East | 5 |
| c | North | 5 |
| c | North | 5 |
| c | North | 9 |
| b | South | 7 |
| c | South | 7 |
| b | South | 9 |
| a | West | 1 |
| a | West | 2 |
| a | West | 3 |
| a | West | 4 |
+------+--------+-------+
But I want to keep the order of the Region column: West should come first, then East, then South, then North.
Desired output:
+--------+----------+---------+
| Junk | Region | Sales |
+--------+----------+---------+
| a | West | 1 |
| a | West | 2 |
| a | West | 3 |
| a | West | 4 |
| a | East | 2 |
| a | East | 4 |
| b | East | 5 |
| b | South | 7 |
| c | South | 7 |
| b | South | 9 |
| c | North | 5 |
| c | North | 5 |
| c | North | 9 |
+--------+----------+---------+
Second, I want to sort only the rows with Region=East and Region=North; the remaining regions should stay as they are:
+--------+----------+---------+
| Junk | Region | Sales |
+--------+----------+---------+
| a | West | 1 |
| a | West | 3 |
| a | West | 4 |
| a | West | 2 |
| a | East | 2 |
| a | East | 4 |
| b | East | 5 |
| b | South | 7 |
| b | South | 9 |
| c | South | 7 |
| c | North | 5 |
| c | North | 5 |
| c | North | 9 |
+--------+----------+---------+
Map West, East, South and North to 0, 1, 2, 3:
>>> my_order = ['West','East','South','North']
>>> order = {key: i for i, key in enumerate(my_order)}
>>> order
{'West': 0, 'East': 1, 'South': 2, 'North': 3}
Then use the mapping as the sort key:
>>> df.iloc[df['Region'].map(order).sort_values().index]
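The line above orders rows by Region only. As a minimal sketch, assuming pandas 1.1 or later (where `sort_values` accepts a `key` callable), the same mapping can be applied to the Region column while Sales is sorted as-is. The name `result` is illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East',
               'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

my_order = ['West', 'East', 'South', 'North']
order = {key: i for i, key in enumerate(my_order)}

# The key callable is applied to each column listed in `by` separately:
# Region is replaced by its rank in my_order, Sales is left unchanged.
result = df.sort_values(
    by=['Region', 'Sales'],
    key=lambda col: col.map(order) if col.name == 'Region' else col,
)
print(result)
```

This avoids both a helper column and a change of dtype on Region.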
First convert the column to an ordered categorical, then sort:
order = ['West', 'East', 'South', 'North']
df['Region'] = pd.CategoricalIndex(df['Region'], ordered=True, categories=order)
df = df.sort_values(by = ['Region','Sales'])
print (df)
Junk Region Sales
0 a West 1
3 a West 2
1 a West 3
2 a West 4
5 a East 2
4 a East 4
6 b East 5
7 b South 7
9 c South 7
8 b South 9
10 c North 5
12 c North 5
11 c North 9
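If the desired order is simply the order in which regions first appear in the data (an assumption that holds for this example but may not hold in general), the category list does not have to be typed by hand. A hedged sketch, with `region_type` and `result` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East',
               'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

# unique() keeps first-appearance order: West, East, South, North.
order = df['Region'].unique().tolist()

# An ordered categorical dtype makes sort_values follow `order`
# instead of alphabetical order.
region_type = pd.CategoricalDtype(categories=order, ordered=True)
result = (
    df.astype({'Region': region_type})
      .sort_values(['Region', 'Sales'])
      .reset_index(drop=True)
)
print(result)
```

Using `astype` on a copy leaves the original `df['Region']` column as plain strings.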
Solution with map by dictionary: create a helper column, sort by it, then remove the helper column:
order = {'West':1, 'East':2, 'South':3, 'North':4}
df = df.assign(tmp=df['Region'].map(order)).sort_values(by=['tmp','Sales']).drop('tmp', axis=1)
print(df)
Junk Region Sales
6 a West 1
0 a West 2
7 a West 3
8 a West 4
2 a East 2
1 a East 4
3 b East 5
4 b South 7
9 c South 7
5 b South 9
10 c North 5
12 c North 5
11 c North 9
For the second case, sort only the filtered rows; to prevent index alignment on assignment, assign a NumPy array:
order = ['West', 'East', 'South', 'North']
df['Region'] = pd.CategoricalIndex(df['Region'], ordered=True, categories=order)
mask = df['Region'].isin(['North', 'East'])
df[mask] = df[mask].sort_values(['Region','Sales']).values
print (df)
Junk Region Sales
0 a West 1
1 a West 3
2 a West 4
3 a West 2
4 a East 2
5 a East 4
6 b East 5
7 b South 7
8 b South 9
9 c South 7
10 c North 5
11 c North 5
12 c North 9
Alternative with map:
order = {'East':1, 'North':2}
df = df.assign(tmp=df['Region'].map(order))
mask = df['Region'].isin(['North', 'East'])
df[mask] = df[mask].sort_values(['tmp','Sales']).values
df = df.drop('tmp', axis=1)
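Assigning `.values` back into the frame passes the data through a mixed-type array. A sketch of a variant that instead reorders row positions, so every column keeps its dtype, is shown below. The names `rank`, `_rank`, `_pos`, `new_order` and `result` are illustrative and not from the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East',
               'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

sort_regions = ['East', 'North']
rank = {region: i for i, region in enumerate(sort_regions)}

# Boolean mask of the rows that should be sorted, plus their positions.
mask = df['Region'].isin(sort_regions).to_numpy()
positions = np.arange(len(df))

# Sort only the selected rows, remembering where each one came from.
subset = df.loc[mask].assign(
    _rank=df.loc[mask, 'Region'].map(rank),
    _pos=positions[mask],
)
sorted_pos = subset.sort_values(['_rank', 'Sales'])['_pos'].to_numpy()

# Slots of unselected rows keep their own position; selected slots
# receive the sorted positions in order.
new_order = positions.copy()
new_order[mask] = sorted_pos
result = df.iloc[new_order].reset_index(drop=True)
print(result)
```

Because only positions are shuffled, West and South rows stay exactly where they were.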
You can use groupby and take advantage of its sort parameter, then use apply and sort_values with a condition:
sort_regions = ['North', 'East']
df.groupby('Region', sort=False).apply(
    lambda x: x.sort_values('Sales')
    if x['Region'].iloc[0] in sort_regions
    else x
).reset_index(drop=True)
Output:
Junk Region Sales
0 a West 1
1 a West 3
2 a West 4
3 a West 2
4 a East 2
5 a East 4
6 b East 5
7 b South 7
8 b South 9
9 c South 7
10 c North 5
11 c North 5
12 c North 9
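Recent pandas releases have changed how `groupby(...).apply` treats the grouping column, so the lambda's access to `x['Region']` may warn or fail depending on the installed version. A minimal sketch of the same idea that avoids `apply` by iterating over the groups and concatenating them (`pieces` and `result` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    "Junk": list("aaaaaabbbcccc"),
    "Region": ['West', 'West', 'West', 'West', 'East', 'East', 'East',
               'South', 'South', 'South', 'North', 'North', 'North'],
    "Sales": [1, 3, 4, 2, 4, 2, 5, 7, 9, 7, 5, 9, 5],
})

sort_regions = ['North', 'East']

# sort=False yields groups in order of first appearance, so the
# West, East, South, North sequence of the input is preserved.
pieces = [
    group.sort_values('Sales') if name in sort_regions else group
    for name, group in df.groupby('Region', sort=False)
]
result = pd.concat(pieces, ignore_index=True)
print(result)
```

Each group keeps all of its columns, so the Region column is still present in the result.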
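So what is the final desired output?
df.sort_values(by=['Region','Sales']).loc[['West','East','South','North'],:] will give you the order you want. For your second point, I think you can do df[(df.Region=='East')|(df.Region=='North')]...sort_values(by=['Region','Sales']). But if you want to keep the whole DataFrame, split it and then append the two DataFrames. Let me know if it works for you, and I will post the complete thing as an answer.
@PuneetSinha It gives me the error "none of [['West', 'East', 'South', 'North']] are in the [index]".
Try this: df.set_index(["Region"])
@Rookie_123 I get a DataFrame that is the same as my input DataFrame.
@Rookie_123 my_order = ['West','East','South','North']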