Python: get the sorted index order for multiple columns
I have something like the following MultiIndex pandas Series, where the values are indexed by team, year and gender:
>>> import pandas as pd
>>> import numpy as np
>>> multi_index=pd.MultiIndex.from_product([['Team A','Team B', 'Team C', 'Team D'],[2015,2016],['Male','Female']], names = ['Team','Year','Gender'])
>>> np.random.seed(0)
>>> df=pd.Series(index=multi_index, data=np.random.randint(1, 10, 16))
>>> df
Team    Year  Gender
Team A  2015  Male      6
              Female    1
        2016  Male      4
              Female    4
Team B  2015  Male      8
              Female    4
        2016  Male      6
              Female    3
Team C  2015  Male      5
              Female    8
        2016  Male      7
              Female    9
Team D  2015  Male      9
              Female    2
        2016  Male      7
              Female    8
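My goal is to get a DataFrame giving the order of the teams (sorted by value, lowest first) for each of the 4 year/gender combinations (2015 Male, 2016 Male, 2015 Female and 2016 Female).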
My approach is to first unstack the Series so that it is indexed by team:
>>> unstacked_df = df.unstack(['Year','Gender'])
>>> print(unstacked_df)
Year   2015        2016
Gender Male Female Male Female
Team
Team A    6      1    4      4
Team B    8      4    6      3
Team C    5      8    7      9
Team D    9      2    7      8
Then I build a DataFrame from the resulting index order by looping over the 4 columns and sorting by each one in turn:
>>> team_orders = np.array([unstacked_df.sort_values(x).index.tolist() for x in unstacked_df.columns]).T
>>> result = pd.DataFrame(team_orders, columns=unstacked_df.columns)
>>> print(result)
Year      2015            2016
Gender    Male  Female    Male  Female
0       Team C  Team A  Team A  Team B
1       Team A  Team D  Team B  Team A
2       Team B  Team B  Team C  Team D
3       Team D  Team C  Team D  Team C
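For reference, a Python 3 version of the loop approach above as a self-contained sketch. Two assumptions that are not in the original: the Series values are typed in from the printed output rather than regenerated with `np.random`, and `kind="stable"` is passed to `sort_values` so that tied teams (Team C and Team D both have 7 in 2016 / Male) keep their original row order.

```python
import numpy as np
import pandas as pd

# Rebuild the question's Series. The values are typed in from the printed
# output rather than regenerated with np.random.
multi_index = pd.MultiIndex.from_product(
    [["Team A", "Team B", "Team C", "Team D"], [2015, 2016], ["Male", "Female"]],
    names=["Team", "Year", "Gender"],
)
df = pd.Series(index=multi_index, data=[6, 1, 4, 4, 8, 4, 6, 3, 5, 8, 7, 9, 9, 2, 7, 8])

# One row per team, one column per (Year, Gender) pair.
unstacked_df = df.unstack(["Year", "Gender"])

# Sort the whole frame by each column in turn and keep only the team order.
# kind="stable" (an assumption of this sketch) keeps tied teams, here Team C
# and Team D in 2016 / Male, in their original row order.
orders = [
    unstacked_df.sort_values(col, kind="stable").index.to_numpy()
    for col in unstacked_df.columns
]

# Each team order becomes one column of the result.
result = pd.DataFrame(np.column_stack(orders), columns=unstacked_df.columns)
print(result)
```

The result is selected by column label in the checks rather than by position, since only the labels matter here.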
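Is there a simpler/better way that I'm missing?

Starting from your unstacked version, you can use .argsort() together with .apply() to sort each column, then use the result as a lookup into the index: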
df.unstack([1,2]).apply(lambda x: x.index[x.argsort()]).reset_index(drop=True)
Year      2015            2016
Gender    Male  Female    Male  Female
0       Team C  Team A  Team A  Team B
1       Team A  Team D  Team B  Team A
2       Team B  Team B  Team C  Team D
3       Team D  Team C  Team D  Team C
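A self-contained Python 3 sketch of the `.argsort()`/`.apply()` one-liner, again with the values typed in from the printed Series. Two small deviations, both assumptions of this sketch rather than part of the answer: `kind="stable"` makes the order of ties explicit, and `.to_numpy()` hands plain integer positions to the index lookup. Note that `unstack([1, 2])` is simply the positional spelling of `unstack(["Year", "Gender"])`.

```python
import pandas as pd

# Same data as in the question; values typed in from the printed Series.
multi_index = pd.MultiIndex.from_product(
    [["Team A", "Team B", "Team C", "Team D"], [2015, 2016], ["Male", "Female"]],
    names=["Team", "Year", "Gender"],
)
df = pd.Series(index=multi_index, data=[6, 1, 4, 4, 8, 4, 6, 3, 5, 8, 7, 9, 9, 2, 7, 8])

# unstack([1, 2]) moves index levels 1 and 2 (Year, Gender) into the columns.
unstacked = df.unstack([1, 2])

# For each column x: x.argsort() gives the row positions that would sort it,
# and x.index[...] swaps those positions for the team labels.
# reset_index(drop=True) replaces the leftover Team labels with 0..3.
result = unstacked.apply(
    lambda x: x.index[x.argsort(kind="stable").to_numpy()]
).reset_index(drop=True)
print(result)
```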
Edit: here is a bit more on why this works. Using just .argsort(), you get:
print(df.unstack([1,2]).apply(lambda x: x.argsort()))
Year   2015        2016
Gender Male Female Male Female
Team
Team A    2      0    0      1
Team B    0      3    1      0
Team C    1      1    2      3
Team D    3      2    3      2
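Each column holds row positions, not ranks: the first entry is the position of the smallest value, the second is the position of the next smallest, and so on. The Team labels on the rows are simply carried over from the original index and mean nothing here. The lookup part then essentially does the following for each column, here for 2015 / Male: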
df.unstack([1,2]).index[[2,0,1,3]]
Index([u'Team C', u'Team A', u'Team B', u'Team D'], dtype='object', name=u'Team')
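Both steps, the column-wise argsort and the label lookup, can also be done for all four columns at once with plain NumPy. This is a swapped-in alternative to `.apply()`, not the answer's code: `np.argsort(..., axis=0)` sorts each column of the value matrix independently, and fancy-indexing the array of team labels with the resulting position matrix performs the lookup for every column in one go. The data is again typed in from the question's output, and `kind="stable"` is an assumption of this sketch.

```python
import numpy as np
import pandas as pd

# Same data as in the question, values typed in from the printed Series.
multi_index = pd.MultiIndex.from_product(
    [["Team A", "Team B", "Team C", "Team D"], [2015, 2016], ["Male", "Female"]],
    names=["Team", "Year", "Gender"],
)
df = pd.Series(index=multi_index, data=[6, 1, 4, 4, 8, 4, 6, 3, 5, 8, 7, 9, 9, 2, 7, 8])
unstacked = df.unstack(["Year", "Gender"])

# Step 1 for every column at once: order[i, j] is the row position of the
# i-th smallest value in column j (the numbers in the argsort table).
order = np.argsort(unstacked.to_numpy(), axis=0, kind="stable")

# Step 2 for every column at once: indexing the 1-D array of team labels
# with the 2-D position matrix replaces each position by its label.
team_names = unstacked.index.to_numpy()[order]

# The new frame gets a fresh default index, so no reset_index() is needed.
result = pd.DataFrame(team_names, columns=unstacked.columns)
print(result)
```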
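In the .apply() version, .reset_index(drop=True) then gets rid of the now-meaningless index labels.

Follow-up comment: Very nice. I understand how argsort gives the indices that sort each column, but I'm not clear on how x.index[x.argsort()] gives the correctly sorted team index.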
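To spell out the step the comment asks about, here is a walk through a single column (2015 / Male) in isolation. The column is typed in by hand from the unstacked table, so nothing depends on the random seed; the variable names are illustrative only.

```python
import pandas as pd

# The 2015 / Male column of the unstacked frame, typed in from the table.
col = pd.Series(
    [6, 8, 5, 9],
    index=pd.Index(["Team A", "Team B", "Team C", "Team D"], name="Team"),
)

# Step 1: argsort returns row positions, smallest value first:
# 5 sits at position 2 (Team C), 6 at 0 (Team A), 8 at 1 (Team B), 9 at 3 (Team D).
positions = col.argsort().to_numpy()
print(positions)  # [2 0 1 3]

# Step 2: indexing the Index with those positions puts the labels in that order.
ordered_teams = col.index[positions]
print(list(ordered_teams))  # ['Team C', 'Team A', 'Team B', 'Team D']

# The same two steps in plain Python: sort the positions by value,
# then replace each position by the label stored there.
values = col.tolist()
labels = col.index.tolist()
plain_positions = sorted(range(len(values)), key=lambda i: values[i])
plain_teams = [labels[i] for i in plain_positions]
print(plain_teams)  # ['Team C', 'Team A', 'Team B', 'Team D']
```

For a column without ties, `x.index[x.argsort()]` is the same team order as `x.sort_values().index`; the argsort form just makes the position-to-label lookup explicit.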