Python: sorting grouped columns row by row


python sorting pandas numpy


I'm working on a machine learning task, and I want to change each row from "numbered objects" to "objects sorted by attribute".

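For example, I have 5 heroes on each of two teams, represented by their stats (dN_%stat% and rN_%stat%). I want to sort the heroes within each team by the stats numbered 3, 4, 0, 2, so that the first hero is the strongest, and so on.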

Here is my current code. It is very slow, so I'd like to use native pandas objects and operations instead:

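def sort_heroes(df):
    # `stats` is the list of stat names; its definition is not shown in the question
    for match_id in df.index:
        for team in ['r', 'd']:
            # collect the stat values of each of the team's 5 heroes
            heroes = []
            for n in range(1, 6):
                # .loc replaces the .ix indexer, which newer pandas no longer provides
                heroes.append(
                    [df.loc[match_id, '%s%s_%s' % (team, n, stat)]
                     for stat in stats])

            # order heroes by stats 3, 4, 0, 2 and write them back in that order
            heroes.sort(key=lambda x: (x[3], x[4], x[0], x[2]))
            for n in range(1, 6):
                for i, stat in enumerate(stats):
                    df.loc[match_id, '%s%s_%s' %
                           (team, n, stat)] = heroes[n - 1][i]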

A short example with an incomplete but useful representation of the data:

match_id  r1_xp  r1_gold  r2_xp  r2_gold  r3_xp  r3_gold  d1_xp  d1_gold  d2_xp  d2_gold
1         10     20       100    10       5000   300      0      0        15     5
2         1      1        1000   80       100    13       200    87       311    67
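
To experiment with the sample above, it can be loaded into a DataFrame. This is only a minimal sketch: the values are copied from the table, and the variable name `df` is an assumption chosen to match the code used elsewhere in this thread.

```python
import pandas as pd

# Sample data copied from the table above; one row per match.
df = pd.DataFrame({
    "match_id": [1, 2],
    "r1_xp": [10, 1],     "r1_gold": [20, 1],
    "r2_xp": [100, 1000], "r2_gold": [10, 80],
    "r3_xp": [5000, 100], "r3_gold": [300, 13],
    "d1_xp": [0, 200],    "d1_gold": [0, 87],
    "d2_xp": [15, 311],   "d2_gold": [5, 67],
})
print(df)
```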

What I want is to sort these columns in groups by prefix (rN_ and dN_), first by gold and then by xp:

match_id  r1_xp  r1_gold  r2_xp  r2_gold  r3_xp  r3_gold  d1_xp  d1_gold  d2_xp  d2_gold
1         5000   300      10     20       100    10       15     5        0      0
2         1000   80       100    13       1      1        200    87       311    67

You can use:

import pandas as pd

df.set_index('match_id', inplace=True)
# create a MultiIndex with 3 levels: team, hero number, stat
arr = df.columns.str.extract(r'([rd])(\d*)_(.*)', expand=True).T.values
df.columns = pd.MultiIndex.from_arrays(arr)
# reshape df to long form (one row per hero), then sort
df = df.stack([0,1]).reset_index().sort_values(['match_id','level_1','gold','xp'], 
                                                ascending=[True,False,False,False])
print (df)
   match_id level_1 level_2   gold      xp
4         1       r       3  300.0  5000.0
2         1       r       1   20.0    10.0
3         1       r       2   10.0   100.0
1         1       d       2    5.0    15.0
0         1       d       1    0.0     0.0
8         2       r       2   80.0  1000.0
9         2       r       3   13.0   100.0
7         2       r       1    1.0     1.0
5         2       d       1   87.0   200.0
6         2       d       2   67.0   311.0

# assign new hero numbers (level 2) following the sorted order
df.level_2 = df.groupby(['match_id','level_1']).cumcount().add(1).astype(str)
# restore the original wide shape
df = df.set_index(['match_id','level_1','level_2']).stack().unstack([1,2,3]).astype(int)
df = df.sort_index(level=[0,1,2], ascending=[False, True, False], axis=1)
# flatten the MultiIndex in columns back to plain column names
df.columns = ['{}{}_{}'.format(x[0], x[1], x[2]) for x in df.columns]
df.reset_index(inplace=True)
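
As an alternative to the stack/unstack reshaping, the same per-row reordering can be sketched with NumPy. This is not the answer's method: it swaps in repeated stable `argsort` passes over a (matches, heroes, stats) array. The function name `sort_team` and its parameters are hypothetical, and the sketch assumes every column is named `<team><n>_<stat>` and holds signed numeric values.

```python
import numpy as np
import pandas as pd

def sort_team(df, team, n_heroes, stats=("xp", "gold"), keys=("gold", "xp")):
    """Reorder one team's heroes within every row, largest key values first.

    Hypothetical helper: `keys` lists the sort columns from most to least
    significant; all of them are sorted in descending order.
    """
    cols = [f"{team}{n}_{s}" for n in range(1, n_heroes + 1) for s in stats]
    # View the team's columns as a 3-D block: (matches, heroes, stats).
    block = df[cols].to_numpy().reshape(len(df), n_heroes, len(stats))
    # Stable sorts applied from the least to the most significant key
    # give a lexicographic ordering of the heroes in each row.
    for key in reversed(keys):
        k = stats.index(key)
        idx = np.argsort(-block[:, :, k], axis=1, kind="stable")
        block = np.take_along_axis(block, idx[:, :, None], axis=1)
    out = df.copy()
    out[cols] = block.reshape(len(df), -1)
    return out

sample = pd.DataFrame({
    "match_id": [1, 2],
    "r1_xp": [10, 1],     "r1_gold": [20, 1],
    "r2_xp": [100, 1000], "r2_gold": [10, 80],
    "r3_xp": [5000, 100], "r3_gold": [300, 13],
    "d1_xp": [0, 200],    "d1_gold": [0, 87],
    "d2_xp": [15, 311],   "d2_gold": [5, 67],
})
# The sample has 3 heroes on team r and 2 on team d.
result = sort_team(sort_team(sample, "r", 3), "d", 2)
print(result)
```

Because each hero's stats move together as one slice of the block, the xp and gold values stay paired. Whether this is faster than the reshaping approach on real data would need to be measured.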

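Could you post a short example of your data frame's structure? What rows does it contain?

Each row holds information about the heroes %team%%NUM% that took part in a match.

First idea: split your data frame into two, one per team, then split each team's frame into two more, each containing only gold or only xp. Sort each of those frames row-wise with df.sort(axis=1), then rebuild the original data frame. Note, however, that (as in the example you posted) you will lose the information about which hero each value belongs to; I don't know whether that matters. There may be a more elegant solution than actually splitting the frames, but this should perform better because the loop no longer runs in Python. If you do iterate over a pandas data frame, always use one of the built-in iterators: iterrows, iteritems, or itertuples.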
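
The caveat about losing the hero association when each stat is sorted on its own can be illustrated with a small sketch. It uses `np.sort` on the underlying arrays instead of the `df.sort(axis=1)` call named in the comment, because `DataFrame.sort` does not exist in current pandas. The frame `team_r` and the other variable names are hypothetical; the values are team r of match 1 from the question's sample.

```python
import numpy as np
import pandas as pd

# Team r of match 1 from the question's sample data.
team_r = pd.DataFrame({
    "r1_xp": [10],   "r1_gold": [20],
    "r2_xp": [100],  "r2_gold": [10],
    "r3_xp": [5000], "r3_gold": [300],
})

# Split by stat, then sort each stat's values independently (descending).
gold = team_r.filter(like="_gold").to_numpy()
xp = team_r.filter(like="_xp").to_numpy()
gold_sorted = -np.sort(-gold, axis=1)
xp_sorted = -np.sort(-xp, axis=1)

# Pair the sorted values back up by position.
original = list(zip(xp[0].tolist(), gold[0].tolist()))
rebuilt = list(zip(xp_sorted[0].tolist(), gold_sorted[0].tolist()))
print(original)  # (xp, gold) pairs of the real heroes
print(rebuilt)   # pairs after independent sorting
```

The rebuilt pairs include a combination of xp and gold that no hero in the original row had, which is the loss of information the comment warns about.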

The answer's code produces the following result:

print (df)
   match_id  r1_xp  r1_gold  r2_xp  r2_gold  r3_xp  r3_gold  d1_xp  d1_gold  \
0         1   5000      300     10       20    100       10     15        5   
1         2   1000       80    100       13      1        1    200       87   

   d2_xp  d2_gold  
0      0        0  
1    311       67