Python: sorting a DataFrame index that contains strings and numbers


The index of my DataFrame df looks like this:

Com_Lag_01
Com_Lag_02
Com_Lag_03
Com_Lag_04
Com_Lag_05
Com_Lag_06
Com_Lag_07
Com_Lag_08
Com_Lag_09
Com_Lag_10
Com_Lag_101
Com_Lag_102
Com_Lag_103
...
Com_Lag_11
Com_Lag_111
Com_Lag_112
Com_Lag_113
Com_Lag_114
...
Com_Lag_12
Com_Lag_120
...
Com_Lag_13
Com_Lag_14
Com_Lag_15

I want to sort this index so that the numbers run from Com_Lag_1 to Com_Lag_120. If I use df.sort_index(), I get the same order as above. Any suggestions on how to sort this index correctly?
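The order above is what plain string comparison produces: the labels are compared character by character, so "Com_Lag_101" comes before "Com_Lag_11". A minimal sketch of that behavior in pure Python (the label list is made up for illustration):

```python
# Hypothetical labels, chosen only to show string ordering
labels = ["Com_Lag_2", "Com_Lag_10", "Com_Lag_101", "Com_Lag_11"]

# Strings are compared character by character, not by numeric value
lexicographic = sorted(labels)
print(lexicographic)
# ['Com_Lag_10', 'Com_Lag_101', 'Com_Lag_11', 'Com_Lag_2']
```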

You can try something like this, which sorts on a numeric version of the index:

import pandas as pd

# Create an example DataFrame
df = pd.DataFrame(
    {'Year': [1991, 2004, 2001, 2009, 1997],
     'Age': [27, 25, 22, 34, 31]},
    index=['Com_Lag_1', 'Com_Lag_12', 'Com_Lag_3', 'Com_Lag_24', 'Com_Lag_5'])

# Add a column holding the numeric part of each index label
df['indexNumber'] = [int(i.split('_')[-1]) for i in df.index]
# Sort the rows by that number (df.sort() no longer exists in current pandas; use sort_values)
df.sort_values('indexNumber', ascending=True, inplace=True)
# Drop the helper column
df.drop('indexNumber', axis=1, inplace=True)

Edit 2017, V1:

To avoid a SettingWithCopyWarning, create the column like this:

df = df.assign(indexNumber=[int(i.split('_')[-1]) for i in df.index])

Edit 2017, V2, for pandas version 0.21.0:

import pandas as pd
print(pd.__version__)
# Create a DataFrame example
df = pd.DataFrame(\
    {'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
    'Age': [27 ,25 ,22 ,34 ,31],\
    },\
    index = ['Com_Lag_1' ,'Com_Lag_12' ,'Com_Lag_3' ,'Com_Lag_24' ,'Com_Lag_5'])

# reindex returns a new DataFrame, so assign the result
df = df.reindex(index=df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values().index)

A solution without a new column: reindex by the index of the sorted Series:

a = df.index.to_series().str.rsplit('_').str[-1].astype(int).sort_values()
print (a)
Com_Lag_1      1
Com_Lag_3      3
Com_Lag_5      5
Com_Lag_12    12
Com_Lag_24    24
dtype: int32

df = df.reindex(index=a.index)
print (df)
            Age  Year
Com_Lag_1    27  1991
Com_Lag_3    22  2001
Com_Lag_5    31  1997
Com_Lag_12   25  2004
Com_Lag_24   34  2009

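However, if the index contains duplicate values, reindex does not work with duplicate labels, so add a new column instead: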

df = pd.DataFrame(\
    {'Year': [1991 ,2004 ,2001 ,2009 ,1997],\
    'Age': [27 ,25 ,22 ,34 ,31],\
    },\
    index = ['Com_Lag_1' ,'Com_Lag_12' ,'Com_Lag_3' ,'Com_Lag_24' ,'Com_Lag_12'])

print (df)
            Age  Year
Com_Lag_1    27  1991
Com_Lag_12   25  2004
Com_Lag_3    22  2001
Com_Lag_24   34  2009
Com_Lag_12   31  1997

df['indexNumber'] = df.index.str.rsplit('_').str[-1].astype(int)
df = df.sort_values(['indexNumber']).drop('indexNumber', axis=1)
print (df)
            Age  Year
Com_Lag_1    27  1991
Com_Lag_3    22  2001
Com_Lag_12   25  2004
Com_Lag_12   31  1997
Com_Lag_24   34  2009

Another solution is:

    df.sort_index(key=lambda x: (x.to_series().str[8:].astype(int)), inplace=True)

The 8 is the position where the numeric part starts (the prefix Com_Lag_ is 8 characters long).
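If the prefix length is not fixed, the hard-coded 8 can be avoided by splitting on the last underscore inside the key function. This is only a sketch, assuming pandas 1.1 or later (where sort_index accepts key) and that every label ends in an underscore followed by an integer. The helper name numeric_suffix is made up for illustration; the sample frame is the one used in the answers above.

```python
import pandas as pd

df = pd.DataFrame(
    {"Year": [1991, 2004, 2001, 2009, 1997], "Age": [27, 25, 22, 34, 31]},
    index=["Com_Lag_1", "Com_Lag_12", "Com_Lag_3", "Com_Lag_24", "Com_Lag_5"],
)

def numeric_suffix(index):
    # Hypothetical helper: split each label once from the right and
    # convert the part after the last "_" to an integer
    return index.str.rsplit("_", n=1).str[-1].astype(int)

# sort_index passes the Index to the key function and sorts by its result
sorted_df = df.sort_index(key=numeric_suffix)
print(sorted_df.index.tolist())
# ['Com_Lag_1', 'Com_Lag_3', 'Com_Lag_5', 'Com_Lag_12', 'Com_Lag_24']
```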

You need to find the last "_" by searching from the right, then convert what follows it to an integer and sort by that number.
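The steps just described can be sketched as follows. This is one possible reading, not necessarily what the commenter had in mind: str.rfind locates the last "_", and a stable numpy.argsort gives the row order, which also works when the index has duplicate labels. The helper name suffix_number is made up for illustration, and every label is assumed to end in an underscore followed by an integer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"Year": [1991, 2004, 2001, 2009, 1997], "Age": [27, 25, 22, 34, 31]},
    index=["Com_Lag_1", "Com_Lag_12", "Com_Lag_3", "Com_Lag_24", "Com_Lag_12"],
)

def suffix_number(label):
    # Hypothetical helper: rfind returns the position of the last "_";
    # everything after it is taken to be the number
    return int(label[label.rfind("_") + 1:])

keys = np.array([suffix_number(label) for label in df.index])
# A stable sort keeps rows with equal numbers in their original order
order = np.argsort(keys, kind="stable")
sorted_df = df.iloc[order]
print(sorted_df.index.tolist())
# ['Com_Lag_1', 'Com_Lag_3', 'Com_Lag_12', 'Com_Lag_12', 'Com_Lag_24']
```

Selecting rows by position with iloc avoids both the temporary column and the reindex call, which is why duplicate labels are not a problem here.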