Python: sort a DataFrame by the first digit of a column


python pandas numpy


I have a DataFrame like this:

        Produtos    Estoque total     Valor Total de estoque
0            70        10000                  7180
1            70      2800000               2011550
2            70       125000                 89800
3            71       540000                530980
4            71        89000                 79280
5            84       205000                572770
...         ...           ...                    ...
14988   1003254        46000               1329400
14989   1003273     30570000               5502600
14990   1003274     62000000               3720000
14991   1003275    200000000               3840000
14992   1003276       710000               2108700
14993   1003279      6750000                715330

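I am trying to sort by the Produtos column, comparing the first digit first (if the first digits are equal, then the second digit, and so on).

I found that I should use the following command: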

line.sort(key=lambda line: int(line.split()[0]))

But I am struggling to use it correctly.

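First take the first digit of each value, then get the positions that sort those digits with argsort, and finally reorder the rows by position: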
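A minimal self-contained sketch of these three steps. The small frame below is hypothetical and only reuses a few Produtos values from the question; the names first_digit, order and sorted_df are illustrative. The kind="mergesort" argument is an addition, assumed here so that rows with the same first digit keep their original relative order.

```python
import pandas as pd

# Hypothetical sample reusing a few values from the question
df = pd.DataFrame({
    "Produtos": [70, 71, 84, 1003254, 1003273],
    "Estoque total": [10000, 540000, 205000, 46000, 30570000],
})

# Step 1: first character of each Produtos value, as strings
first_digit = df["Produtos"].astype(str).str[0]

# Step 2: positions that sort those characters
# (mergesort is stable, so ties keep their original order)
order = first_digit.argsort(kind="mergesort").to_numpy()

# Step 3: reorder the rows by position
sorted_df = df.iloc[order]
print(sorted_df)
```

Note that this compares only the first digit, so 70 and 7100 are treated as equal and stay in their original order.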

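Edit: you can also sort by the first digit and then by length with this trick: build a helper object, sort it, and use its index to select the DataFrame rows.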
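The helper-object trick might look like the sketch below. The data is hypothetical, and the column names first_digit and n_digits are made up for illustration. Unlike a helper built from a bare NumPy array, this helper keeps the index of df, so the rows are selected by label with loc rather than by position.

```python
import pandas as pd

# Hypothetical values with different numbers of digits
df = pd.DataFrame({"Produtos": [70, 710, 7100, 84, 10032, 1003, 100, 10]})

s = df["Produtos"].astype(str)

# Helper frame: one column for the first digit, one for the number of digits
helper = pd.DataFrame({
    "first_digit": s.str[0].astype(int),
    "n_digits": s.str.len(),
})

# Sort the helper by first digit, then by length, and keep the resulting index
order = helper.sort_values(["first_digit", "n_digits"]).index

# The helper shares the index of df, so select by label
sorted_df = df.loc[order]
print(sorted_df)
```

Within each first digit, shorter numbers come before longer ones; values with the same first digit and the same length are not ordered any further.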

As usual, the answer from @jezrael is detailed and excellent. But I found another option that I think is worth sharing:

For pandas >= 1.1, we can use the key argument of sort_values:
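A runnable sketch of the key approach, assuming pandas 1.1 or later. The frame is a hypothetical sample with column names taken from the question, and kind="mergesort" is an added assumption to keep ties in their original order.

```python
import pandas as pd

# Hypothetical sample reusing a few values from the question
df = pd.DataFrame({
    "Produtos": [70, 71, 84, 1003254, 1003273],
    "Estoque total": [10000, 540000, 205000, 46000, 30570000],
})

# key receives the whole column as a Series and must return a Series
# of the same length; the rows are ordered by the returned values
sorted_df = df.sort_values(
    by="Produtos",
    key=lambda col: col.astype(str).str[0],
    kind="mergesort",  # stable: rows with the same first digit keep their order
)
print(sorted_df)
```

The key function is applied to the column as a whole, not element by element, which is why vectorized accessors such as .str can be used inside it.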

(Sorry, I could not get the columns to align with pd.read_clipboard().)

Result

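line.sort(key=lambda line: int(line.split()[0])) looks more like a plain Python approach than a pandas one. How are you using it?

I don't understand how this works. If you take the [0] index and sort by it, isn't it just a coincidence that Produtos happens to already be in order? What decides that one value should come before another?

@roganjosh - It sorts only by the first digit, not by length. So it needs a second sorting level, doesn't it?

Presumably by the first digit first, and then by absolute value? I am not sure what exactly it should look like :)

@jezrael If you remove .str[0], it should do that by sorting the strings directly, character by character. Unlike df.sort_values(by='Produtos'), it does not take the length of the string into account.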
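To illustrate the last comment, here is a hedged sketch that drops .str[0] and sorts on the full string, so ties on the first digit are broken by the second digit, and so on. It assumes pandas 1.1 or later for key; the data is hypothetical.

```python
import pandas as pd

# Hypothetical values with different numbers of digits
df = pd.DataFrame({"Produtos": [70, 710, 7100, 84, 10032, 1003, 100, 10, 71]})

# Plain numeric sort: shorter numbers always come first
numeric = df.sort_values(by="Produtos")

# String sort: values are compared character by character,
# so "1003" comes before "70" because "1" < "7"
lexicographic = df.sort_values(by="Produtos", key=lambda col: col.astype(str))

print(numeric["Produtos"].tolist())
print(lexicographic["Produtos"].tolist())
```

In the string ordering a value that is a prefix of another comes first, which is why 71 sorts before 710 and 7100.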


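# Take the first character of each Produtos value, get the positions that
# sort those characters, and reorder the rows by position
df = df.iloc[df['Produtos'].astype(str).str[0].argsort()]
print(df)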
       Produtos  Estoque total  Valor Total de estoque
14988   1003254          46000                 1329400
14989   1003273       30570000                 5502600
14990   1003274       62000000                 3720000
14991   1003275      200000000                 3840000
14992   1003276         710000                 2108700
14993   1003279        6750000                  715330
0            70          10000                    7180
1            70        2800000                 2011550
2            70         125000                   89800
3            71         540000                  530980
4            71          89000                   79280
5            84         205000                  572770

# Sample data in which the Produtos values have different lengths
print(df)
          Produtos  Estoque total  Valor Total de estoque
0               70          10000                  7180.0
1               70        2800000               2011550.0
2               71         125000                 89800.0
3              710         540000                530980.0
4             7100          89000                 79280.0
5               84         205000                572770.0
14988  10032546000        1329400                     NaN
14989        10032       30570000               5502600.0
14990         1003       62000000               3720000.0
14991          100      200000000               3840000.0
14992           10         710000               2108700.0
14993      1003279        6750000                715330.0

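import numpy as np
import pandas as pd

# Helper frame: column 0 holds the first digit, column 1 the number of digits
s = df['Produtos'].astype(str)
i = pd.DataFrame(np.c_[s.str[0].astype(int), s.str.len()]).sort_values([0,1]).index
print(i)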
Int64Index([10, 9, 8, 7, 11, 6, 0, 1, 2, 3, 4, 5], dtype='int64')
df = df.iloc[i]
print (df)
          Produtos  Estoque total  Valor Total de estoque
14992           10         710000               2108700.0
14991          100      200000000               3840000.0
14990         1003       62000000               3720000.0
14989        10032       30570000               5502600.0
14993      1003279        6750000                715330.0
14988  10032546000        1329400                     NaN
0               70          10000                  7180.0
1               70        2800000               2011550.0
2               71         125000                 89800.0
3              710         540000                530980.0
4             7100          89000                 79280.0
5               84         205000                572770.0

# Sort by the first character of Produtos (the key argument needs pandas >= 1.1)
df.sort_values(by='Produtos', key=lambda x: x.astype(str).str[0])