Python Pandas: Is there a native way to sort rows by providing a list of index labels?

python pandas

Let's take this DataFrame as an example:

import pandas as pd
L0 = ['d','a','b','c','d','a','b','c','d','a','b','c']
L1 = ['z','z','z','z','x','x','x','x','y','y','y','y']
L2 = [1,6,3,8,7,6,7,6,3,5,6,5]
df = pd.DataFrame({"A":L0,"B":L1,"C":L2})
df = df.pivot(columns="A",index="B",values="C")

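After pivoting, both the columns and the rows are sorted alphabetically.

Reordering the columns is easy and can be done with a custom list of column labels: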

df = df[['d','a','b','c']]

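But there is no such direct functionality for reordering the rows. The most elegant way I could think of is to use the column-label functionality and transpose back and forth: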

df = df.T[['z','x','y']].T

This has no effect at all:

df.loc[['x','y','z'],:] = df.loc[['z','x','y'],:]
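
The assignment above does nothing because pandas aligns a DataFrame on the right-hand side by its labels before writing, so every row receives its own values back. A minimal sketch, using a smaller hypothetical frame than the one in the question; the names `aligned` and `raw` are illustrative only:

```python
import pandas as pd

# Small stand-in for the pivoted frame (hypothetical data, same index labels).
df = pd.DataFrame(
    {"a": [6, 5, 6], "d": [7, 3, 1]},
    index=pd.Index(["x", "y", "z"], name="B"),
)
original = df.copy()

# Right-hand side is a DataFrame: pandas aligns it on the row labels,
# so row 'x' gets the values of row 'x', and so on.
aligned = df.copy()
aligned.loc[["x", "y", "z"], :] = df.loc[["z", "x", "y"], :]
unchanged = (aligned.to_numpy() == original.to_numpy()).all()
print(unchanged)

# Right-hand side is a plain array: no alignment, values are written by position.
# The values move, but the labels stay x, y, z, so this is still not a reordering.
raw = df.copy()
raw.loc[["x", "y", "z"], :] = df.loc[["z", "x", "y"], :].to_numpy()
print(raw)
```

Stripping the labels with `to_numpy()` moves the values but leaves the index order untouched, which is why a selection or reindex is needed instead of an assignment.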

Is there no direct way to sort the rows of a DataFrame by providing a custom list of index labels?

You can use reindex or reindex_axis, both of which are faster than loc.

For the index:

idx = ['z','x','y']
df = df.reindex(idx)
print (df)
A  a  b  c  d
B            
z  6  3  8  1
x  6  7  6  7
y  5  6  5  3
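
One behavioural difference is worth knowing when choosing between reindex and loc: they treat labels that are not in the index differently. The sketch below assumes pandas 1.0 or later and uses a small hypothetical frame; the label 'w' is deliberately missing.

```python
import pandas as pd

# Hypothetical frame with the same row labels as the example above.
df = pd.DataFrame(
    {"a": [6, 5, 6], "b": [7, 6, 3]},
    index=pd.Index(["x", "y", "z"], name="B"),
)

wanted = ["z", "x", "w"]  # 'w' does not exist in df.index

# reindex keeps the requested order and inserts a NaN row for the missing label.
with_missing = df.reindex(wanted)
print(with_missing)

# fill_value replaces the NaN placeholder with a value of your choice.
filled = df.reindex(wanted, fill_value=0)

# loc refuses a list that contains a missing label (pandas >= 1.0).
try:
    df.loc[wanted]
    loc_raised = False
except KeyError:
    loc_raised = True
print(loc_raised)
```

If every label in the list is known to exist, the two give the same rows in the same order.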

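Or with reindex_axis, which gives the same result (note that reindex_axis was deprecated in pandas 0.21 and removed in 1.0, so on current versions use reindex):

df = df.reindex_axis(idx)

For columns: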

cols = ['d','a','b','c']
df = df.reindex(columns=cols)
print (df)
A  d  a  b  c
B            
x  7  6  7  6
y  3  5  6  5
z  1  6  3  8

cols = ['d','a','b','c']
df = df.reindex_axis(cols, axis=1)
print (df)
A  d  a  b  c
B            
x  7  6  7  6
y  3  5  6  5
z  1  6  3  8

Both at once:

df = df.reindex(columns=cols, index=idx)

Timings:

In [43]: %timeit (df.loc[['z', 'x', 'y'], ['d', 'a', 'b', 'c']])
1000 loops, best of 3: 653 µs per loop

In [44]: %timeit (df.reindex(columns=cols, index=idx))
1000 loops, best of 3: 402 µs per loop

Index only:

In [49]: %timeit (df.reindex(idx))
The slowest run took 5.16 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 271 µs per loop

In [50]: %timeit (df.reindex_axis(idx))
The slowest run took 6.50 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 252 µs per loop


In [51]: %timeit (df.loc[['z', 'x', 'y']])
The slowest run took 5.51 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 418 µs per loop

In [52]: %timeit (df.loc[['z', 'x', 'y'], :])
The slowest run took 4.87 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 542 µs per loop
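
The numbers above come from an old pandas release and will differ on other machines and versions. A rough sketch for repeating the comparison outside IPython with the standard timeit module follows; reindex_axis is left out because it no longer exists in pandas 1.0 and later, and the dictionary name `candidates` and the repeat count are arbitrary choices.

```python
import timeit

import pandas as pd

# Rebuild the pivoted frame from the question.
L0 = list("dabc") * 3
L1 = ["z"] * 4 + ["x"] * 4 + ["y"] * 4
L2 = [1, 6, 3, 8, 7, 6, 7, 6, 3, 5, 6, 5]
df = pd.DataFrame({"A": L0, "B": L1, "C": L2}).pivot(
    columns="A", index="B", values="C"
)

idx = ["z", "x", "y"]
cols = ["d", "a", "b", "c"]

# Each candidate reorders the frame without modifying df.
candidates = {
    "loc rows": lambda: df.loc[idx],
    "reindex rows": lambda: df.reindex(idx),
    "loc rows+cols": lambda: df.loc[idx, cols],
    "reindex rows+cols": lambda: df.reindex(index=idx, columns=cols),
}

number = 200  # calls per measurement
timings = {
    name: timeit.timeit(func, number=number) for name, func in candidates.items()
}

for name, total in timings.items():
    print(f"{name}: {total / number * 1e6:.0f} µs per call")
```

On a frame this small the differences are a few hundred microseconds at most, so readability is usually the better criterion.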

Using loc is a very natural way to do this:

df.loc[['z', 'x', 'y']]

A  d  a  b  c
B            
z  1  6  3  8
x  7  6  7  6
y  3  5  6  5

You can assign the result back with:

df = df.loc[['z', 'x', 'y']]

Both axes at once with loc:

df.loc[['z', 'x', 'y'], ['d', 'a', 'b', 'c']]

A  d  a  b  c
B            
z  1  6  3  8
x  7  6  7  6
y  3  5  6  5

Using numpy.searchsorted (this relies on the index being sorted, as it is after the pivot):

l = list('zxy')
a = df.index.values.searchsorted(l)
pd.DataFrame(
    df.values[a],
    df.index[a], df.columns
)

A  d  a  b  c
B            
z  1  6  3  8
x  7  6  7  6
y  3  5  6  5
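
searchsorted only returns meaningful positions when the index is sorted. If that cannot be guaranteed, the same NumPy-level reconstruction can be done with Index.get_indexer, which looks labels up directly. This is a sketch of that swapped-in technique, not the approach shown above; the frame and the variable names are hypothetical.

```python
import pandas as pd

# Hypothetical frame whose index is NOT sorted.
df = pd.DataFrame(
    {"d": [1, 7, 3], "a": [6, 6, 5]},
    index=pd.Index(["z", "x", "y"], name="B"),
)

labels = ["y", "z", "x"]

# get_indexer returns the integer position of each label, or -1 if it is missing.
pos = df.index.get_indexer(labels)
if (pos == -1).any():
    raise KeyError("some requested labels are not in the index")

# Rebuild the frame from the underlying array, as in the searchsorted version.
result = pd.DataFrame(df.to_numpy()[pos], index=df.index[pos], columns=df.columns)
print(result)
```

Note that get_indexer requires a unique index.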

You can simply do:

df = df.loc[['z', 'x', 'y'], :]

You need to compare against df.loc[['z', 'x', 'y']] only, otherwise the comparison is not fair.

Same result: loc is slower. I think that is because reindex and reindex_axis are implemented mainly for this purpose, while loc is implemented for selection.

That makes sense.

@Khris - hmmm, so if you need the fastest solution, the numpy solution wins.

For my current problem I want the simplest solution, but you never know when I will need the fastest one in the future. :)