Python: How do I get column names from the values in a DataFrame?

python pandas dataframe

Suppose a 750x750 matrix is stored in a DataFrame, say df:

df=

        c1   c2   c3  ... c750
c1      5    2    5   ...   3 
c2      3    1    5   ...   80
c3      4    2    7   ...   10
.       .    .    .   ...   .
.       .    .    .   ...   .
.       .    .    .   ...   .
c750    8    3    5   ...   1

I want to find the 4 highest values in each row, which I can easily do as follows:

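a = df.values.copy()          # copy so the in-place sort does not modify df
a.sort(axis=1)                # sort each row in ascending order
sorted_table = a[:, -4:]      # the four largest values per row, ascending
b = sorted_table[:, ::-1]     # the same four values, largest first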

However, the result is just an array, with no index or column names:

[[ 98.      29.      15.      10.]
 [ 93.      91.      75.      60.]
 [ 48.      21.      17.      10.]
.
.
.
...]

If I want to know which column name each sorted value refers to, what should I do?

I would like to show:

 df=

c1      c512    c20    c57     c310 
c2      c317    c133   c584    c80
c3      c499    c289   c703    c100
.       .    .    .   ...    .
.       .    .    .   ...    .
.       .    .    .   ...    .
c750    c89    c31    c546     c107

where

  c512 refers to 98

  c20 refers to 29

  c57 refers to 15

and so on.

I doubt this is the best answer, but I think it works. I dislike using for loops in pandas, but I couldn't think of a pandas method to do this.

import pandas as pd
import numpy as np

#array_size = 10

#--- Generate Data and create toy Dataframe ---
array_size = 750
np.random.seed(1)
data = np.random.randint(0, 1000000, array_size**2)
data = data.reshape((array_size, array_size))
df = pd.DataFrame(data, columns=['c'+str(i) for i in range(1, (array_size)+1)])
df.index = df.columns

#--- Transpose the dataframe to more familiarly sort by columns instead of rows ---
df = df.T

#--- Rank values in dataframe using max method where highest value is rank 1 ---
df = df.rank(method='max', ascending=False)

#--- Collect one single-row dataframe of column names per original row ---
rows = []

#--- For each column: keep ranks below 5 (the top 4), sort them, reset the index, drop the rank column ---
for i in df.columns:
  s = df[i][df[i] < 5].sort_values().reset_index().drop(i, axis=1)
  rows.append(s.T)

#--- Concatenate once at the end instead of growing a dataframe inside the loop ---
new_df = pd.concat(rows)

#--- The new_df index will say 'index', this reassigns the transposed column names to new_df's index ---
new_df.index = df.columns
print(new_df)
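The loop can likely be avoided altogether. The following is a minimal sketch, not the answer's own method: it swaps in NumPy's argsort to get column positions and then maps those positions back to labels. The 5x5 toy frame and the names k, order, top_names, and top_values are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Small toy frame (5x5 instead of 750x750) so the result is easy to inspect
rng = np.random.default_rng(1)
labels = [f"c{i}" for i in range(1, 6)]
df = pd.DataFrame(rng.integers(0, 1_000_000, size=(5, 5)),
                  index=labels, columns=labels)

k = 4  # number of top values to keep per row (hypothetical parameter)
values = df.to_numpy()

# argsort on the negated values yields column positions from largest to smallest;
# the stable sort keeps the earlier column first when two values tie
order = np.argsort(-values, axis=1, kind="stable")[:, :k]

# Map the positions back to column labels, keeping the original row index
top_names = pd.DataFrame(df.columns.to_numpy()[order], index=df.index)

# Pull out the matching values so each name can be checked against its value
top_values = pd.DataFrame(np.take_along_axis(values, order, axis=1),
                          index=df.index)

print(top_names)
print(top_values)
```

Because the sort is applied to positions rather than to the values themselves, the link between each value and its column label is never lost, and tied values do not shorten a row the way a rank threshold can.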

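You can use df.apply(myfunc, axis=1) instead of df.sort. This lets you work with the column names as well as their values. Do you have an example of the output you want? One problem I see is that a single column can hold the highest value for more than one row, so it is unclear whether sorting by row will display the result the way you want. How do you want to show which column names belong to each row's highest values?

@Jarad, I want to show the data as in the update above. I hope you can give me some suggestions.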

The loop-based answer prints the following:

         0     1     2     3
c1    c479  c545  c614  c220
c2    c249  c535  c231  c680
c3    c657  c603  c137  c740
c4    c674  c424  c426  c127
...    ...   ...   ...   ...
c747  c251  c536  c321  c296
c748   c55  c383  c437  c103
c749  c138  c495  c299  c295
c750  c178  c556  c491  c445
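The df.apply(myfunc, axis=1) suggestion from the comments could look roughly like this. The comment never defines myfunc, so the function top_columns, its parameter k, and the small 3x4 frame below are hypothetical stand-ins chosen for illustration.

```python
import pandas as pd

# Tiny example frame; the real data would be 750x750
df = pd.DataFrame(
    [[5, 2, 9, 3], [3, 1, 5, 80], [4, 2, 7, 10]],
    index=["c1", "c2", "c3"],
    columns=["c1", "c2", "c3", "c4"],
)

def top_columns(row, k=2):
    # nlargest keeps the k largest values of the row, largest first;
    # its index holds the column names those values came from
    return pd.Series(row.nlargest(k).index.to_numpy())

# axis=1 passes each row to the function as a Series indexed by column name
result = df.apply(top_columns, axis=1)
print(result)
# c1 -> c3, c1
# c2 -> c4, c3
# c3 -> c4, c3
```

This reads more naturally than sorting the raw array, but it calls a Python function once per row, so on a 750x750 frame it is probably slower than a purely array-based approach.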