Python: extracting column names based on a condition (Python, pandas)

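Based on the pandas DataFrame df, I computed a ranking, which can be seen in rank_df.

Now I want to create a new DataFrame results consisting of three columns, ["first", "second", "third"]. This DataFrame should be filled with the corresponding column names of rank_df. For example, the first row of results might contain ['ticker_3', 'ticker_1', 'ticker_4']. In other words, the first column of results should always contain the name of the rank_df column with the highest rank, the second column the name of the runner-up, and so on.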

import numpy as np
import pandas as pd

np.random.seed(123)

cols = ["ticker_" + str(i + 1) for i in range(5)]
df = pd.DataFrame(np.random.rand(3, 5), columns=cols)
df
Output:

   ticker_1  ticker_2  ticker_3  ticker_4  ticker_5
0  0.696469  0.286139  0.226851  0.551315  0.719469
1  0.423106  0.980764  0.684830  0.480932  0.392118
2  0.343178  0.729050  0.438572  0.059678  0.398044
Generating rank_df:

rank_df = df.rank(axis=1, method="first", ascending=False)
rank_df
Output:

   ticker_1  ticker_2  ticker_3  ticker_4  ticker_5
0       2.0       4.0       5.0       3.0       1.0
1       4.0       1.0       2.0       3.0       5.0
2       4.0       1.0       2.0       5.0       3.0
The result that needs to be produced:

# The NaNs in this final DataFrame need to be filled with the respective column names
results = pd.DataFrame(None, index=rank_df.index, columns=["first", "second", "third"])

IIUC, you can try argsort:

print(df)
    ticker_1  ticker_2  ticker_3  ticker_4  ticker_5
0  0.548814  0.715189  0.602763  0.544883  0.423655
1  0.645894  0.437587  0.891773  0.963663  0.383442
2  0.791725  0.528895  0.568045  0.925597  0.071036

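# Sort each row in descending order and keep the names of the top 3 columns
results[:] = df.columns.to_numpy()[np.argsort(-df.to_numpy(), axis=1)][:, :3]  # change 3 to n as required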
print(results)
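
As a self-contained sketch of the argsort idea, here it is applied to the question's own data (values copied from the df printed in the question) and wrapped in a helper. The name top_n_columns and its parameters are made up for illustration, and kind="stable" is an assumption added here so that ties resolve to the leftmost column.

```python
import numpy as np
import pandas as pd

# Values copied from the df shown in the question (np.random.seed(123)).
cols = ["ticker_" + str(i + 1) for i in range(5)]
df = pd.DataFrame(
    [
        [0.696469, 0.286139, 0.226851, 0.551315, 0.719469],
        [0.423106, 0.980764, 0.684830, 0.480932, 0.392118],
        [0.343178, 0.729050, 0.438572, 0.059678, 0.398044],
    ],
    columns=cols,
)


def top_n_columns(frame, n=3, labels=("first", "second", "third")):
    """Hypothetical helper: names of the n largest columns in each row."""
    # Negate the values so that an ascending argsort yields descending order;
    # a stable sort keeps the leftmost column first when values tie.
    order = np.argsort(-frame.to_numpy(), axis=1, kind="stable")[:, :n]
    # Fancy-index the column labels with the (rows, n) position array.
    names = frame.columns.to_numpy()[order]
    return pd.DataFrame(names, index=frame.index, columns=list(labels[:n]))


results = top_n_columns(df)
print(results)
```

For the question's data, row 0 comes out as ticker_5, ticker_1, ticker_4, which agrees with ranks 1.0, 2.0 and 3.0 in the first row of rank_df.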


Another approach is to use pandas reshaping:

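# Long format (index, variable, value), then pivot so each rank becomes a column;
# pivot() takes keyword arguments only in recent pandas versions
rank_df.reset_index().melt('index').pivot(index='index', columns='value', values='variable')\
       .rename(columns={1.0:'first', 2.0:'second', 3.0:'third'}).iloc[:, :3]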
Output:

value     first    second     third
index                              
0      ticker_5  ticker_1  ticker_4
1      ticker_2  ticker_3  ticker_4
2      ticker_2  ticker_3  ticker_5
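
Since rank_df already holds the ranks, the same table can also be derived from it directly without reshaping. The following is a minimal sketch, not something taken from the answers: it assumes every row of rank_df is a permutation of 1..5 (which method="first" produces when there are no missing values), and the name results_from_ranks is made up for illustration.

```python
import numpy as np
import pandas as pd

# Ranks copied from the rank_df printed in the question.
cols = ["ticker_" + str(i + 1) for i in range(5)]
rank_df = pd.DataFrame(
    [
        [2.0, 4.0, 5.0, 3.0, 1.0],
        [4.0, 1.0, 2.0, 3.0, 5.0],
        [4.0, 1.0, 2.0, 5.0, 3.0],
    ],
    columns=cols,
)

# Sorting the ranks in ascending order gives, for each row, the column
# positions ordered from rank 1 (best) onwards; keep the first three.
order = np.argsort(rank_df.to_numpy(), axis=1)[:, :3]

results_from_ranks = pd.DataFrame(
    rank_df.columns.to_numpy()[order],
    index=rank_df.index,
    columns=["first", "second", "third"],
)
print(results_from_ranks)
```

This avoids the melt/pivot round trip, at the cost of relying on the ranks being unique within each row.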

Could you add a sample DataFrame and the expected output?
Yes, please show us sample input and output.