Python: find the column with the highest value (pandas)

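I have a Pandas DataFrame with several columns whose values range from 0 to 100. I want to add a column to the DataFrame that contains, for each row, the name of the column holding that row's maximum value. So: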

one   two   three four  COLUMN_I_WANT_TO_CREATE
5     40    12    19    two
90    15    58    23    one
74    95    34    12    two
44    81    22    97    four
10    59    59    44    [either two or three, selected randomly]

and so on.

Bonus points if the solution breaks ties randomly.

You can use idxmax with the parameter axis=1:

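import pandas as pd

df = pd.DataFrame({'one':   [5, 90, 74, 44],
                   'two':   [40, 15, 95, 81],
                   'three': [12, 58, 34, 22],
                   'four':  [19, 23, 12, 97]})
print(df)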
   one  two  three  four
0    5   40     12    19
1   90   15     58    23
2   74   95     34    12
3   44   81     22    97

df['COLUMN_I_WANT_TO_CREATE'] = df.idxmax(axis=1)
print(df)
   one  two  three  four COLUMN_I_WANT_TO_CREATE
0    5   40     12    19                     two
1   90   15     58    23                     one
2   74   95     34    12                     two
3   44   81     22    97                    four
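One detail matters for the tie case in the question: per the pandas documentation, idxmax returns the first occurrence of the maximum, so a tied row such as (10, 59, 59, 44) always gets two, never three. The sketch below uses the question's five rows as assumed sample data and counts how many columns share each row's maximum next to the idxmax result, which makes the ties visible:

```python
import pandas as pd

# Sample data assumed from the question's table, including the tied row
df = pd.DataFrame({'one':   [5, 90, 74, 44, 10],
                   'two':   [40, 15, 95, 81, 59],
                   'three': [12, 58, 34, 22, 59],
                   'four':  [19, 23, 12, 97, 44]})

# Boolean mask: True where a cell equals its row's maximum
is_max = df.eq(df.max(axis=1), axis=0)
# Number of columns sharing each row's maximum (values above 1 mean a tie)
n_ties = is_max.sum(axis=1)

# idxmax picks the FIRST column holding the maximum, so ties are never random
df['COLUMN_I_WANT_TO_CREATE'] = df.idxmax(axis=1)
print(df)
print(n_ties)
```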

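Without the random tie-breaking, I think this is clearly a duplicate, and possibly of others too. Maybe we should consolidate.
Yes, you're right; I wish I had found that one in my search.
Is sample what does the random tie-breaking? I don't quite understand how it does that.
Yes, check the link.
Ah, that's not what I meant by breaking ties randomly. (I updated my question to clarify.)
@jezrael: That's not the kind of sampling the OP wants. He wants to choose randomly among the maximums within each row, not randomly among the columns that contain maximums.

Randomly choosing among duplicate maximums is more complicated.

You can first find all the maximum values of a row with x[(x == x.max())]. Then you need their index values, to which you apply sample. But sample only works on a Series, so the index is converted to a Series with to_series. Finally, you can select only the first value of that Series with iloc: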
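import pandas as pd

df = pd.DataFrame({'one':   [5, 90, 74, 44, 10, 59, 10, 59],
                   'two':   [40, 15, 95, 81, 59, 59, 59, 59],
                   'three': [12, 58, 34, 22, 59, 59, 59, 59],
                   'four':  [19, 23, 12, 97, 44, 59, 59, 59]})
print(df)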
   one  two  three  four
0    5   40     12    19
1   90   15     58    23
2   74   95     34    12
3   44   81     22    97
4   10   59     59    44
5   59   59     59    59
6   10   59     59    59
7   59   59     59    59

# first run
df['COL']=df.apply(lambda x:x[(x==x.max())].index.to_series().sample(frac=1).iloc[0], axis=1)
print(df)
   one  two  three  four    COL
0    5   40     12    19    two
1   90   15     58    23    one
2   74   95     34    12    two
3   44   81     22    97   four
4   10   59     59    44  three
5   59   59     59    59    one
6   10   59     59    59    two
7   59   59     59    59  three

# one of the next runs
df['COL']=df.apply(lambda x:x[(x==x.max())].index.to_series().sample(frac=1).iloc[0], axis=1)
print(df)
   one  two  three  four    COL
0    5   40     12    19    two
1   90   15     58    23    one
2   74   95     34    12    two
3   44   81     22    97   four
4   10   59     59    44    two
5   59   59     59    59    one
6   10   59     59    59  three
7   59   59     59    59   four
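The apply version above shuffles each row's tied labels with sample(frac=1) and keeps the first one, which amounts to a uniform pick among the tied columns, but it calls a Python function once per row. A vectorized alternative, sketched below under the assumption that all compared columns are numeric and contain no NaN, attaches a random key to every tied maximum and takes argmax over those keys. The helper name random_idxmax and its seed parameter are hypothetical, not part of pandas:

```python
import numpy as np
import pandas as pd


def random_idxmax(frame, seed=None):
    """Hypothetical helper: per row, return the label of a column holding the
    row maximum, choosing uniformly at random when several columns tie.
    Assumes all columns are numeric and contain no NaN."""
    rng = np.random.default_rng(seed)
    values = frame.to_numpy()
    # True where a cell equals its row's maximum
    is_max = values == values.max(axis=1, keepdims=True)
    # Tied maximums get a random key in [0, 1); all other cells get -1,
    # so argmax can only land on a maximum and ties are broken at random
    keys = np.where(is_max, rng.random(values.shape), -1.0)
    return pd.Series(frame.columns[keys.argmax(axis=1)], index=frame.index)


# Same sample data as the run above
df = pd.DataFrame({'one':   [5, 90, 74, 44, 10, 59, 10, 59],
                   'two':   [40, 15, 95, 81, 59, 59, 59, 59],
                   'three': [12, 58, 34, 22, 59, 59, 59, 59],
                   'four':  [19, 23, 12, 97, 44, 59, 59, 59]})

# Compare only the numeric columns, so an existing COL column is ignored
df['COL'] = random_idxmax(df[['one', 'two', 'three', 'four']])
print(df)
```

Passing a seed makes the tie-break reproducible; leaving it as None gives a different choice on each call, like the two runs above.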