Pandas: how to set the maximum element in each row of a DataFrame to 1 and all other elements to 0?


Let's say I have a pandas DataFrame:

df = pd.DataFrame(index = [ix for ix in range(10)], columns=list('abcdef'), data=np.random.randn(10,6))

and I want to produce this df:

   a  b  c  d  e  f
0  0  0  0  0  0  1
1  0  0  0  1  0  0
2  0  1  0  0  0  0
3  0  0  0  0  0  1
4  0  0  0  0  1  0
5  0  0  0  1  0  0
6  0  1  0  0  0  0
7  0  0  1  0  0  0
8  0  0  0  0  1  0
9  0  0  0  1  0  0

Does anyone have a better way? Perhaps by getting rid of the for loop, or by using apply with a lambda?

import numpy as np


def max_binary(row):
    # row is a single row of df (a Series): 1 where it equals the row maximum, else 0
    binary = np.where(row == row.max(), 1, 0)
    return binary


df.apply(max_binary, axis=1)

Following the same pattern, this is a shorter version:

df.apply(lambda x: np.where(x == x.max(), 1, 0), axis=1)

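For reference, here is a self-contained sketch of the row-wise apply approach. Depending on the pandas version, a function that returns a plain NumPy array may give back a Series of arrays rather than a DataFrame, so this sketch wraps the result in a pd.Series. The wrapping, the helper name max_binary_row and the sample values are assumptions for illustration, not part of the answers above.

```python
import numpy as np
import pandas as pd

# Small fixed frame so the result is easy to check by eye (values are illustrative).
df = pd.DataFrame(
    [[0.1, 0.9, 0.3],
     [2.0, -1.0, 0.5],
     [-3.0, -2.0, -1.0]],
    columns=list("abc"),
)


def max_binary_row(row):
    # row is one row of df as a Series; mark its maximum with 1, the rest with 0.
    flags = np.where(row == row.max(), 1, 0)
    # Returning a Series keyed by the column labels lets apply build a DataFrame.
    return pd.Series(flags, index=row.index)


result = df.apply(max_binary_row, axis=1)
print(result)
```

The function is still called once per row, so this keeps the readability of apply but not the speed of a vectorized comparison.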
Use max and check for equality using eq, then cast the boolean df to int using astype; this will convert True and False to 1 and 0:

In [21]:
df = pd.DataFrame(index = [ix for ix in range(10)], columns=list('abcdef'), data=np.random.randn(10,6))
df

Out[21]:
          a         b         c         d         e         f
0  0.797000  0.762125 -0.330518  1.117972  0.817524  0.041670
1  0.517940  0.357369 -1.493552 -0.947396  3.082828  0.578126
2  1.784856  0.672902 -1.359771 -0.090880 -0.093100  1.099017
3 -0.493976 -0.390801 -0.521017  1.221517 -1.303020  1.196718
4  0.687499 -2.371322 -2.474101 -0.397071  0.132205  0.034631
5  0.573694 -0.206627 -0.106312 -0.661391 -0.257711 -0.875501
6 -0.415331  1.185901  1.173457  0.317577 -0.408544 -1.055770
7 -1.564962 -0.408390 -1.372104 -1.117561 -1.262086 -1.664516
8 -0.987306  0.738833 -1.207124  0.738084  1.118205 -0.899086
9  0.282800 -1.226499  1.658416 -0.381222  1.067296 -1.249829

In [22]:
df = df.eq(df.max(axis=1), axis=0).astype(int)
df

Out[22]:
   a  b  c  d  e  f
0  0  0  0  1  0  0
1  0  0  0  0  1  0
2  1  0  0  0  0  0
3  0  0  0  1  0  0
4  1  0  0  0  0  0
5  1  0  0  0  0  0
6  0  1  0  0  0  0
7  0  1  0  0  0  0
8  0  0  0  0  1  0
9  0  0  1  0  0  0
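The same three steps can be tried on a small deterministic frame. This is only a sketch: the sample values and the intermediate names row_max, mask and binary are made up for illustration. It also shows one property worth knowing: when a row contains its maximum more than once, every tied position is set to 1.

```python
import pandas as pd

df = pd.DataFrame(
    [[1.0, 5.0, 2.0],
     [7.0, 3.0, 7.0],    # tie between columns a and c
     [0.0, -1.0, -2.0]],
    columns=list("abc"),
)

row_max = df.max(axis=1)        # one maximum per row, indexed like df
mask = df.eq(row_max, axis=0)   # axis=0 matches row_max against the row index
binary = mask.astype(int)       # True/False -> 1/0
print(binary)
```

Passing axis=0 to eq is what makes the comparison row-wise; without it the Series would be aligned against the column labels instead.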
Timings:

In [24]:
# @Raihan Masud's method
%timeit df.apply( lambda x: np.where(x == x.max() , 1 , 0) , axis = 1)
# mine
%timeit df.eq(df.max(axis=1), axis=0).astype(int)
100 loops, best of 3: 7.94 ms per loop
1000 loops, best of 3: 640 µs per loop

In [25]:
%%timeit
# @Nader Hisham's method
def max_binary(df):
    binary = np.where( df == df.max() , 1 , 0 )
    return binary

df.apply( max_binary , axis = 1)
100 loops, best of 3: 9.63 ms per loop

You can see that my method is more than 12x faster than @Raihan's method.

In [4]:
%%timeit
for i in range(len(df)):
    df.loc[i][df.loc[i].idxmax(axis=1)] = 1
    df.loc[i][df.loc[i] != 1] = 0

10 loops, best of 3: 21.1 ms per loop

The for loop is also significantly slower.
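The timings above rely on IPython's %timeit magic. Outside IPython, a rough comparison could be sketched with the standard timeit module. The function names with_apply and with_eq, the seed and the repeat count are arbitrary choices for this sketch, and the apply variant wraps its result in a pd.Series so that both functions return a DataFrame, which adds some overhead. The absolute numbers will therefore differ from those quoted above and from machine to machine.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for repeatability
df = pd.DataFrame(rng.standard_normal((10, 6)), columns=list("abcdef"))


def with_apply(frame):
    # Row by row: the lambda is called once per row.
    return frame.apply(
        lambda row: pd.Series(np.where(row == row.max(), 1, 0), index=row.index),
        axis=1,
    )


def with_eq(frame):
    # Vectorized: one comparison over the whole frame.
    return frame.eq(frame.max(axis=1), axis=0).astype(int)


# Both approaches should agree before their speed is compared.
same = np.array_equal(with_apply(df).to_numpy(), with_eq(df).to_numpy())
print("results match:", same)

n = 50  # number of calls per measurement
apply_time = timeit.timeit(lambda: with_apply(df), number=n)
eq_time = timeit.timeit(lambda: with_eq(df), number=n)
print(f"apply: {apply_time / n * 1e3:.3f} ms per call")
print(f"eq:    {eq_time / n * 1e3:.3f} ms per call")
```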

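@RaihanMasud Glad this helped. You can accept the answer to confirm that it worked for you; there is a tick mark to the left of the answer.
Thanks @EdChum. Did you try my original post? I'd like to know how much time this takes compared to yours: for i in range(len(df)): ... df.loc[i][df.loc[i].idxmax(axis=1)] = 1 ... df.loc[i][df.loc[i] != 1] = 0
I edited the post; using the for loop is the slowest method.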