Python: how to select rows and replace certain columns in a table
If I have the following data:
import pandas as pd
import numpy as np

dic = {'A': [np.nan, 4, np.nan, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]}
df = pd.DataFrame(dic)
df
I want to select the rows where column A is NaN and replace the values of column B with np.nan, like this:
A B C
0 NaN 9 0
1 4.0 2 0
2 NaN 5 5
3 4.0 3 3
I tried df[df.A.isna()]["B"] = np.nan, but it didn't work. From what I have read, I should select the data via df.iloc, but the problem is that if df has many rows, I can't select the data by typing the indices.

Option 1

You are actually very close. Use pd.Series.isnull on A, and assign to B with df.loc:

df.loc[df.A.isnull(), 'B'] = np.nan
df

A B C
0 NaN NaN 0
1 4.0 2.0 0
2 NaN NaN 5
3 4.0 3.0 3
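A short sketch of why the .loc pattern works while the chained version from the question does not (the frame is rebuilt here so the snippet is self-contained, and B is cast to float up front since NaN cannot be stored in an integer column):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 4, np.nan, 4],
                   'B': [9, 2, 5, 3],
                   'C': [0, 0, 5, 3]})

# Cast B to float so assigning NaN does not force a dtype change.
df['B'] = df['B'].astype('float64')

# df[df.A.isnull()]['B'] = np.nan is chained indexing: the first []
# may return a copy, so the assignment never reaches df itself.
# .loc does row and column selection in a single call, so the
# assignment hits the original frame.
df.loc[df.A.isnull(), 'B'] = np.nan

# The same pattern extends to several columns at once, e.g.
# df.loc[df.A.isnull(), ['B', 'C']] = np.nan
print(df)
```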
Option 2

Use np.where with the inverted condition - take NaN where A is null, and keep B's value otherwise:
df['B'] = np.where(df.A.isnull(), np.nan, df.B)
df
A B C
0 NaN NaN 0
1 4.0 2.0 0
2 NaN NaN 5
3 4.0 3.0 3
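np.where builds a brand-new array from the condition and the two branches, so the replacement does not have to be NaN - any scalar (or another array) can stand in. A small sketch using -1 as the filler:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 4, np.nan, 4],
                   'B': [9, 2, 5, 3],
                   'C': [0, 0, 5, 3]})

# np.where(condition, value_if_true, value_if_false): take -1 where
# A is null, otherwise keep B's original value.
df['B'] = np.where(df.A.isnull(), -1, df.B)
print(df.B.tolist())  # -> [-1, 2, -1, 3]
```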
Using Series.where is very similar - it keeps B where A is not null and, by default, fills the rest with NaN:
df['B'] = df.B.where(df.A.notnull())
print (df)
A B C
0 NaN NaN 0
1 4.0 2.0 0
2 NaN NaN 5
3 4.0 3.0 3

Timings:
dic = {'A': [np.nan, 4, np.nan, 4], 'B': [9, 2, 5, 3], 'C': [0, 0, 5, 3]}
df = pd.DataFrame(dic)
df = pd.concat([df] * 10000, ignore_index=True)
In [61]: %timeit df['B'] = np.where(df.A.isnull(), np.nan, df.B)
The slowest run took 7.57 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 405 µs per loop
In [62]: %timeit df['B'] = df.B.mask(df.A.isnull())
The slowest run took 70.14 times longer than the fastest. This could mean that an intermediate result is being cached.
1 loop, best of 3: 3.54 ms per loop
In [63]: %timeit df['B'] = df.B.where(df.A.notnull())
The slowest run took 5.65 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 1.04 ms per loop
In [65]: %timeit df.B += df.A * 0
The slowest run took 12.44 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 913 µs per loop
In [67]: %timeit df.loc[df.A.isnull(), 'B'] = np.nan
The slowest run took 4.56 times longer than the fastest. This could mean that an intermediate result is being cached.
100 loops, best of 3: 2.88 ms per loop
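The timings above also exercise df.B.mask, which never got its own example; mask is simply the mirror image of where (mask replaces values where the condition is True, where keeps them). A quick check that the two spellings agree:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 4, np.nan, 4],
                   'B': [9, 2, 5, 3],
                   'C': [0, 0, 5, 3]})

# mask replaces values where the condition is True;
# where keeps values where the condition is True.
# Both fill with NaN when no replacement value is given.
masked = df.B.mask(df.A.isnull())
kept = df.B.where(df.A.notnull())

print(masked.equals(kept))  # -> True
```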
Interesting solution ;) I seem to be in a joking mood tonight (:
Hmm, amazing...! This is a clever, whimsical answer. The other two answers deserve more upvotes so the community recognizes their value.
df.loc is the answer I wanted. Thanks for the timings! (and +1).
Thank you, this works. But I think df.loc[df.A.isnull(), 'B'] = np.nan is more readable.
@Dawei - to me personally, mask or np.where is more readable ;) but all the solutions are good, so it depends on which one you use ;)
Because my competitors have already made the logical choices, here is a whimsical one - multiplying A by zero and adding it to B lets A's NaNs propagate into B:

df.B += df.A * 0
df
A B C
0 NaN NaN 0
1 4.0 2.0 0
2 NaN NaN 5
3 4.0 3.0 3
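A sketch of why the arithmetic trick works: multiplying A by zero yields 0.0 where A has a value and NaN where it does not (any arithmetic with NaN produces NaN), so adding that series to B leaves valid rows unchanged and poisons exactly the rows where A is null:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [np.nan, 4, np.nan, 4],
                   'B': [9, 2, 5, 3],
                   'C': [0, 0, 5, 3]})

# df.A * 0 is [NaN, 0.0, NaN, 0.0]; adding it to B keeps rows 1 and 3
# intact and turns rows 0 and 2 into NaN.
df.B += df.A * 0
print(df.B.tolist())
```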