Python: Replace multiple strings and numbers in multiple columns with NaN in Pandas
Tags: python, python-3.x, pandas

Given the following DataFrame, I want to clean the data by replacing several strings and numbers with NaN: namely 68, Tardeo Road and 0 in state, 567 in dept, and #ERROR! and 123 in phonenumber:
   id  state                                dept                          phonenumber
0   1  Abu Dhabi                            {Marketing}                   123
1   2  MO                                   {Other}                       5635888000
2   3  68, Tardeo Road                      {"Human Resources"}           18006708450
3   4  National Capital Territory of Delhi  {"Human Resources"}           #ERROR!
4   5  Aargau Canton                        {Marketing}                   12032722596
5   6  Aargau Canton                        567                           18003928343
6  18  NB                                   {"Finance & Administration"}  NaN
7  19  0                                    {Sales}                       #ERROR!
8  20  Abu Dhabi                            {"Human Resources"}           NaN
9  21  Aargau                               {"Finance & Administration"}  NaN
I tried the following code:
Solution 1:
mask = (df.state == '0') | (df.state == '68, Tardeo Road')
df.loc[mask, ['state']] = np.nan
Solution 2:
df.loc[(df.state == '68, Tardeo Road') | (df.state == '0'), 'state'] = np.nan
Solution 3:
df.loc[df.state == '0', 'state'] = np.nan
df.loc[df.state == '68, Tardeo Road', 'state'] = np.nan
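All of these approaches work, but they get rather long when applied to several columns.
Is there a way to make this more concise and efficient, for example with str.replace? Thanks.

You can use replace: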
df = df.replace({'state':['68, Tardeo Road','0'],
'dept':['567'],
'phonenumber':['#ERROR!','123']}, np.nan)
Output:
id state dept phonenumber
-- ---- ----------------------------------- ---------------------------- -------------
0 1 Abu Dhabi {Marketing} nan
1 2 MO {Other} 5635888000
2 3 nan {"Human Resources"} 18006708450
3 4 National Capital Territory of Delhi {"Human Resources"} nan
4 5 Aargau Canton {Marketing} 12032722596
5 6 Aargau Canton nan 18003928343
6 18 NB {"Finance & Administration"} nan
7 19 nan {Sales} nan
8 20 Abu Dhabi {"Human Resources"} nan
9 21 Aargau {"Finance & Administration"} nan
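
For anyone who wants to run this, here is a minimal self-contained sketch of the same replace call on a reduced, made-up subset of the frame. It assumes the affected columns hold strings (object dtype), as they typically would after loading mixed data from a file; the names `to_nan` and `cleaned` are only illustrative.

```python
import numpy as np
import pandas as pd

# Reduced copy of the question's frame (assumption: values are stored as strings).
df = pd.DataFrame({
    "id": [3, 6, 19, 20],
    "state": ["68, Tardeo Road", "Aargau Canton", "0", "Abu Dhabi"],
    "dept": ['{"Human Resources"}', "567", "{Sales}", '{"Human Resources"}'],
    "phonenumber": ["18006708450", "18003928343", "#ERROR!", "123"],
})

# Map each column name to the values that should become NaN in that column only.
to_nan = {
    "state": ["68, Tardeo Road", "0"],
    "dept": ["567"],
    "phonenumber": ["#ERROR!", "123"],
}

# With a dict as to_replace and a scalar value, replace works column by column.
cleaned = df.replace(to_nan, np.nan)
print(cleaned)
```

Because the dict is keyed by column, a value such as "0" is only replaced in state and is left alone everywhere else.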
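It should be | rather than &. How can a value be both 0 and 68… at the same time?
Thanks, after retesting, all three solutions work. But is there a way to make it more concise, especially when there are many columns?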
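
If many columns share the same set of junk values, one possible alternative (a sketch, not part of the answer above) is to combine isin with mask over a list of columns. The frame, `bad_values`, and `cols` below are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Small made-up frame; all values are strings.
df = pd.DataFrame({
    "state": ["Abu Dhabi", "0", "MO"],
    "dept": ["{Sales}", "567", "#ERROR!"],
    "phonenumber": ["#ERROR!", "5635888000", "0"],
})

bad_values = ["0", "#ERROR!"]     # hypothetical junk tokens shared by several columns
cols = ["state", "phonenumber"]   # only these columns are cleaned

# isin builds a boolean frame; mask sets the True cells to NaN.
df[cols] = df[cols].mask(df[cols].isin(bad_values))
print(df)
```

Columns not listed in `cols` (here dept) are left untouched, so this only fits when the same values are invalid in every selected column; otherwise the per-column dict passed to replace is the more precise tool.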