Python: replacing certain strings with NaN in pandas


python pandas


I need to go through the values in two columns (Location and Event) and replace the strings "Gate-3" and "NO Access" with NaN.

Here is a sample DataFrame:

Time        Location    Event               Badge ID
18:28:59    Gate-2      Access Granted      81002
18:28:12    Gate-1      Access Granted      80557
18:27:55    Gate-3      Access Granted      80557
18:27:44    Gate-3      NO Access           80398
18:25:38    Gate-1      NO Access           80978
18:25:30    Gate-2      Access Granted      73680
18:23:56    Gate-1      Access Granted      73680
18:23:52    Gate-2      Access Granted      80557
18:23:19    Gate-2      NO Access           128
18:23:16    Gate-1      Access Granted      80557

The expected output is:

       Time Location           Event  Badge ID
0  18:28:59   Gate-2  Access Granted     81002
1  18:28:12   Gate-1  Access Granted     80557
2  18:27:55      NaN  Access Granted     80557
3  18:27:44      NaN             NaN     80398
4  18:25:38   Gate-1             NaN     80978
5  18:25:30   Gate-2  Access Granted     73680
6  18:23:56   Gate-1  Access Granted     73680
7  18:23:52   Gate-2  Access Granted     80557
8  18:23:19   Gate-2             NaN       128
9  18:23:16   Gate-1  Access Granted     80557

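If I haven't misunderstood your question, how about this:

import pandas as pd
import numpy as np

# df is the DataFrame from the question.
# Replace "Gate-3" in Location and "NO Access" in Event, each column on its own.
df.loc[df.Location == 'Gate-3', 'Location'] = np.nan
df.loc[df.Event == 'NO Access', 'Event'] = np.nan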
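A self-contained run of the two .loc assignments above, rebuilt from the sample table in the question, may help confirm that they produce the expected output. The column values are copied from the question; keeping Time as plain strings is an assumption, since the question does not say how it was parsed.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (Time kept as plain strings here)
df = pd.DataFrame({
    "Time": ["18:28:59", "18:28:12", "18:27:55", "18:27:44", "18:25:38",
             "18:25:30", "18:23:56", "18:23:52", "18:23:19", "18:23:16"],
    "Location": ["Gate-2", "Gate-1", "Gate-3", "Gate-3", "Gate-1",
                 "Gate-2", "Gate-1", "Gate-2", "Gate-2", "Gate-1"],
    "Event": ["Access Granted", "Access Granted", "Access Granted", "NO Access",
              "NO Access", "Access Granted", "Access Granted", "Access Granted",
              "NO Access", "Access Granted"],
    "Badge ID": [81002, 80557, 80557, 80398, 80978,
                 73680, 73680, 80557, 128, 80557],
})

# Each column is cleaned independently: only the matching cells become NaN
df.loc[df["Location"] == "Gate-3", "Location"] = np.nan
df.loc[df["Event"] == "NO Access", "Event"] = np.nan
print(df)
```

Rows 2 and 3 lose their Location, and rows 3, 4 and 8 lose their Event, which matches the expected table.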

You can do this while loading the XLS file by specifying the na_values parameter:

df = pd.read_excel('file.xls', na_values=['Gate-3', 'NO Access'])
print(df)

       Time Location           Event  Badge ID
0  18:28:59   Gate-2  Access Granted     81002
1  18:28:12   Gate-1  Access Granted     80557
2  18:27:55      NaN  Access Granted     80557
3  18:27:44      NaN             NaN     80398
4  18:25:38   Gate-1             NaN     80978
5  18:25:30   Gate-2  Access Granted     73680
6  18:23:56   Gate-1  Access Granted     73680
7  18:23:52   Gate-2  Access Granted     80557
8  18:23:19   Gate-2             NaN       128
9  18:23:16   Gate-1  Access Granted     80557

In my opinion, this is better than having to clean the data after loading it.
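Because file.xls is not available here, the following sketch swaps in pd.read_csv on an in-memory CSV built from a few rows of the question. This is an assumption for demonstration only; na_values has the same meaning in read_csv and read_excel (extra strings to treat as NaN while parsing).

```python
import io

import pandas as pd

# A few rows from the question, as CSV text (stand-in for file.xls)
csv_text = """Time,Location,Event,Badge ID
18:28:59,Gate-2,Access Granted,81002
18:27:55,Gate-3,Access Granted,80557
18:27:44,Gate-3,NO Access,80398
18:25:38,Gate-1,NO Access,80978
"""

# A list passed to na_values applies to every column
df = pd.read_csv(io.StringIO(csv_text), na_values=["Gate-3", "NO Access"])
print(df)
```

Since the list form applies to all columns, "Gate-3" would also be treated as missing if it ever appeared in Event.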
You can build a boolean mask that marks the rows where both of your conditions hold:

mask = df.Location.eq('Gate-3') & df.Event.eq('NO Access') # df is your dataframe
You can then use that mask to set whichever columns you want to NaN, like this:

df.loc[mask, ['Location', 'Event']] = np.nan # imported numpy as np
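To show what the AND version does and does not touch, here is a sketch on a small hypothetical four-row frame (not the question's data). Only the row where Location is "Gate-3" and Event is "NO Access" at the same time is blanked out. Rows that match just one condition are left as they are.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one matches both conditions, two match only one
df = pd.DataFrame({
    "Location": ["Gate-2", "Gate-3", "Gate-3", "Gate-1"],
    "Event": ["Access Granted", "Access Granted", "NO Access", "NO Access"],
})

# True only where BOTH conditions hold (element-wise AND with &)
mask = df.Location.eq("Gate-3") & df.Event.eq("NO Access")

# Blank out both columns, but only on the rows selected by the mask
df.loc[mask, ["Location", "Event"]] = np.nan
print(df)
```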

Edit:
It looks like you changed the spec. If you want to set NaN wherever the Location or Event column matches its sentinel value, use two masks:

locmask = df.Location.eq('Gate-3')
df.loc[locmask, 'Location'] = np.nan

evmask = df.Event.eq('NO Access')
df.loc[evmask, 'Event'] = np.nan
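A different technique, not used in this answer, gives the same per-column (OR) result: DataFrame.replace accepts a dict that maps a column name to the value to look for in that column, plus one replacement value. The sketch below uses a small hypothetical frame. Unlike the .loc assignments, replace returns a new DataFrame and leaves the original untouched.

```python
import numpy as np
import pandas as pd

# Hypothetical rows for illustration
df = pd.DataFrame({
    "Location": ["Gate-2", "Gate-3", "Gate-3", "Gate-1"],
    "Event": ["Access Granted", "Access Granted", "NO Access", "NO Access"],
})

# Column -> value to look for; every match in that column becomes NaN.
# Each column is checked independently, like the two-mask version.
cleaned = df.replace({"Location": "Gate-3", "Event": "NO Access"}, np.nan)
print(cleaned)
```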

You don't need iteration to set column values based on a condition. Use boolean indexing instead.
Example:

import pandas as pd

data = {'Time': ['18:28:59', '18:28:59', '18:28:59'],
        'Location': ['Gate-2', 'Gate-3', 'Gate-1'],
        'Event': ['Access Granted', 'NO Access', 'NO Access'],
        'BadgeID': [81002, 80557, 80557]}
df = pd.DataFrame(data)
print(df)

       Time Location           Event  BadgeID
0  18:28:59   Gate-2  Access Granted    81002
1  18:28:59   Gate-3       NO Access    80557
2  18:28:59   Gate-1       NO Access    80557
The loc method is a label-based indexer that, among other options, accepts boolean arrays.
The conditional expression:

df.Location == 'Gate-3'
returns a boolean array, or rather a Series:

0    False
1     True
2    False
Name: Location, dtype: bool
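You can check this with the built-in function type().
This Series is used as the row indexer for loc on the original DataFrame.
loc takes a row indexer and a column indexer, so the following statement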

df.loc[df.Location == 'Gate-3', 'Location'] = np.nan
translates to:
set the cells where the rows whose Location is Gate-3 intersect the Location column to a null value (NaN).
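The explanation above (the comparison yields a boolean Series, which loc then uses as its row indexer next to a column label) can be checked directly on the same three-row example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Time": ["18:28:59", "18:28:59", "18:28:59"],
    "Location": ["Gate-2", "Gate-3", "Gate-1"],
    "Event": ["Access Granted", "NO Access", "NO Access"],
    "BadgeID": [81002, 80557, 80557],
})

cond = df.Location == "Gate-3"
print(type(cond))  # a pandas Series, not a plain list
print(cond.dtype)  # bool

# Row indexer = boolean Series, column indexer = a single column label
df.loc[cond, "Location"] = np.nan
print(df)
```

Only the cell at row 1 in the Location column changes. The Event column is not touched, even though rows 1 and 2 contain "NO Access".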

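Add NaN if "Gate-3" is found in Location and "NO Access" is found in the Event column. Are you reading this from a CSV? And what is the expected output?
I'm reading this from an .xlsx file. This only answers half of my question.
Rather than the string "NAN", set it to np.nan (you need to import numpy as np first). That creates a real NaN value.
Thanks, I was thinking of some loop-based solution. "It's a one-liner."
@尼尔森肯斯: could you give an example?
@尼尔森肯斯 Completely agree with you. I thought the OP wanted the string "NAN".
@user10317766 I've edited my answer, take a look now. The OP also seems to have changed the spec to OR.
Thanks, that looks more convenient.
@timgeb That seems to be a common pattern today... people aren't keen on answering questions /sigh. I also have your answer; keep it up, glad to see more people interested in the tag.
Thanks. Suggestion: passing a dict to the na_values parameter is more robust, because then you can drop the assumption that Gate-3 cannot appear in the Event column and 'NO Access' cannot appear in the Location column.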
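The dict suggestion for na_values can be sketched as follows. As in the earlier list example, pd.read_csv on in-memory CSV text stands in for read_excel, and the second data row is hypothetical, with the two values swapped to show that each NA marker applies only to its own column.

```python
import io

import pandas as pd

# Hypothetical rows; row 1 deliberately swaps the two sentinel strings
csv_text = """Location,Event
Gate-3,NO Access
NO Access,Gate-3
Gate-1,Access Granted
"""

# Per-column NA markers: 'Gate-3' is missing only in Location,
# 'NO Access' only in Event
df = pd.read_csv(
    io.StringIO(csv_text),
    na_values={"Location": ["Gate-3"], "Event": ["NO Access"]},
)
print(df)
```

Row 0 becomes NaN in both columns, while row 1 keeps its values because each string appears in the "wrong" column for its marker.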