Python: how to remove a specific word from a string only when it appears as a whole word
I have a dataframe that looks like this:
1 Hello?
2 Control.
3 that nan far.
4 Just in the last 20 years since your father di...
5 nan your father made all the financial nan nan...
I want to remove the substring "nan" from the text. To do that, I have been using the following:
df['words_no_nan'] = df['words'].replace(regex=True,to_replace=r'nan',value=r'')
This results in:
1 Hello?
2 Control.
3 that far.
4 Just in the last 20 years since your father di...
5 your father made all the ficial
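This mostly works, but it also removes "nan" when it appears inside a larger word. For example, in row 5, "financial" becomes "ficial". How can I remove "nan" only when it appears as a whole word, and not when it is part of a longer word such as "financial"?

Try using the word boundary \b, so that the pattern only matches nan when there is a boundary immediately before and after it: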
df['words_no_nan'] = df['words'].str.replace(r'\bnan\b', '', regex=True)
Output:
0 Hello?
1 Control.
2 that far.
3 Just in the last 20 years since your father di...
4 your father made all the financial ...
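
Removing a whole word this way leaves the surrounding spaces behind, so "that nan far." becomes "that  far." with two spaces. The following is a minimal sketch of one way to tidy that up, assuming a column named words as in the question; the sample rows are shortened versions of the ones above, and the whitespace cleanup is an optional extra step, not part of the original answer.

```python
import pandas as pd

# Small sample modelled on the question's data (rows shortened for illustration).
df = pd.DataFrame({
    "words": [
        "Hello?",
        "that nan far.",
        "nan your father made all the financial nan nan",
    ]
})

df["words_no_nan"] = (
    df["words"]
    # \b matches a word boundary, so "nan" inside "financial" is left alone.
    .str.replace(r"\bnan\b", "", regex=True)
    # Optional: collapse the runs of whitespace left behind by the removal.
    .str.replace(r"\s+", " ", regex=True)
    # Optional: drop leading and trailing spaces.
    .str.strip()
)

print(df["words_no_nan"].tolist())
# ['Hello?', 'that far.', 'your father made all the financial']
```

Note that \b treats letters, digits, and underscores as word characters, so "nan" directly followed by punctuation (for example "nan.") is still removed, while "nan_1" or "nan2" would be kept.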