Regex: Blocking text in a column based on a name


Background

This question is related to another question.

I have the following df, which intentionally has various issues:

import pandas as pd
df = pd.DataFrame({'Text': ['But now Smith,J J is Here from Smithsville',
                            'Maryland is HYDER,A MARY Found here ',
                            'hey here is Annual Doe,Jane Ann until ',
                            'The tuckered was Tucker,Tom is Not here but'],
                   'P_ID': [1, 2, 3, 4],
                   'P_Name': ['SMITH,J J', 'HYDER,A MARY', 'DOE,JANE ANN', 'TUCKER,TOM T'],
                   'N_ID': ['A1', 'A2', 'A3', 'A4']
                   })
Output

  N_ID  P_ID  P_Name        Text
0 A1    1     SMITH,J J     But now Smith,J J is Here from Smithsville
1 A2    2     HYDER,A MARY  Maryland is HYDER,A MARY Found here
2 A3    3     DOE,JANE ANN  hey here is Annual Doe,Jane Ann until
3 A4    4     TUCKER,TOM T  The tuckered was Tucker,Tom is Not here but
Goal

1) For each name in P_Name, e.g. SMITH,J J, block that name in the corresponding Text column with **BLOCK**

2) Create a New_Text column holding the result

Desired output

    N_ID P_ID P_Name Text   New_Text
0                           But now **BLOCK** is Here from Smithsville
1                           Maryland is **BLOCK**  Found here
2                           hey here is Annual **BLOCK**  until
3                           The tuckered was **BLOCK** is Not here but
Question

How do I achieve the desired output?

This should work:

df['New_Text'] = df.apply(lambda x:x['Text'].lower().replace(x['P_Name'].lower(), '**BLOCK**'), axis=1)
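Your example has some whitespace inconsistencies, but this should work on a properly constructed example. Note that the result is lowercased, because both the text and the name are converted with lower() before matching.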
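
The desired output in the question keeps the original capitalisation of the text, while the one-liner above lowercases it. A minimal sketch of a case-preserving variant follows, assuming an exact (but case-insensitive) match of the name is enough. The helper name block_name and its token parameter are made up for illustration and are not part of the original answer.

```python
import re

def block_name(text, name, token='**BLOCK**'):
    """Replace every case-insensitive occurrence of `name` in `text` with `token`."""
    # re.escape keeps characters in the name from being read as regex syntax
    pattern = re.compile(re.escape(name), flags=re.IGNORECASE)
    # a function replacement means `token` is inserted literally
    return pattern.sub(lambda match: token, text)

# same sample data as in the question
texts = ['But now Smith,J J is Here from Smithsville',
         'Maryland is HYDER,A MARY Found here ',
         'hey here is Annual Doe,Jane Ann until ',
         'The tuckered was Tucker,Tom is Not here but']
names = ['SMITH,J J', 'HYDER,A MARY', 'DOE,JANE ANN', 'TUCKER,TOM T']

new_texts = [block_name(text, name) for text, name in zip(texts, names)]
for line in new_texts:
    print(line)
```

On the DataFrame from the question, the same helper could be applied row by row with df.apply(lambda x: block_name(x['Text'], x['P_Name']), axis=1). The last row stays unchanged here too, because the text contains Tucker,Tom and not the full TUCKER,TOM T.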

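Output (after fixing the whitespace issues; the last row has no exact match)

If you remove the whitespace inconsistencies, you can use the replace function with regex=True:
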
# new data frame without the whitespace inconsistencies
df = pd.DataFrame({'Text': ['But now Smith,J J is Here from Smithsville',
                            'Maryland is HYDER,A MARY Found here ',
                            'hey here is Annual Doe,Jane Ann until ',
                            'The tuckered was Tucker,Tom T is Not here but'],
                   'P_ID': [1, 2, 3, 4],
                   'P_Name': ['SMITH,J J', 'HYDER,A MARY', 'DOE,JANE ANN', 'TUCKER,TOM T'],
                   'N_ID': ['A1', 'A2', 'A3', 'A4']
                   })

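# pass the lowercased names as a plain list of regex patterns;
# every pattern is tried against every row, not only against its own row
print(df.Text.str.lower().replace(df.P_Name.str.lower().tolist(), '**BLOCK**', regex=True))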

0    but now **BLOCK** is here from smithsville
1             maryland is **BLOCK** found here 
2           hey here is annual **BLOCK** until 
3    the tuckered was **BLOCK** is not here but
Name: Text, dtype: object

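The whitespace issues are intentional. My actual data closely resembles the data above, including the whitespace. Would the code above change drastically to account for the whitespace?

Well, that was not part of the original question. If that is the case, you need fuzzy matching. Or remove all whitespace and do some very creative whitespace insertion. But your new question is hard to answer, so be patient!

Yes, but I guess I was not clear in my original background statement. I can adjust the question above to remove the whitespace issue. Thanks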
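
One narrow reading of the "fuzzy matching" mentioned in the comments can be sketched with a looser regular expression. This is not a general fuzzy matcher. It only assumes that names have the form LAST,GIVEN [MORE ...], that tokens are alphanumeric, that whitespace around the comma and between tokens may vary, and that given-name tokens after the first may be missing from the text. The helper names loose_name_pattern and block_name_loose are hypothetical.

```python
import re

def loose_name_pattern(name):
    """Build a case-insensitive pattern for a 'LAST,GIVEN MORE' style name."""
    last, _, given = name.partition(',')
    tokens = given.split()
    # the last name must appear as a whole word
    pattern = r'\b' + re.escape(last.strip()) + r'\b'
    if tokens:
        # allow any amount of whitespace around the comma
        pattern += r'\s*,\s*' + re.escape(tokens[0]) + r'\b'
        # remaining given-name tokens are optional, but must be whole words
        for token in tokens[1:]:
            pattern += r'(?:\s+' + re.escape(token) + r'\b)?'
    return re.compile(pattern, flags=re.IGNORECASE)

def block_name_loose(text, name, token='**BLOCK**'):
    """Replace loose matches of `name` in `text` with `token`."""
    return loose_name_pattern(name).sub(lambda match: token, text)

# the row that has no exact match in the question's data
print(block_name_loose('The tuckered was Tucker,Tom is Not here but', 'TUCKER,TOM T'))
# extra whitespace around the comma and between the initials
print(block_name_loose('But now Smith , J  J is Here from Smithsville', 'SMITH,J J'))
```

The word boundaries are there so that a pattern built from SMITH,J J does not block the start of a longer given name such as John. Whether dropping trailing tokens is acceptable depends on the real data, so this should be checked against a sample before use.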