Python: changing numbers with regular expressions
Background
I have the following:
import pandas as pd
df = pd.DataFrame({'Text' : ['But the here is \nBase ID: 666666 \nDate is Here 123456 ',
'999998 For \nBase ID: 123456 \nDate there',
'So so \nBase ID: 939393 \nDate hey the 123455 ',],
'ID': [1,2,3],
'P_ID': ['A','B','C'],
})
Output
ID P_ID Text
0 1 A But the here is \nBase ID: 666666 \nDate is Here 123456
1 2 B 999998 For \nBase ID: 123456 \nDate there
2 3 C So so \nBase ID: 939393 \nDate hey the 123455
Tried
I tried the following to **BLOCK** the 6 digits between Base ID: and \nDate:
df['New_Text'] = df['Text'].str.replace(r'ID:(.+?)', 'ID:**BLOCK**', regex=True)
I get the following result:
New_Text
0 But the here is \nBase ID:**BLOCK**666666 \nDate is Here 123456
1 999998 For \nBase ID:**BLOCK**123456 \nDate there
2 So so \nBase ID:**BLOCK**939393 \nDate hey the 123455
But that is not what I want.
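A quick check with Python's standard-library `re` module shows why the attempt falls short. This is a sketch on the first sample row only: `(.+?)` is a lazy quantifier, so it matches as few characters as possible, here just the single space after `ID:`, and the six digits are left in place.

```python
import re

# First row of the question's 'Text' column.
text = 'But the here is \nBase ID: 666666 \nDate is Here 123456 '

# The lazy group (.+?) is satisfied by exactly one character: the space after 'ID:'.
m = re.search(r'ID:(.+?)', text)
print(repr(m.group(1)))

# Replacing only that short match leaves '666666' directly after the replacement text.
print(repr(re.sub(r'ID:(.+?)', 'ID:**BLOCK**', text)))
```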
Desired output
ID P_ID Text
0 1 A But the here is \nBase ID: 666666 \nDate is Here 123456
1 2 B 999998 For \nBase ID: 123456 \nDate there
2 3 C So so \nBase ID: 939393 \nDate hey the 123455
New_Text
0 But the here is \nBase ID:**BLOCK** \nDate is Here 123456
1 999998 For \nBase ID:**BLOCK** \nDate there
2 So so \nBase ID:**BLOCK** \nDate hey the 123455
Question
How can I adjust the str.replace('ID:(.+?)','ID:**BLOCK**') part of my code to get the desired output?
df['New_Text'] = df['Text'].str.replace(r'ID: *\d+ *', 'ID:**BLOCK** ', regex=True)
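The pattern in the line above can be spelled out piece by piece with `re.VERBOSE`. This is a sketch using the standard-library `re` module rather than pandas; the compiled pattern matches the same text as `r'ID: *\d+ *'`. Because the trailing spaces are consumed by the match, the replacement string ends with a space to put one back before `\nDate`.

```python
import re

# Equivalent to r'ID: *\d+ *', written with re.VERBOSE so each part can carry a comment.
# In VERBOSE mode unescaped spaces are ignored, so a literal space is written as [ ].
pattern = re.compile(r"""
    ID:      # the literal label
    [ ]*     # zero or more spaces after the colon
    \d+      # one or more digits (the ID to block)
    [ ]*     # zero or more trailing spaces, also consumed by the match
""", re.VERBOSE)

text = '999998 For \nBase ID: 123456 \nDate there'

# The leading '999998' is not preceded by 'ID:', so it is not touched.
print(repr(pattern.sub('ID:**BLOCK** ', text)))
```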
For a detailed breakdown of the regex pattern used, see .
Try:
df['New_Text'] = df['Text'].str.replace(r'ID:(.+?)\n', 'ID:**BLOCK**\n', regex=True)
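A sketch of this approach on the question's sample data (assumes pandas is installed; `regex=True` is passed explicitly). Because `.` does not match a newline, the lazy `(.+?)` has to extend all the way to the `\n`, so the digits are replaced. Note that the space before `\n` is consumed as well, so each result differs from the desired output by that one space.

```python
import pandas as pd

df = pd.DataFrame({'Text': ['But the here is \nBase ID: 666666 \nDate is Here 123456 ',
                            '999998 For \nBase ID: 123456 \nDate there',
                            'So so \nBase ID: 939393 \nDate hey the 123455 ']})

# '.' does not match '\n', so (.+?) stretches from 'ID:' up to the next newline.
df['New_Text'] = df['Text'].str.replace(r'ID:(.+?)\n', 'ID:**BLOCK**\n', regex=True)

for s in df['New_Text']:
    print(repr(s))
```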
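The regexp matches the shortest possible string; in your example that is " " (the single space after ID:).
You can try the code below to get the desired output: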
df['New_Text'] = df['Text'].str.replace(r'ID:\s+[0-9]+', 'ID:**BLOCK**', regex=True)
输出:
0 But the here is \nBase ID:**BLOCK** \nDate is Here 123456
1 999998 For \nBase ID:**BLOCK** \nDate there
2 So so \nBase ID:**BLOCK** \nDate hey the 123455
Regex breakdown:
'\s+' - matches one or more whitespace characters
'[0-9]+' - matches one or more digits
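Putting the pieces together, here is a sketch that rebuilds the question's DataFrame and applies this pattern (assumes pandas is installed). `regex=True` is passed explicitly because pandas 2.0 changed the default of `Series.str.replace` to `regex=False`, in which case the pattern would be treated as a literal string. The trailing space before `\nDate` is not part of the match, so it survives as in the desired output.

```python
import pandas as pd

df = pd.DataFrame({'Text': ['But the here is \nBase ID: 666666 \nDate is Here 123456 ',
                            '999998 For \nBase ID: 123456 \nDate there',
                            'So so \nBase ID: 939393 \nDate hey the 123455 '],
                   'ID': [1, 2, 3],
                   'P_ID': ['A', 'B', 'C']})

# ID:    -> the literal label
# \s+    -> one or more whitespace characters after the colon
# [0-9]+ -> one or more digits; matching stops at the space before '\nDate'
df['New_Text'] = df['Text'].str.replace(r'ID:\s+[0-9]+', 'ID:**BLOCK**', regex=True)

for s in df['New_Text']:
    print(repr(s))
```

Numbers that are not directly preceded by `ID:` (such as `999998` or the trailing `123456`) are left unchanged, which is what the desired output asks for.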
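Or try the pattern ID:\s*(\S+), where \S+ matches one or more non-whitespace characters:
df['New_Text'] = df['Text'].str.replace(r'ID:\s*(\S+)', 'ID:**BLOCK**', regex=True)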
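The case of the escape matters here. A sketch with the standard-library `re` module, assuming the capital `\S` (non-whitespace) is the intended form of this pattern: with `\S+` the group captures the ID digits, while a lowercase `\s+` can only capture whitespace, so the regex engine backtracks, matches just the space, and the digits survive.

```python
import re

text = 'So so \nBase ID: 939393 \nDate hey the 123455 '

# \S+ = one or more NON-whitespace characters, so the group captures '939393'.
fixed = re.sub(r'ID:\s*(\S+)', 'ID:**BLOCK**', text)

# \s+ = one or more whitespace characters: \s* backtracks to empty and (\s+) takes the space.
literal = re.sub(r'ID:\s*(\s+)', 'ID:**BLOCK**', text)

print(repr(fixed))
print(repr(literal))
```

Unlike `\d+` or `[0-9]+`, `\S+` would also block IDs containing letters or punctuation, which may or may not be wanted.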