Python: changing numbers using a regular expression


Background

I have the following DataFrame:

import pandas as pd
df = pd.DataFrame({'Text' : ['But the here is \nBase ID: 666666    \nDate is Here 123456 ', 
                                   '999998 For \nBase ID: 123456    \nDate  there', 
                                   'So so \nBase ID: 939393    \nDate hey the 123455 ',],
                      'ID': [1,2,3],
                       'P_ID': ['A','B','C'],

                     })
Output

    ID  P_ID    Text
0   1   A   But the here is \nBase ID: 666666 \nDate is Here 123456
1   2   B   999998 For \nBase ID: 123456 \nDate there
2   3   C   So so \nBase ID: 939393 \nDate hey the 123455
Tried

I tried the following to **BLOCK** the 6-digit number that sits between Base ID: and \nDate

df['New_Text'] = df['Text'].str.replace('ID:(.+?)','ID:**BLOCK**')
I get the following result

  ID P_ID Text New_Text
0               But the here is \nBase ID:**BLOCK**666666 \nDate is Here 123456
1               999998 For \nBase ID:**BLOCK**123456 \nDate there
2               So so \nBase ID:**BLOCK**939393 \nDate hey the 123455
But this is not what I want

Desired output

  ID P_ID Text New_Text
0               But the here is \nBase ID:**BLOCK** \nDate is Here 123456
1               999998 For \nBase ID:**BLOCK** \nDate there
2               So so \nBase ID:**BLOCK** \nDate hey the 123455
Question

How do I adjust the str.replace('ID:(.+?)','ID:**BLOCK**') part of my code to get the desired output?

df['New_Text'] = df['Text'].str.replace(r'ID: *\d+ *', 'ID:**BLOCK** ', regex=True)

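The pattern reads: the literal ID:, zero or more spaces, one or more digits (\d+), then zero or more spaces. The original answer linked to a more detailed breakdown of the pattern.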
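A self-contained sketch of this replacement on the question's sample data may help in checking it against the desired output. It assumes a pandas version that accepts the `regex` keyword. The keyword is passed explicitly because pandas 2.0 and later treat the pattern as a literal string by default.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'Text': ['But the here is \nBase ID: 666666    \nDate is Here 123456 ',
             '999998 For \nBase ID: 123456    \nDate  there',
             'So so \nBase ID: 939393    \nDate hey the 123455 '],
    'ID': [1, 2, 3],
    'P_ID': ['A', 'B', 'C'],
})

# 'ID:' literal, optional spaces, the digit run, then any trailing spaces.
# The replacement puts back a single space before the newline.
df['New_Text'] = df['Text'].str.replace(r'ID: *\d+ *', 'ID:**BLOCK** ', regex=True)

for line in df['New_Text']:
    print(repr(line))
```

Only the number directly after ID: is replaced. The other 6-digit numbers (999998, 123456 after "Date is Here", 123455) are untouched because they are not preceded by ID:.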

Try
df['New_Text'] = df['Text'].str.replace('ID:(.+?)\n', 'ID:**BLOCK**\n', regex=True)


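The lazy group (.+?) matches the shortest possible string, which in your example is just the single space after ID:. You can try the code below to get the desired output.

df['New_Text'] = df['Text'].str.replace(r'ID:\s+[0-9]+', 'ID:**BLOCK**', regex=True)
Output: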

0    But the here is \nBase ID:**BLOCK**    \nDate is Here 123456 
1    999998 For \nBase ID:**BLOCK**    \nDate  there              
2    So so \nBase ID:**BLOCK**    \nDate hey the 123455           
Regex breakdown:

'\s+' - matches one or more whitespace characters

'[0-9]+' - matches one or more digits
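To see why the lazy pattern from the question falls short, the two patterns can be compared with the standard `re` module on one of the sample strings. This is only an illustrative sketch. The variable names are made up for the example.

```python
import re

text = 'But the here is \nBase ID: 666666    \nDate is Here 123456 '

# A lazy '.+?' with nothing after it stops after a single character,
# here the space that follows 'ID:'.
lazy = re.search(r'ID:(.+?)', text)
print(repr(lazy.group(1)))

# '\s+[0-9]+' consumes the whitespace and the whole run of digits,
# leaving the trailing spaces before the newline in place.
fixed = re.sub(r'ID:\s+[0-9]+', 'ID:**BLOCK**', text)
print(repr(fixed))
```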

Please try the pattern
ID:\s*(\S+)
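The case of the escape matters in this pattern: `\s` matches whitespace, while `\S` matches non-whitespace. The sketch below assumes the intended pattern ends in an uppercase `\S+` and compares both spellings on a sample string with the standard `re` module. The variable names are hypothetical.

```python
import re

text = '999998 For \nBase ID: 123456    \nDate  there'

# Lowercase \s+ can only match whitespace, so the digits stay in place.
whitespace_only = re.sub(r'ID:\s*(\s+)', 'ID:**BLOCK**', text)
print(repr(whitespace_only))

# Uppercase \S+ matches the run of non-whitespace characters (the digits).
non_whitespace = re.sub(r'ID:\s*(\S+)', 'ID:**BLOCK**', text)
print(repr(non_whitespace))
```

Note that `\S+` would also swallow letters or punctuation directly after ID:, so `\d+` is the stricter choice when only digits should be blocked.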