Python: changing digits using a regular expression

python regex pandas text replace


Background

I have the following DataFrame:

import pandas as pd
df = pd.DataFrame({'Text': ['But the here is \nBase ID: 666666    \nDate is Here 123456 ',
                            '999998 For \nBase ID: 123456    \nDate  there',
                            'So so \nBase ID: 939393    \nDate hey the 123455 '],
                   'ID': [1, 2, 3],
                   'P_ID': ['A', 'B', 'C']})

Output

    ID  P_ID    Text
0   1   A   But the here is \nBase ID: 666666 \nDate is Here 123456
1   2   B   999998 For \nBase ID: 123456 \nDate there
2   3   C   So so \nBase ID: 939393 \nDate hey the 123455

  ID P_ID Text New_Text
0               But the here is \nBase ID:**BLOCK** \nDate is Here 123456
1               999998 For \nBase ID:**BLOCK** \nDate there
2               So so \nBase ID:**BLOCK** \nDate hey the 123455

Tried

I tried the following to **BLOCK** the 6 digits between 'Base ID:' and '\n':
df['New_Text'] = df['Text'].str.replace('ID:(.+?)','ID:**BLOCK**')

I get the following result:
  ID P_ID Text New_Text
0               But the here is \nBase ID:**BLOCK**666666 \nDate is Here 123456
1               999998 For \nBase ID:**BLOCK**123456 \nDate there
2               So so \nBase ID:**BLOCK**939393 \nDate hey the 123455
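
The result above can be reproduced outside pandas with the standard `re` module. The sketch below assumes pandas applies the pattern the same way `re.sub` does; the names `text`, `lazy`, and `attempt` are only for illustration. It shows that the lazy group `(.+?)` is satisfied by a single character, the space after `ID:`, so the digits are left behind.

```python
import re

# First row of the example DataFrame.
text = "But the here is \nBase ID: 666666    \nDate is Here 123456 "

# A lazy '.+?' stops as soon as it has one character: the space after 'ID:'.
lazy = re.search(r"ID:(.+?)", text)
print(repr(lazy.group(1)))

# The substitution therefore replaces only 'ID: ' and keeps the digits.
attempt = re.sub(r"ID:(.+?)", "ID:**BLOCK**", text)
print(repr(attempt))
```

Because nothing follows the lazy group in the pattern, the regex engine has no reason to extend the match beyond one character.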

But that is not what I want.
Desired output
    ID  P_ID    Text
0   1   A   But the here is \nBase ID: 666666 \nDate is Here 123456
1   2   B   999998 For \nBase ID: 123456 \nDate there
2   3   C   So so \nBase ID: 939393 \nDate hey the 123455

  ID P_ID Text New_Text
0               But the here is \nBase ID:**BLOCK** \nDate is Here 123456
1               999998 For \nBase ID:**BLOCK** \nDate there
2               So so \nBase ID:**BLOCK** \nDate hey the 123455

Question
How do I change the str.replace('ID:(.+?)','ID:**BLOCK**') part of my code to get the desired output?
df['New_Text'] = df['Text'].str.replace(r'ID: *\d+ *', 'ID:**BLOCK** ', regex=True)

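Breakdown of the regex pattern used: 'ID:' matches the literal label, ' *' matches zero or more spaces, '\d+' matches one or more digits, and the trailing ' *' consumes any spaces that follow the digits.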
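
As a quick check of that pattern, here is a minimal sketch using the standard `re` module on the second row of the example data. The variable names are illustrative, and it assumes pandas hands the pattern to the same regex engine.

```python
import re

# Same pattern as in the answer: label, optional spaces, digits, trailing spaces.
pattern = re.compile(r"ID: *\d+ *")

text = "999998 For \nBase ID: 123456    \nDate  there"
result = pattern.sub("ID:**BLOCK** ", text)
print(repr(result))
```

The leading 999998 is untouched because it is not preceded by `ID:`, and the run of spaces after the digits collapses to the single space in the replacement string, since ' *' matches spaces but not the newline.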
Try df['New_Text'] = df['Text'].str.replace('ID:(.+?)\n', 'ID:**BLOCK**\n', regex=True)

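The regexp matches the shortest possible string, which in your example is a single character (the space after 'ID:'). You can try the code below to get the desired output:
df['New_Text'] = df['Text'].str.replace(r'ID:\s+[0-9]+', 'ID:**BLOCK**', regex=True)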

Output:
0    But the here is \nBase ID:**BLOCK**    \nDate is Here 123456
1    999998 For \nBase ID:**BLOCK**    \nDate  there
2    So so \nBase ID:**BLOCK**    \nDate hey the 123455

Regex breakdown:
'\s+' - matches one or more whitespace characters
'[0-9]+' - matches one or more digits
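
Note that '[0-9]+' accepts a digit run of any length, while the question speaks of 6-digit numbers. The sketch below compares the pattern from the answer with a stricter variant; the `strict` pattern, the `block` helper, and the `eight` sample row are hypothetical and not part of the original answer.

```python
import re

loose = r"ID:\s+[0-9]+"         # pattern from the answer above
strict = r"ID:\s+[0-9]{6}\b"    # hypothetical variant: exactly six digits

def block(pattern, text):
    """Replace the matched ID span with the **BLOCK** marker."""
    return re.sub(pattern, "ID:**BLOCK**", text)

six = "So so \nBase ID: 939393    \nDate hey the 123455 "
eight = "Base ID: 12345678 \nDate"   # made-up row with a longer number

print(repr(block(loose, six)))     # six-digit ID is masked
print(repr(block(loose, eight)))   # longer number is masked too
print(repr(block(strict, eight)))  # strict variant leaves it unchanged
```

Whether the stricter form is preferable depends on whether IDs of other lengths can occur in the real data.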
Try ID:\s*(\S+)
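
Putting the pieces together on the question's DataFrame, a minimal end-to-end sketch might look like the following. It assumes a pattern of the form 'ID:\s*\S+' (the label, optional whitespace, then a run of non-whitespace characters) and passes `regex=True` explicitly, since recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

df = pd.DataFrame({
    "Text": [
        "But the here is \nBase ID: 666666    \nDate is Here 123456 ",
        "999998 For \nBase ID: 123456    \nDate  there",
        "So so \nBase ID: 939393    \nDate hey the 123455 ",
    ],
    "ID": [1, 2, 3],
    "P_ID": ["A", "B", "C"],
})

# Mask whatever non-whitespace run follows 'ID:' and any spaces after the colon.
df["New_Text"] = df["Text"].str.replace(r"ID:\s*\S+", "ID:**BLOCK**", regex=True)

for value in df["New_Text"]:
    print(repr(value))
```

This variant keeps the spaces that follow the digits. To normalise them to a single space, as in the desired output, the pattern can also consume the trailing spaces and the replacement can end with one space.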