Python regex to match "words" containing two consecutive streaks of digits and letters, or vice versa, and split them

python regex list

I have a line of text like the following:

text= 'Cms12345678 Gleandaleacademy Fee Collection 00001234Abcd Renewal 123Acgf456789'

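I am trying to split only the tokens that are digits followed by letters, or letters followed by digits, so that I get the following output: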

output_text = 'Cms 12345678 Gleandaleacademy Fee Collection 00001234 Abcd Renewal 123Acgf456789'

I tried the following:

import re
text = 'Cms12345678 Gleandaleacademy Fee Collection 00001234Abcd Renewal 123Acgf456789'
text = text.lower().strip()
text = text.split(' ')
output_text =[]
for i in text:
    if re.match(r'[a-z]+\d+|\d+\w+', i, re.IGNORECASE):
        out_split = re.split(r'(\d+)', i)
        for j in out_split:
            output_text.append(j)
    else:
        output_text.append(i)
output_text = ' '.join(output_text)

Its output is:

output_text = 'cms 12345678 gleandaleacademy fee collection 00001234 abcd renewal 123 acgf 456789 '

Because the regex in re.match is incorrect, this code also splits the last element of the text, 123acgf456789.

Please help me get the correct output.
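The check in the attempt above can be examined in isolation. The following is a minimal sketch of why the last token passes it; the names `loose` and `strict` are illustrative, and the stricter pattern is an assumption about the intent rather than something stated in the post.

```python
import re

# The pattern used in the attempt above.
loose = r'[a-z]+\d+|\d+\w+'

# re.match only anchors at the start of the string, and \w also matches
# digits, so a token with three streaks still passes the check.
print(bool(re.match(loose, '123acgf456789')))       # True

# Assumed stricter check: the whole token must be exactly one letter streak
# plus one digit streak, in either order.
strict = r'[a-z]+\d+|\d+[a-z]+'
print(bool(re.fullmatch(strict, '123acgf456789')))  # False
print(bool(re.fullmatch(strict, 'cms12345678')))    # True
print(bool(re.fullmatch(strict, '00001234abcd')))   # True
```

Swapping `re.match` for `re.fullmatch` with a pattern that excludes `\w` would keep the loop-based approach, although a single `re.sub` call avoids the loop entirely.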

You can use

re.sub(r'\b(?:([a-zA-Z]+)(\d+)|(\d+)([a-zA-Z]+))\b', r'\1\3 \2\4', text)

Details:

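- `\b` - a word boundary
- `(?:` - start of a non-capturing group (needed so that the word boundaries apply to all alternatives):
  - `([a-zA-Z]+)(\d+)` - Group 1: one or more letters, and Group 2: one or more digits
  - `|` - or
  - `(\d+)([a-zA-Z]+)` - Group 3: one or more digits, and Group 4: one or more letters
- `)` - end of the group
- `\b` - a word boundary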
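The breakdown above can also be written directly into the pattern. This is a sketch using `re.VERBOSE`, which lets each part carry its own comment; the name `PATTERN` and the sample tokens are chosen here for illustration.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
PATTERN = re.compile(r"""
    \b                      # word boundary
    (?:                     # non-capturing group: both boundaries apply to each alternative
        ([a-zA-Z]+)(\d+)    # groups 1 and 2: letters, then digits
      |                     # or
        (\d+)([a-zA-Z]+)    # groups 3 and 4: digits, then letters
    )
    \b                      # word boundary
""", re.VERBOSE)

for token in ["Cms12345678", "00001234Abcd", "123Acgf456789", "Renewal"]:
    print(token, bool(PATTERN.search(token)))
# Cms12345678 True
# 00001234Abcd True
# 123Acgf456789 False
# Renewal False
```

The three-streak token is rejected because there is no word boundary between `Acgf` and `456789`, so the closing `\b` cannot match there.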

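During replacement, either \1 and \2 or \3 and \4 hold text, and the pair that did not take part in the match is empty (Python 3.5+ substitutes an empty string for such groups). Concatenating them as \1\3 and \2\4, with a space in between, therefore produces the correct result.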
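To make the group behaviour visible, here is a small sketch that prints the captured groups for each match and then performs the same substitution with a callable. The helper `add_space` is hypothetical and only mirrors the effect of the replacement template.

```python
import re

pattern = re.compile(r'\b(?:([a-zA-Z]+)(\d+)|(\d+)([a-zA-Z]+))\b')
sample = 'Cms12345678 00001234Abcd'

# Groups from the alternative that did not match are None.
for m in pattern.finditer(sample):
    print(m.groups())
# ('Cms', '12345678', None, None)
# (None, None, '00001234', 'Abcd')

def add_space(m):
    # Hypothetical helper: pick whichever pair of groups participated.
    first = m.group(1) or m.group(3)
    second = m.group(2) or m.group(4)
    return f'{first} {second}'

print(pattern.sub(add_space, sample))
# Cms 12345678 00001234 Abcd
```

The callable form is more verbose, but it does not rely on unmatched groups being treated as empty strings in the template.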

Python demo:

import re
text = "Cms1291682971 Glendaleacademy Fee Collection 0000548Andb Renewal 402Ecfev845410001"
print(re.sub(r'\b(?:([a-zA-Z]+)(\d+)|(\d+)([a-zA-Z]+))\b', r'\1\3 \2\4', text))
# => Cms 1291682971 Glendaleacademy Fee Collection 0000548 Andb Renewal 402Ecfev845410001
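Applied to the text from the question, the same substitution might be wrapped as follows. The function name `split_two_streak_tokens` is hypothetical, and the lowercasing step is carried over from the attempt in the question rather than required by the regex.

```python
import re

SPLIT_RE = re.compile(r'\b(?:([a-zA-Z]+)(\d+)|(\d+)([a-zA-Z]+))\b')

def split_two_streak_tokens(text):
    """Lowercase the text, then put a space inside letters+digits or digits+letters tokens."""
    return SPLIT_RE.sub(r'\1\3 \2\4', text.lower().strip())

text = 'Cms12345678 Gleandaleacademy Fee Collection 00001234Abcd Renewal 123Acgf456789'
print(split_two_streak_tokens(text))
# cms 12345678 gleandaleacademy fee collection 00001234 abcd renewal 123acgf456789
```

Tokens with three or more streaks, such as the last one, are left untouched.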

re.sub(r')(?
@WiktorStribiżew, but this inserts an incorrect space in the term 402ecfev845410001. My expected output is cms 1291682971 gleandaleacademy fee collection 0000548 andb renewal 402ecfev845410001
OK, just in case, try re.sub(r'\b([^\W\d'+)\d++)\d+((?(2)\d+\d+)+)\b' or re.sub(r'\b(?:([a-zA-Z]+)(\d+)|(\d+)([a-zA-Z]+))\b', r'\1\3 \2\4', text)
@WiktorStribiżew The second regex worked for me, thank you. But what does r'\1\3 \2\4' mean? What are \1 and so on?
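On the last question: in a replacement string, `\1`, `\2` and so on are backreferences to the text captured by the first, second, etc. parenthesized group of the pattern. A minimal sketch with a simpler two-group pattern, chosen here for illustration:

```python
import re

# \1 is the text matched by the first (...) group, \2 by the second.
swapped = re.sub(r'([a-z]+)(\d+)', r'\2-\1', 'abc123')
print(swapped)  # 123-abc

# \g<n> is an equivalent spelling, useful when a digit follows the reference.
swapped_g = re.sub(r'([a-z]+)(\d+)', r'\g<2>-\g<1>', 'abc123')
print(swapped_g)  # 123-abc
```

In `r'\1\3 \2\4'` the same idea is applied to four groups: whichever of groups 1 or 3 matched comes first, then a literal space, then whichever of groups 2 or 4 matched.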