How do I capture groups of repeated characters with a regular expression in Python?

python regex


import re
line = "..12345678910111213141516171820212223"
regex = re.compile(r'((?:[a-zA-Z0-9])\1+)')
print("not coming here")
matches = re.findall(regex, line)
print(matches)

In the code above, I am trying to capture groups of repeated characters.

For example, I need an answer like this: 111 222 and so on.

But when I run the code above, I get this error:

Traceback (most recent call last):
  File "First.py", line 3, in <module>
    regex = re.compile(r'((?:[a-zA-Z0-9])\1+)')
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\re.py", line 224, in compile
    return _compile(pattern, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 829, in parse
    p = _parse_sub(source, pattern, 0)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 778, in _parse
    p = _parse_sub(source, state)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 524, in _parse
    code = _escape(source, this, state)
  File "C:\Users\bhatsubh\AppData\Local\Programs\Python\Python35\lib\sre_parse.py", line 415, in _escape
    len(escape))
sre_constants.error: cannot refer to an open group at position 16

Could someone please tell me where I am going wrong?

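What you (probably) want is

([a-zA-Z0-9])\1+

A group cannot be referenced from inside itself: in the original pattern, \1 appears while group 1 is still open, which is exactly what the error message says. With the backreference placed after the closed group, in Python: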

import re
line = "..12345678910111213141516171820212223"
regex = re.compile(r'([a-zA-Z0-9])\1+')

matches = [match.group(0) for match in regex.finditer(line)]
print (matches)
# ['111', '222']
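To see the difference between the two patterns in isolation, here is a minimal sketch. The helper name `compiles` is made up for illustration; it only reports whether `re.compile` accepts a pattern, and it prints whatever message the `re` module raises rather than assuming its wording.

```python
import re

broken = r'((?:[a-zA-Z0-9])\1+)'  # \1 is used while group 1 is still open
fixed = r'([a-zA-Z0-9])\1+'       # group 1 is closed before \1 is used


def compiles(pattern):
    """Hypothetical helper: True if re accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        # Show the message produced by the re module itself.
        print("rejected:", pattern, "->", exc)
        return False


broken_ok = compiles(broken)
fixed_ok = compiles(fixed)
print(broken_ok, fixed_ok)
```

Only the second pattern should compile, since its backreference comes after the closing parenthesis of the group it refers to.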

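If you only want to print the repeated characters, you can also use a small trick with re.sub:

def foo(m):
    # Called once per match; print the whole run and replace it with nothing.
    print(m.group(0))
    return ''

_ = re.sub(r'(\w)\1+', foo, line)  # use [a-zA-Z0-9] if you don't want to match underscores

Output:

111
222

Doing this with .findall is possible, but using .finditer is simpler, as shown in Jan's answer: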

import re

line = "..12345678910111213141516171820212223"
regex = re.compile(r'(([a-zA-Z0-9])\2+)')

matches = [t[0] for t in regex.findall(line)]
print(matches)

We use \2 because \1 refers to the pattern in the outer parentheses, while \2 refers to the pattern in the inner parentheses.
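The group numbering matters because of how `re.findall` reports results. The sketch below reuses the input string from the question; the variable names `one_group`, `two_groups` and `runs` are only illustrative.

```python
import re

line = "..12345678910111213141516171820212223"

# One capturing group: findall returns that group's text, i.e. only the
# single repeated character, not the whole run.
one_group = re.findall(r'([a-zA-Z0-9])\1+', line)

# Two capturing groups: findall returns (outer, inner) tuples.
# Group 1 is the whole run, group 2 is the single repeated character.
two_groups = re.findall(r'(([a-zA-Z0-9])\2+)', line)

# Keep only the outer group to get the full runs.
runs = [outer for outer, inner in two_groups]

print(one_group)   # ['1', '2']
print(two_groups)  # [('111', '1'), ('222', '2')]
print(runs)        # ['111', '222']
```

Groups are numbered by the position of their opening parenthesis, which is why the inner group is number 2 here.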

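Yes, but this doesn't work with findall: it only prints the first character of each repeated group.

@cᴏʟᴅsᴘᴇᴇᴅ: True, that's why I also added some code (using a list comprehension). Yours is not bad either :)

@Jan Thanks! Yours is less hacky, though, so I like it better. I guess the OP is the one who gets to decide.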

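[\w\d] is equivalent to \w. Also, \w matches _ as well, so it is not the same as what the OP uses. And since the OP is on Python 3.5, \w matches all Unicode letters and digits by default (so the re.A modifier may be needed if the OP only has to handle ASCII).

@WiktorStribiżew Yep, I was somewhat aware of that, but I kept it for its brevity. I have added a note about it, though, which seemed the sensible thing to do.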
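To make the point about \w concrete, here is a small sketch comparing the three variants on a made-up sample string. The string and the helper `runs` are assumptions for illustration only; `re.ASCII` is the long name of the re.A flag.

```python
import re

# Made-up sample: ASCII letters, underscores, two 'é' characters, digits.
sample = "aa__\u00e9\u00e911"


def runs(pattern, text, flags=0):
    """Hypothetical helper: return every full run matched by the pattern."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


unicode_w = runs(r'(\w)\1+', sample)           # default: Unicode-aware \w
ascii_w = runs(r'(\w)\1+', sample, re.ASCII)   # \w limited to [a-zA-Z0-9_]
explicit = runs(r'([a-zA-Z0-9])\1+', sample)   # the OP's character class

print(unicode_w)
print(ascii_w)
print(explicit)
```

With the explicit class only the "aa" and "11" runs are reported; re.ASCII additionally keeps the underscore run, and the default \w also picks up the accented letters.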

Output of the findall version:

['111', '222']