What Python regular expression will capture overlapping two-word sequences in text (including contractions)?

What adjustments to the pattern are needed to get the desired output?

from re import findall

s = '''one can't two won't three'''

pat = r'(?=(\b\w+[\w\'\-’]*\b \b\w+[\w\'\-’]*\b))'

s2 = findall(pat, s)
print(s2)

# actual output
# ["one can't", "can't two", 't two', "two won't", "won't three", 't three']

# desired output
# ["one can't", "can't two", "two won't", "won't three"]

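The problem is that the word boundary \b also matches right after the apostrophe, so a match can start at the "t" of "can't". A simple fix is to add a negative lookbehind asserting that the match is not preceded by an apostrophe.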
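To see where the stray matches come from, one can list the positions at which `\b` matches in the sample string. This is a minimal diagnostic sketch using only the standard `re` module; the variable names are illustrative and not part of the original post.

```python
import re

s = "one can't two won't three"

# Every index at which the zero-width word boundary \b matches.
boundaries = [m.start() for m in re.finditer(r"\b", s)]

# Indices of the character that follows each apostrophe.
after_apostrophe = [i + 1 for i, ch in enumerate(s) if ch == "'"]

for i in after_apostrophe:
    # An apostrophe is not a \w character, so a boundary sits
    # between it and the following "t".
    print(i, repr(s[i:]), i in boundaries)
```

Each of those positions is a valid starting point for `\b\w+`, which is why the lookahead also yields pairs beginning with `t`.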

Lookbehind:

(?<!\')

Full regex:

(?<!\')(?=(\b\w+[\w\'\-’]*\b \b\w+[\w\'\-’]*\b))
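As a quick check, the pattern with the lookbehind can be run against the sample string from the question. The second pattern below is an assumption of mine, not something the answer proposes: it widens the lookbehind to also reject a preceding curly apostrophe or hyphen, since the character class in the question allows those inside a word.

```python
import re

s = "one can't two won't three"

# One "word" as defined in the question: word characters, optionally
# followed by more word characters, apostrophes or hyphens.
word = r"\b\w+[\w'\-’]*\b"

# The fix from the answer: refuse to start right after a straight apostrophe.
pat = r"(?<!')(?=(" + word + " " + word + "))"
pairs = re.findall(pat, s)
print(pairs)
# ["one can't", "can't two", "two won't", "won't three"]

# Assumed variant: also refuse to start after ’ or -.
pat_wide = r"(?<!['’\-])(?=(" + word + " " + word + "))"
pairs_wide = re.findall(pat_wide, "well-known can’t do")
print(pairs_wide)
# ['well-known can’t', 'can’t do']
```

Because the whole pair sits inside a lookahead, each match consumes no text, which is what lets consecutive pairs overlap.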

How about this:

(?:^|\s+)(?=(\S+\s+\S+))
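A minimal sketch of this whitespace-based pattern in use. The helper name `overlapping_pairs` is hypothetical and only there to make the example self-contained.

```python
import re

# Anchor at the start of the string or at a run of whitespace, then
# capture the next two whitespace-delimited tokens without consuming them.
PAIR = re.compile(r"(?:^|\s+)(?=(\S+\s+\S+))")


def overlapping_pairs(text):
    """Return every pair of adjacent whitespace-separated tokens."""
    return PAIR.findall(text)


print(overlapping_pairs("one can't two won't three"))
# ["one can't", "can't two", "two won't", "won't three"]
```

Since `\S+` accepts any non-whitespace character, contractions need no special handling here, but punctuation attached to a word (a trailing comma, for example) stays in the captured pair.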

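This works. I think all but one of the word-boundary markers can be removed: pat = r'(…

@NicholasNickleby, you probably want to keep the one at the end, but the two in the middle can be removed.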
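The pattern in the comment is cut off, so the following is only one possible reading of the suggestion, not the commenter's actual pattern: it keeps the leading and trailing `\b` and drops the two around the space. On the sample string it gives the same result as the full regex with the lookbehind.

```python
import re

s = "one can't two won't three"

# Full regex from the answer, with all four word boundaries.
full = r"(?<!')(?=(\b\w+[\w'\-’]*\b \b\w+[\w'\-’]*\b))"

# Assumed simplification: the two boundaries around the space are dropped.
trimmed = r"(?<!')(?=(\b\w+[\w'\-’]* \w+[\w'\-’]*\b))"

full_pairs = re.findall(full, s)
trimmed_pairs = re.findall(trimmed, s)
print(trimmed_pairs)
print(full_pairs == trimmed_pairs)
```

The two patterns are not strictly equivalent on every input: without the boundary before the space, the first word may end in an apostrophe or hyphen, so a string like `one- two` would match the trimmed pattern but not the full one.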