What Python regular expression will capture overlapping two-word sequences (including contractions) in text?
What adjustments to the pattern are needed to get the desired output?
from re import findall

s = '''one can't two won't three'''
# The lookahead wraps a capture group, so overlapping pairs are all reported.
pat = r'(?=(\b\w+[\w\'\-’]*\b \b\w+[\w\'\-’]*\b))'
s2 = findall(pat, s)
print(s2)
# actual output
# ["one can't", "can't two", 't two', "two won't", "won't three", 't three']
# desired output
# ["one can't", "can't two", "two won't", "won't three"]
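The problem is that the word boundary \b also matches right after the apostrophe, so the pattern can start a new match at the "t" of a contraction. A simple fix is a negative lookbehind asserting that the match position is not preceded by an apostrophe.
Lookbehind:
(?<!\')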
Full regex:
(?<!\')(?=(\b\w+[\w\'\-’]*\b \b\w+[\w\'\-’]*\b))
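To make the effect of the lookbehind concrete, here is a minimal sketch that runs the question's pattern with and without it on the sample string. The variable names (`text`, `original`, `fixed`, `before`, `after`) are illustrative and not from the thread.

```python
import re

text = "one can't two won't three"

# Pattern from the question: the lookahead may also start right after an
# apostrophe, because \b matches between "'" and the following "t".
original = r"(?=(\b\w+[\w'\-’]*\b \b\w+[\w'\-’]*\b))"

# Fix from the answer: a negative lookbehind rejects any start position
# that is immediately preceded by an apostrophe.
fixed = r"(?<!')" + original

before = re.findall(original, text)
after = re.findall(fixed, text)

print(before)  # includes the unwanted 't two' and 't three'
print(after)   # only the four whole-word pairs
```

The lookbehind here only checks for the straight apostrophe. If the text uses the typographic apostrophe ’, a variant such as `(?<!['’])` would presumably be needed; that variant is an assumption and does not appear in the answer.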
How about this:
(?:^|\s+)(?=(\S+\s+\S+))
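As a quick check of this whitespace-based alternative, the sketch below applies it to the sample string from the question. The names `text`, `pattern`, and `pairs` are illustrative only.

```python
import re

text = "one can't two won't three"

# Consume the start of the string or a run of whitespace, then look ahead
# (without consuming) for two whitespace-separated tokens and capture them.
pattern = r"(?:^|\s+)(?=(\S+\s+\S+))"

pairs = re.findall(pattern, text)
print(pairs)
```

Because `\S+` matches any run of non-whitespace characters, contractions need no special handling, but punctuation attached to a word (for example a trailing comma) would be captured as part of the token as well.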
This works. I think all the word boundary markers except one can be removed: pat = r'(…
@NicholasNickleby You may want to keep the one at the end, but the two in the middle can go.