Regex: capture the first occurrence before a lookahead


I'm trying to capture the URL before a specific word. The only problem is that the word can also be part of the domain.

Example (I'm trying to capture everything before dinner):

https://breakfast.example.com/lunch/dinner/
https://breakfast.example.brunch.com:8080/lunch/dinner
http://dinnerdemo.example.com/dinner/

I was able to use:

^(.*://.*/)(?=dinner/?)

The problem I'm running into is that the lookahead isn't lazy enough, so the following fails:

https://breakfast.example.com/lunch/dinner/login.html?returnURL=https://breakfast.example.com/lunch/dinner/

because it captures:

https://breakfast.example.com/lunch/dinner/login.html?returnURL=https://breakfast.example.com/lunch/

I don't understand why this happens, or how to fix my regex.
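A minimal reproduction of the behavior described above, sketched in Python's `re` module. The question doesn't name a regex engine, so Python is an assumption here; greedy `.*` and lookaheads behave the same way in Perl and PCRE.

```python
import re

# The question's pattern, unchanged.
pattern = re.compile(r"^(.*://.*/)(?=dinner/?)")

url = ("https://breakfast.example.com/lunch/dinner/login.html"
       "?returnURL=https://breakfast.example.com/lunch/dinner/")

m = pattern.match(url)
# Greedy .* first runs to the end of the string and then backs off only as
# far as needed, so the capture ends at the LAST "/" followed by "dinner".
print(m.group(1))
```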

Maybe I'm heading in the wrong direction, but how can I capture all of my examples?

You can make use of some laziness:

^(.*?:\/\/).*?/(?=dinner/?)

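With `.*` in the middle of your regex, you eat everything up to the last colon (the last `://`) from which a match can still be found, and the following `.*/` likewise backs off only to the last `/` before `dinner`.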
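A quick check of the lazy pattern against the question's four URLs, sketched in Python (the language is our choice; the question doesn't say). Note that capture group 1 only holds the scheme, so the URL prefix you want is the whole match, `group(0)`.

```python
import re

# Lazy .*? stops at the FIRST "/" that is followed by "dinner".
lazy = re.compile(r"^(.*?:\/\/).*?/(?=dinner/?)")

urls = [
    "https://breakfast.example.com/lunch/dinner/",
    "https://breakfast.example.brunch.com:8080/lunch/dinner",
    "http://dinnerdemo.example.com/dinner/",
    "https://breakfast.example.com/lunch/dinner/login.html"
    "?returnURL=https://breakfast.example.com/lunch/dinner/",
]

for url in urls:
    m = lazy.match(url)
    # group(0) is the whole matched prefix; group(1) is only e.g. "https://"
    print(m.group(0))
```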

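By the way, `.*` is a very bad practice: on long strings it can cause terrible backtracking and a serious performance hit. `.*?` is better here because it is reluctant (lazy) rather than greedy.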

A lookahead doesn't have to be lazy; a lookahead is only a check, in your case for a quasi-fixed string.

Obviously, what you need to make lazy is the subpattern before the lookahead:
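A small Python sketch of the point that a lookahead is only a check: the text it tests is not consumed, so `dinner` never becomes part of the match. The URL comes from the question; using Python is an assumption.

```python
import re

url = "https://breakfast.example.com/lunch/dinner/"

# (?=dinner) asserts what follows "lunch/" without consuming it.
m = re.search(r"lunch/(?=dinner)", url)
print(m.group(0))  # lunch/
# The match ends exactly where "dinner" begins.
print(m.end() == url.index("dinner"))
```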

^https?:\/\/(?:[^\/]+\/)*?(?=dinner(?:\/|$))

Note:

(?:/|$)

acts like a boundary, ensuring that the word "dinner" is followed by a slash or the end of the string.
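A sketch of this pattern in Python against the question's URLs. The extra URL with a `dinnerware` directory is hypothetical, added only to show the `(?:/|$)` boundary rejecting a path segment that merely starts with `dinner`.

```python
import re

# Lazily repeat whole path segments ("xxx/"), then require "dinner"
# followed by "/" or the end of the string.
seg = re.compile(r"^https?:\/\/(?:[^\/]+\/)*?(?=dinner(?:\/|$))")

urls = [
    "https://breakfast.example.com/lunch/dinner/",
    "https://breakfast.example.brunch.com:8080/lunch/dinner",
    "http://dinnerdemo.example.com/dinner/",
    "https://breakfast.example.com/lunch/dinner/login.html"
    "?returnURL=https://breakfast.example.com/lunch/dinner/",
    # Hypothetical: "dinnerware/" is skipped because "dinner" isn't followed by "/" or end.
    "https://example.com/dinnerware/dinner",
]

for url in urls:
    print(seg.match(url).group(0))
```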

Your main flaw is using greedy matching

.*

instead of non-greedy

.*?
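A minimal illustration of the greedy/lazy difference, using a made-up string `a/b/c/` rather than the question's URLs:

```python
import re

text = "a/b/c/"

# Greedy: .* grabs as much as possible, then backs off to the LAST "/".
greedy = re.match(r".*/", text).group(0)
# Lazy: .*? grabs as little as possible, stopping at the FIRST "/".
lazy = re.match(r".*?/", text).group(0)

print(greedy)  # a/b/c/
print(lazy)    # a/
```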

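The following uses Perl to perform the match you want, but the regex can easily be applied in any language. Note the word boundaries used around dinner, which may or may not be what you want: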

use strict;
use warnings;
while (<DATA>) {
    if (m{^(.*?://.*?/.*?(?=\bdinner\b))}) {
        print $1, "\n";
    }
}
__DATA__
https://breakfast.example.com/lunch/dinner/
https://breakfast.example.brunch.com:8080/lunch/dinner
http://dinnerdemo.example.com/dinner/

Output:

https://breakfast.example.com/lunch/
https://breakfast.example.brunch.com:8080/lunch/
http://dinnerdemo.example.com/
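A rough Python port of the Perl snippet above (the port is ours, not the answer's). The `dinnertime` URL is hypothetical and only shows what the `\b` word boundaries change: a path segment that merely begins with `dinner` is skipped.

```python
import re

# Same pattern as the Perl version; \b on both sides makes "dinner" a whole word.
word = re.compile(r"^(.*?://.*?/.*?(?=\bdinner\b))")

lines = [
    "https://breakfast.example.com/lunch/dinner/",
    "https://breakfast.example.brunch.com:8080/lunch/dinner",
    "http://dinnerdemo.example.com/dinner/",
]

results = []
for line in lines:
    m = word.match(line)
    if m:
        results.append(m.group(1))
        print(m.group(1))

# Hypothetical URL: "dinnertime" is not the whole word "dinner", so it is passed over.
extra = word.match("https://example.com/dinnertime/dinner/").group(1)
print(extra)
```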

Yet another way:

 # Multi-line optional
 # ^(?:(?!://).)*://[^?/\r\n]+/(?:(?!dinner)[^?/\r\n]+/)*(?=dinner)


 ^                    # BOL
 (?:
      (?! :// )
      . 
 )*
 ://
 [^?/\r\n]+           # Domain
 /     
 (?:
      (?! dinner )    # Dirs ?
      [^?/\r\n]+ 
      /          
 )*
 (?= dinner )
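The commented layout above can be compiled directly in Python with `re.VERBOSE`, which ignores unescaped whitespace and `#` comments; `re.MULTILINE` is the optional multi-line mode the first comment mentions. A sketch, assuming Python's `re` as the engine:

```python
import re

pattern = re.compile(r"""
    ^                    # beginning of line
    (?:
        (?! :// )        # any char, as long as "://" doesn't start here
        .
    )*
    ://
    [^?/\r\n]+           # domain (including an optional :port)
    /
    (?:
        (?! dinner )     # a directory that doesn't start with "dinner"
        [^?/\r\n]+
        /
    )*
    (?= dinner )
""", re.VERBOSE | re.MULTILINE)

urls = [
    "https://breakfast.example.com/lunch/dinner/",
    "https://breakfast.example.brunch.com:8080/lunch/dinner",
    "http://dinnerdemo.example.com/dinner/",
    "https://breakfast.example.com/lunch/dinner/login.html"
    "?returnURL=https://breakfast.example.com/lunch/dinner/",
]

# One URL per line; ^ (with MULTILINE) gives at most one match per line.
found = pattern.findall("\n".join(urls))
for prefix in found:
    print(prefix)
```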

Matches, each followed by the rest of its line:

https://breakfast.example.com/lunch/              dinner/
https://breakfast.example.brunch.com:8080/lunch/  dinner
http://dinnerdemo.example.com/                    dinner/
https://breakfast.example.com/lunch/              dinner/login.html?returnURL=

Using Python 3.7:

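import re

s = '''
https://breakfast.example.com/lunch/dinner/
https://breakfast.example.brunch.com:8080/lunch/dinner
http://dinnerdemo.example.com/dinner/
'''
# Greedy .* runs to the LAST "dinner" on each line, so "dinnerdemo" in the
# host is skipped. ^ with re.M anchors one match per line (without ^,
# findall also returns empty strings).
pat = re.compile(r'^.*(?=dinner)', re.M)
mo = pat.findall(s)
for line in mo:
    print(line)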

Output:

https://breakfast.example.com/lunch/
https://breakfast.example.brunch.com:8080/lunch/
http://dinnerdemo.example.com/
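Because `.*` here is greedy, it stops before the last `dinner` on each line, which is what lets it skip `dinnerdemo` in the host. The flip side, sketched below in Python with the pattern anchored by `^` so that each line yields one match, is that the question's `returnURL` example still captures up to the second `dinner`:

```python
import re

# Greedy pattern anchored at the start of each line.
pat = re.compile(r"^.*(?=dinner)", re.M)

# The failing URL from the question.
url = ("https://breakfast.example.com/lunch/dinner/login.html"
       "?returnURL=https://breakfast.example.com/lunch/dinner/")

captured = pat.findall(url)
# A single match that runs past the first "dinner".
print(captured)
```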

What language are you using?
