Python: Why does this regex split return more components than expected?


The regex below (written in Python 3) is just one part of a larger regex that splits a URL into scheme, domain, and path. This part extracts the path:

import re

link = "http://google.com/whatever/who/jx.html"
components = re.split(r'(?<![:/])(/.*$)', link)

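Because the pattern contains a capture group, re.split also returns the text matched by that group. So the string is split into what comes before the match, the match itself, and what comes after the match. You get these elements (the match is shown in square brackets):
http://google.com[/whatever/who/jx.html]
Nothing follows the match, so the last element is an empty string. The resulting list is therefore: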
['http://google.com', '/whatever/who/jx.html', '']
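The effect of the capture group can be checked directly. The following is a minimal sketch that compares the pattern from the question with a variant that has no capture group; the variable names are only illustrative.

```python
import re

link = "http://google.com/whatever/who/jx.html"

# With a capture group, re.split keeps the captured text in the result:
# text before the match, the captured match, text after the match.
with_group = re.split(r'(?<![:/])(/.*$)', link)
print(with_group)     # ['http://google.com', '/whatever/who/jx.html', '']

# Without the capture group, the matched text is discarded and only the
# pieces around the match remain.
without_group = re.split(r'(?<![:/])/.*$', link)
print(without_group)  # ['http://google.com', '']

# One simple way to drop the trailing empty string is to filter it out.
components = [part for part in with_group if part]
print(components)     # ['http://google.com', '/whatever/who/jx.html']
```

Filtering is only one option; it also removes empty strings that might be meaningful in other inputs, so it is worth checking against the URLs you actually expect.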

I think it would be better to use a slightly different pattern here:
>>> import re
>>> link = "http://google.com/whatever/who/jx.html"
>>> re.match("(https?://.+?)(/.*$)", link).groups()
('http://google.com', '/whatever/who/jx.html')
>>>

Here is a breakdown of what the regex pattern used above matches:
(        # The start of the first capture group
http     # http
s?       # An optional s
://      # ://
.+?      # One or more characters matched non-greedily
)        # The close of the first capture group
(        # The start of the second capture group
/        # /
.*       # Zero or more characters
$        # The end of the string
)        # The close of the second capture group
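The same breakdown can be kept inside the code by compiling the pattern with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is only a sketch; the name URL_PATTERN is made up for illustration.

```python
import re

# Same pattern as above, written out with re.VERBOSE so that the
# comments live next to the pieces they describe.
URL_PATTERN = re.compile(r"""
    (        # start of the first capture group
    https?   # http, optionally followed by s
    ://      # ://
    .+?      # one or more characters, matched non-greedily
    )        # end of the first capture group
    (        # start of the second capture group
    /        # /
    .*       # zero or more characters
    $        # end of the string
    )        # end of the second capture group
""", re.VERBOSE)

link = "http://google.com/whatever/who/jx.html"
match = URL_PATTERN.match(link)
if match:
    base, path = match.groups()
    print(base)  # http://google.com
    print(path)  # /whatever/who/jx.html

# A URL without any path does not match, so check for None before
# calling .groups().
print(URL_PATTERN.match("http://google.com"))  # None
```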

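This is not a direct answer to your question, but don't parse URLs with regular expressions. Use urllib.parse.

I still don't quite understand this behavior. From what I can see, nothing is left for the pattern /.*$ to match after the initial split.

The group (/.*$) matches a literal slash, then any number of characters, then an end-of-line anchor. Because the regex anchors the match to the end of the line, the match always runs up to the end of the line.

@CommuSoft Aha, got it. You should post that as an answer so I can close the question. Thanks a lot.

It should include https as well. And can we also use ^, along with $, if only a URL string is being split?

@Braj - You can, but it is unnecessary. In Python, re.match matches at the start of the string by default.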
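For reference, the urllib.parse suggestion from the comments could look roughly like this. It is a minimal sketch using urlsplit from the standard library, applied to the URL from the question.

```python
from urllib.parse import urlsplit

link = "http://google.com/whatever/who/jx.html"

# urlsplit returns a named tuple with the URL's components.
parts = urlsplit(link)
print(parts.scheme)  # http
print(parts.netloc)  # google.com
print(parts.path)    # /whatever/who/jx.html
```

This avoids hand-written lookbehinds entirely and also exposes the query string and fragment if the URL has them.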