Python: the perfect regular expression for extracting URLs with re.findall()


I searched Google for regular expressions that extract URLs, but on one example they either do not work or the Python interpreter simply hangs.


The URL is “

A regular expression for URLs, for use with re.findall in Python:

http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+
If you need a capturing group:

(http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+)
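A minimal sketch of how the two patterns behave with re.findall. The sample text and the variable names are invented for illustration and are not from the original question. When a pattern contains one capturing group, findall returns the contents of that group, so wrapping the whole expression in a group should yield the same strings as the version without one.

```python
import re

# Pattern from the answer, as a raw string so the backslashes reach the regex engine unchanged.
URL_RE = r"http[s]?:\/\/(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
# The same pattern wrapped in a single capturing group.
URL_RE_GROUPED = "(" + URL_RE + ")"

# Hypothetical sample text, used only for this demonstration.
text = "See https://example.com/a?b=1 and http://test.org/x%20y for details"

urls = re.findall(URL_RE, text)
urls_grouped = re.findall(URL_RE_GROUPED, text)

print(urls)          # ['https://example.com/a?b=1', 'http://test.org/x%20y']
print(urls_grouped)  # same list, because the only group spans the whole match
```

Each match ends at the first character that no alternative accepts, which in this sample is the space after each URL.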


http matches the characters http literally (case sensitive)
[s]? match a single character present in the list
    Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
    s the literal character s (case sensitive)
: matches the character : literally
\/ matches the character / literally
\/ matches the character / literally
(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+ Non-capturing group
    Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
    1st Alternative: [a-zA-Z]
        [a-zA-Z] match a single character present in the list below
            a-z a single character in the range between a and z (case sensitive)
            A-Z a single character in the range between A and Z (case sensitive)
    2nd Alternative: [0-9]
        [0-9] match a single character present in the list below
            0-9 a single character in the range between 0 and 9
    3rd Alternative: [$-_@.&+]
        [$-_@.&+] match a single character present in the list below
            $-_ a single character in the range between $ and _ (ASCII 36 to 95, which includes the digits, the uppercase letters, %, / and :)
            @.&+ a single character in the list @.&+ literally (case sensitive)
    4th Alternative: [!*\(\),]
        [!*\(\),] match a single character present in the list below
            !* a single character in the list !* literally
            \( matches the character ( literally
            \) matches the character ) literally
            , the literal character ,
    5th Alternative: (?:%[0-9a-fA-F][0-9a-fA-F])
        (?:%[0-9a-fA-F][0-9a-fA-F]) Non-capturing group
            % matches the character % literally
            [0-9a-fA-F] match a single character present in the list below
                0-9 a single character in the range between 0 and 9
                a-f a single character in the range between a and f (case sensitive)
                A-F a single character in the range between A and F (case sensitive)
            [0-9a-fA-F] match a single character present in the list below
                0-9 a single character in the range between 0 and 9
                a-f a single character in the range between a and f (case sensitive)
                A-F a single character in the range between A and F (case sensitive)
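The breakdown above reads $-_ as a range, which is easy to overlook. As a small illustration (the variable names are made up here), the following lists every character between $ and _ and checks a few of them against the class on its own.

```python
import re

# Every character from '$' (code point 36) to '_' (code point 95), inclusive.
in_range = [chr(c) for c in range(ord("$"), ord("_") + 1)]
print("".join(in_range))

# Characters common in URLs that the range happens to accept.
accepted = [ch for ch in "%/:?=-Z9" if re.fullmatch(r"[$-_]", ch)]

# Characters outside the range: '#' is code point 35, 'a' is 97, '~' is 126.
rejected = [ch for ch in "#a~" if re.fullmatch(r"[$-_]", ch) is None]

print(accepted)
print(rejected)
```

So the third alternative, rather than any dedicated part of the pattern, is what lets /, :, ? and = through.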

@yole ((https?:\/\/)([\da-z\.-]+)\([a-z\.]{2,6})([\/\w\.-]*)*\/)), and some of the regular expressions here

OP: please tell us exactly what happens when you try to match with that regular expression. "Doesn't work" gives us nothing to go on.

How about this one? [...]

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 376-377: truncated \xXX escape

Thanks for the expansion, but now I have spotted a possible bug. You didn't mean to select the range from $ to _ in your regular expression, did you? Presumably you just wanted to match any one of $, - and _. As luck would have it, the range also matches the % sign, which completely breaks the percent-encoding logic.
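To illustrate the last comment, here is a hedged sketch that compares the repeated group from the answer with a variant in which the hyphen is escaped, so that $, - and _ are three literal characters. The strict variant is an assumption about what was intended, not a fix confirmed by the answer's author, and the names are invented for this example.

```python
import re

# Repeated group exactly as in the answer: $-_ is a range that includes '%'.
ORIGINAL = r"(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"
# Assumed intent: an escaped hyphen, so only '$', '-' and '_' are matched literally.
STRICT = r"(?:[a-zA-Z]|[0-9]|[$\-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+"

# "%zz" is not a valid percent-encoded byte, yet the original accepts it.
bad_original = re.fullmatch(ORIGINAL, "100%zz") is not None
bad_strict = re.fullmatch(STRICT, "100%zz") is not None

# A valid escape such as "%20" is accepted by both.
good_original = re.fullmatch(ORIGINAL, "a%20b") is not None
good_strict = re.fullmatch(STRICT, "a%20b") is not None

# Side effect of the strict class: '/' no longer matches at all.
slash_strict = re.fullmatch(STRICT, "/") is not None

print(bad_original, bad_strict, good_original, good_strict, slash_strict)
```

Note the trade-off: once the range is gone, the pattern also stops accepting /, :, ? and =, so those characters would have to be listed explicitly for paths and query strings to be captured.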