Python: regular expression to search for GitHub URLs


python regex


I wrote a regular expression in Python to search for GitHub pages:

import re

github = re.findall(
    r"https?:\/\/(?:www\.)?github\.com\/[A-Za-z0-9_-]+\/?",
    text)

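But for now it only finds links that begin with http or https. How can I modify it so that the regex matches strings that begin with https or with just www?

Right now, my regex finds this: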

https://github.com/helloman

And this:

https://www.github.com/helloman

But not this:

www.github.com/helloman

How can I change it to accept all three options?

This will do the job:

(?:https?://)?(?:www[.])?github[.]com/[\w-]+/?

Here is a proof of concept:

Python 3.7.5 (default, Oct 17 2019, 12:16:48) 
[GCC 9.2.1 20190827 (Red Hat 9.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> github=re.compile(r'(?:https?://)?(?:www[.])?github[.]com/[\w-]+/?')
>>> github.findall('www.github.com/accdias/dotfiles.git')
['www.github.com/accdias/']
>>> github.findall('github.com/accdias/dotfiles.git')
['github.com/accdias/']
>>> github.findall('https://github.com/accdias/dotfiles.git')
['https://github.com/accdias/']
>>> github.findall('http://github.com/accdias/dotfiles.git')
['http://github.com/accdias/']
>>> github.findall('http://www.github.com/accdias/dotfiles.git')
['http://www.github.com/accdias/']
>>> github.findall('https://www.github.com/accdias/dotfiles.git')
['https://www.github.com/accdias/']
>>>

I hope it helps.

You are only missing a few parentheses

P.S.

It will now also match github.com/xxx. I'm not sure that is what you want.
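If a bare github.com/xxx should not match, one option is to require either a scheme or a www. prefix by using alternation. The following is only a minimal sketch, assuming the three forms listed in the question are the only ones wanted; the name GITHUB_PREFIXED and the sample string are made up for illustration.

```python
import re

# Require either "http(s)://" (optionally followed by "www.") or a bare "www."
# in front of github.com. GITHUB_PREFIXED is a hypothetical name.
GITHUB_PREFIXED = re.compile(
    r"(?:https?://(?:www\.)?|www\.)github\.com/[\w-]+/?"
)

# Made-up sample containing the three wanted forms plus a bare github.com link
sample = (
    "https://github.com/helloman "
    "https://www.github.com/helloman "
    "www.github.com/helloman "
    "github.com/helloman"
)

matches = GITHUB_PREFIXED.findall(sample)
print(matches)
```

With this sample, the first three links are returned and the bare github.com/helloman is skipped. Note that in Python 3 \w on a str pattern also matches non-ASCII word characters, so [A-Za-z0-9_-] is the stricter choice if that matters.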

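The question isn't clear to me. Can you post some example URLs?

Edited, I hope it's better now.

I tested your regex with all three examples and it already does what you want. I don't see a problem. Can you clarify?

I'm pretty sure it doesn't work for addresses like www.github.com/XXX

So, you want to find URLs that start with www. or with https?://(?:www\.)? You can do this with the "or" syntax: (thing)|(another thing). Or collect all URLs and then check the domain with a URL parser (I think a URL parser is provided by urllib).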
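The URL-parser idea from the comment above could look roughly like this. It is only an illustration: the helper name github_links, the loose CANDIDATE pattern, and the sample text are assumptions, not something from the thread. urllib.parse.urlsplit only fills in the host when the string has a scheme or starts with //, so scheme-less candidates get // prepended before parsing.

```python
import re
from urllib.parse import urlsplit

# Loose candidate pattern (an assumption for this sketch): optional scheme,
# a dotted host name, a slash, then any run of non-space characters.
CANDIDATE = re.compile(r"(?:https?://)?[\w.-]+\.[a-z]{2,}/\S*", re.IGNORECASE)


def github_links(text):
    """Return candidates from text whose host is github.com or www.github.com."""
    found = []
    for candidate in CANDIDATE.findall(text):
        # urlsplit needs a scheme or a leading "//" to recognize the host part
        url = candidate if "://" in candidate else "//" + candidate
        host = urlsplit(url).hostname
        if host in ("github.com", "www.github.com"):
            found.append(candidate)
    return found


sample = (
    "see www.github.com/helloman and https://example.com/x "
    "and https://github.com/helloman"
)
print(github_links(sample))
```

The domain check is done on the parsed host rather than inside the regex, so a look-alike host such as notgithub.com is rejected without making the pattern any more complicated.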

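github.findall('www.github.com/accdias/dotfiles.git') = [], but OP needs a regex that accepts this URL.

Oh! Now I get it. Thanks for the clarification. I was under the impression that OP wanted to exclude the ones without a protocol.

IMHO // is clearer than /{2}, and you missed the hyphen: OP said [A-Za-z0-9_-]+, which is [\w-]+, not \w+.

@Toto, indeed. I'll update the answer. Thanks for pointing it out.

Now I get [('', 'www.')] from the text I'm using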

(https:\/\/)?(www\.)?github\.com\/[A-Za-z0-9_-]+\/?
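The [('', 'www.')] result mentioned in the last comment is what re.findall returns when the pattern contains capturing groups: a list of tuples with one entry per group instead of the whole match. A small sketch of the difference, using a made-up sample string; either switch to non-capturing groups or read the full match from re.finditer.

```python
import re

# Made-up sample text for illustration
text = "clone www.github.com/helloman today"

# Two capturing groups: findall returns one tuple of group contents per match,
# with '' for a group that did not take part in the match
capturing = re.compile(r"(https:\/\/)?(www\.)?github\.com\/[A-Za-z0-9_-]+\/?")
print(capturing.findall(text))      # [('', 'www.')]

# Non-capturing groups: findall returns the whole matched text
non_capturing = re.compile(
    r"(?:https:\/\/)?(?:www\.)?github\.com\/[A-Za-z0-9_-]+\/?"
)
print(non_capturing.findall(text))  # ['www.github.com/helloman']

# Alternatively, keep the groups and take group(0) from each match object
whole = [m.group(0) for m in capturing.finditer(text)]
print(whole)                        # ['www.github.com/helloman']
```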