How do I use re in Python to find URLs that contain one word and another?

Suppose I have two types of links in an HTML file, and I want to pick out all the links of type 1. How can I do this in Python with the re module?

Type 1:

http://www.domain.com/firstlevel/02-02-13/secondlevel-slug.html

Type 2:

http://www.domain.com/levelone/02-02-13/secondlevel-slug.html

I want to find all links that contain both firstlevel and secondlevel.

This is what I tried:

import re
text = "here goes the code with various links of type 1 and type 2…"
findURL = re.findall('.*firstlevel.*secondlevel.*',text)

Here is what I think the regex means:

find all strings that have ONE OR MORE occurrences of ANY CHARACTER
followed by the word firstlevel
followed by ONE OR MORE occurrences of ANY CHARACTER
followed by the word secondlevel
followed by ONE OR MORE occurrences of ANY CHARACTER

However, I get an empty list as the result.

What am I doing wrong?
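A quick way to see what the pattern from the question really does is to run it on small inputs. Two details differ from the reading above: `*` means zero or more repetitions (one or more is `+`), and `.` does not match a newline unless `re.DOTALL` is passed, so firstlevel and secondlevel must sit on the same line. An empty list therefore means no single line of `text` contains firstlevel followed later by secondlevel. When a line does match, the leading and trailing `.*` return the whole line, not just the URL. The sample strings below are made up for illustration.

```python
import re

pattern = '.*firstlevel.*secondlevel.*'

# Made-up sample: one type-1 URL with other words on the same line.
one_line = "see http://www.domain.com/firstlevel/02-02-13/secondlevel-slug.html here"
# The leading and trailing .* swallow the rest of the line.
whole_line = re.findall(pattern, one_line)

# The same URL broken across two lines: '.' does not cross '\n' by default.
two_lines = "http://www.domain.com/firstlevel/\n02-02-13/secondlevel-slug.html"
no_match = re.findall(pattern, two_lines)
# With re.DOTALL, '.' also matches '\n', so the whole string matches.
with_dotall = re.findall(pattern, two_lines, flags=re.DOTALL)

# '*' allows zero repetitions, so the bare word satisfies '.*firstlevel.*'.
zero_reps = re.fullmatch('.*firstlevel.*', 'firstlevel')

print(whole_line)             # the entire line, not only the URL
print(no_match)               # []
print(with_dotall)            # the entire two-line string
print(zero_reps is not None)  # True
```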

You have to specify where the link starts and ends, i.e.

findURL = re.findall(r'http:.*firstlevel.*secondlevel.*\.html', text)

HTH.
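One caveat with the pattern above: `.*` is greedy, so when several links share a line it can run from the first `http:` all the way to the last `.html`, swallowing a type-2 link along with the type-1 link. The sketch below (sample text made up) shows this and one possible fix: restrict the gaps to characters assumed not to occur inside a bare URL in HTML (whitespace, quotes, angle brackets), so each match stays within a single link. That character set is a heuristic, not a full URL grammar.

```python
import re

# Made-up sample: a type-1 link followed by a type-2 link on one line.
text = ("see http://www.domain.com/firstlevel/02-02-13/secondlevel-slug.html "
        "and http://www.domain.com/levelone/02-02-13/secondlevel-slug.html")

# Greedy .* backtracks only as far as the LAST 'secondlevel' and '.html',
# so the match spans both links.
greedy = re.findall(r'http:.*firstlevel.*secondlevel.*\.html', text)

# Gaps limited to non-whitespace, non-quote, non-angle-bracket characters
# (an assumption about what a URL inside HTML looks like).
bounded = re.findall(
    r'http://[^\s"\'<>]*firstlevel[^\s"\'<>]*secondlevel[^\s"\'<>]*\.html', text)

print(greedy)   # one match covering both links
print(bounded)  # only the type-1 URL
```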

You can do this with plain substring checks instead:

'firstlevel' in text and 'secondlevel' in text
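Applied to a whole HTML file, the substring check needs the individual URLs first. A minimal sketch, assuming a simple regex is good enough to pull out every http(s) URL (it stops at whitespace, quotes and angle brackets, which is a heuristic rather than a real URL parser), then keeps the ones containing both words. The sample HTML is made up.

```python
import re

# Made-up sample HTML with one link of each type.
html = ('<a href="http://www.domain.com/firstlevel/02-02-13/secondlevel-slug.html">a</a> '
        '<a href="http://www.domain.com/levelone/02-02-13/secondlevel-slug.html">b</a>')

# Step 1: extract candidate URLs (heuristic: run of non-space, non-quote, non-bracket chars).
urls = re.findall(r'https?://[^\s"\'<>]+', html)

# Step 2: keep only URLs that contain both words, using plain substring tests.
type1 = [u for u in urls if 'firstlevel' in u and 'secondlevel' in u]

print(type1)
```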

>>> import re
>>> p=re.compile(r'(http://\S+firstlevel\S+secondlevel\S+\.html)')
>>> text = 'random text http://www.domain.com/firstlevel/02-02-13/secondlevel-slug.html more random text http://www.domain.com/levelone/02-02-13/secondlevel-slug.html'
>>> i = p.finditer(text)
>>> for m in i:
...    print(m.group())
...
http://www.domain.com/firstlevel/02-02-13/secondlevel-slug.html
>>>
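The patterns above require firstlevel to come before secondlevel. If "contains both words" should hold in either order, one option is a pair of lookaheads at the start of each URL. This is a sketch under the same heuristic as before (URL characters are anything except whitespace, quotes and angle brackets); note it checks the whole URL, domain included, and does not require a `.html` ending. The sample text is made up.

```python
import re

# Assumed set of characters that can appear inside a URL in the text.
URL_CHARS = r'[^\s"\'<>]'

# Each lookahead checks that the rest of this URL contains one word;
# neither consumes input, so the order of the words does not matter.
pattern = re.compile(
    r'https?://'
    rf'(?={URL_CHARS}*firstlevel)'
    rf'(?={URL_CHARS}*secondlevel)'
    rf'{URL_CHARS}+'
)

url1 = 'http://www.domain.com/firstlevel/02-02-13/secondlevel-slug.html'
url2 = 'http://www.domain.com/secondlevel/02-02-13/firstlevel-slug.html'
url3 = 'http://www.domain.com/levelone/02-02-13/secondlevel-slug.html'
text = f'random text {url1} more text {url2} and {url3}'

# No capturing groups, so findall returns the full matched URLs.
matches = pattern.findall(text)
print(matches)  # url1 and url2, but not url3
```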