Python regex to match a line if it ends with?

python regex

This is exactly the problem I'm trying to solve:

        <p>Some.Title.html<br />
<a href="https://www.somelink.com/yep.html" rel="nofollow">https://www.somelink.com/yep.html</a><br />
Some.Title.txt<br />
<a href="https://www.somelink.com/yeppers.txt" rel="nofollow">https://www.somelink.com/yeppers.txt</a><br />

Some.Title.html



Some.Title.txt

I've tried several variations of the following:

match = re.compile('^(.+?)<br \/><a href="https://www.somelink.com(.+?)">',re.DOTALL).findall(html)
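Two details of this pattern may explain why it fails on the sample: without re.MULTILINE, ^ anchors only at the very start of the string, and the pattern expects <a href to follow <br /> immediately, while the sample has a line break between them. Below is a minimal sketch of an adjusted pattern, assuming the HTML looks exactly like the sample above. The html string and the optional <p> handling are assumptions for illustration, and a real parser is still more robust than any regex here.

```python
import re

# Sample input copied from the question (assumed to be representative).
html = '''        <p>Some.Title.html<br />
<a href="https://www.somelink.com/yep.html" rel="nofollow">https://www.somelink.com/yep.html</a><br />
Some.Title.txt<br />
<a href="https://www.somelink.com/yeppers.txt" rel="nofollow">https://www.somelink.com/yeppers.txt</a><br />
'''

# re.MULTILINE lets ^ match at the start of every line, not just the string.
# re.DOTALL is dropped so that .+? cannot run across line breaks.
# \s* after <br /> absorbs the newline that precedes each <a> tag.
pattern = re.compile(
    r'^\s*(?:<p>)?(.+?)<br />\s*<a href="https://www\.somelink\.com(.+?)"',
    re.MULTILINE,
)

pairs = pattern.findall(html)
print(pairs)
# [('Some.Title.html', '/yep.html'), ('Some.Title.txt', '/yeppers.txt')]
```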

The Beautiful Soup and requests modules would be a much better fit for this than regex, as the commenters above said.

import requests
import bs4

html_site = 'https://www.google.com'  # or whatever site you need scraped; requests needs the scheme
site_data = requests.get(html_site)  # downloads the page into a Response object
site_parsed = bs4.BeautifulSoup(site_data.text, 'html.parser')  # parses the page text into a BeautifulSoup object
a_tags = site_parsed.select('a')  # selects all 'a' tags and returns them as a list

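This is just simple code that selects all the a tags in the HTML page and stores them in a list, in the format shown above. I recommend looking at a good tutorial on bs4 and at the actual documentation.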
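Applied to the snippet from the question, the same idea can be sketched as follows. This is only an illustration under assumptions: the html string is copied from the question, the title_before helper is a hypothetical name, and it assumes each title is the nearest non-empty text before its link.

```python
from bs4 import BeautifulSoup, NavigableString

# Sample input copied from the question (assumed to be representative).
html = '''        <p>Some.Title.html<br />
<a href="https://www.somelink.com/yep.html" rel="nofollow">https://www.somelink.com/yep.html</a><br />
Some.Title.txt<br />
<a href="https://www.somelink.com/yeppers.txt" rel="nofollow">https://www.somelink.com/yeppers.txt</a><br />
'''

soup = BeautifulSoup(html, 'html.parser')


def title_before(a_tag):
    """Return the nearest non-empty text sibling that precedes a_tag (hypothetical helper)."""
    for node in a_tag.previous_siblings:
        # Skip tags such as <br /> and whitespace-only text nodes.
        if isinstance(node, NavigableString) and node.strip():
            return node.strip()
    return None


# Pair every link with the title text that comes right before it.
pairs = [(title_before(a_tag), a_tag['href']) for a_tag in soup.select('a')]
print(pairs)
# [('Some.Title.html', 'https://www.somelink.com/yep.html'),
#  ('Some.Title.txt', 'https://www.somelink.com/yeppers.txt')]
```

To keep only the links that end with a given extension, the list can be filtered afterwards with something like href.endswith('.html').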

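Tip: don't use regex to parse HTML; use something built for it, like BeautifulSoup.

I don't know how to parse HTML with Beautiful Soup. I rarely run into situations like this. Thanks for the advice, I really should learn it for these dumb moments.

It's just that if you really need to dig into HTML parsing, you should use something written specifically for it, because regex can't handle nested patterns.

What's your desired output?

I rarely run into stuff like this => a rare learning opportunity!

Beautiful Soup is a better solution for this use case, as @ViníciusAguiar mentioned. This might help :)

As the other comments above mentioned, try Beautiful Soup.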
