How to get an href link by its text in Python
This is part of the web page's HTML:
<a href="https://www.cnbeta.com/articles/science/1062069.htm"><strong>阅读全文</strong></a>
You may want to try BeautifulSoup.
For example:
from bs4 import BeautifulSoup
sample_html = """
<a href="https://www.cnbeta.com/articles/science/1062069.htm"><strong>阅读全文</strong></a>
<a href="https://www.cnbeta.com/articles/science/1062068.htm"><strong>RANDOM TEXT!</strong></a>
"""
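# Parse the HTML and keep only the <a> tags whose text starts with "阅".
links = BeautifulSoup(sample_html, "html.parser").find_all(lambda t: t.name == "a" and t.text.startswith("阅"))
# Collect the href attribute of every matching tag.
print([a["href"] for a in links])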
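Do you have any code? How are you parsing the HTML?
Does this answer your question?
No, that would probably find all the a tags; I only want to find one. @Tomerikoo
The simplest way is to loop over all the <a> tags and stop as soon as you find one that contains this text.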
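The loop-and-stop idea from the last comment could be sketched roughly as follows. The helper name first_href_by_text is hypothetical (it is not part of BeautifulSoup), and treating "contains this text" as a substring match on the link's visible text is an assumption.

```python
from bs4 import BeautifulSoup

sample_html = """
<a href="https://www.cnbeta.com/articles/science/1062069.htm"><strong>阅读全文</strong></a>
<a href="https://www.cnbeta.com/articles/science/1062068.htm"><strong>RANDOM TEXT!</strong></a>
"""


def first_href_by_text(html, text):
    """Return the href of the first <a> whose visible text contains `text`, else None.

    Hypothetical helper for illustration; not a BeautifulSoup API.
    """
    soup = BeautifulSoup(html, "html.parser")
    # href=True skips anchors that have no href attribute at all.
    for a in soup.find_all("a", href=True):
        if text in a.get_text():
            return a["href"]  # stop at the first match
    return None


print(first_href_by_text(sample_html, "阅读全文"))
# prints https://www.cnbeta.com/articles/science/1062069.htm
```

Note that find_all still collects every <a> tag before the loop starts; only the text check stops early. Calling soup.find with a function filter like the one in the answer should likewise return just the first matching tag.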
Output of the find_all example:
['https://www.cnbeta.com/articles/science/1062069.htm']