Python Beautiful Soup: find links that match a condition


I am using Beautiful Soup to parse a web page, and I am trying to collect all of the links on it that contain

/d2l/lp/ouHome/home.d2l?ou=

The actual links look like this:

"http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234567"
"http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234561"
"http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234564"
"http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234562"
"http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234563"

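You can pass a compiled regular expression pattern as the href argument value to find_all(). Only the <a> tags whose href matches the pattern are returned.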
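To see what the pattern accepts before involving the parser, here is a minimal sketch that uses only the standard re module. Beautiful Soup tests a compiled pattern against each attribute value with the pattern's search() method, so the match may start anywhere in the href. The first URL comes from the question; the other two are made-up examples added for contrast.

```python
import re

# Same pattern as in the answer: the dot and the question mark are escaped
# so they match literally, and \d+ requires at least one digit after "ou=".
pattern = re.compile(r'/d2l/lp/ouHome/home\.d2l\?ou=\d+')

hrefs = [
    "http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234567",  # from the question
    "http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=",         # hypothetical: no digits
    "http://learn.ou.edu/d2l/home",                           # hypothetical: other page
]

# search() scans the whole string, so the scheme and host in front of
# the path do not prevent a match.
matching = [h for h in hrefs if pattern.search(h)]
print(matching)
```

Only the first URL should be kept. The second fails because nothing follows "ou=", and the third does not contain the path at all.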

Demo:

>>> import re
>>> from bs4 import BeautifulSoup
>>>
>>> data = """
... <div>
...     <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234567">link1</a>
...     <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234561">link2</a>
...     <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234564">link3</a>
...     <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234562">link4</a>
...     <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234563">link5</a>
... </div>
... """
>>>
>>> # Name the parser explicitly so the result does not depend on which parsers are installed
>>> soup = BeautifulSoup(data, 'html.parser')
>>> links = soup.find_all('a', href=re.compile(r'/d2l/lp/ouHome/home\.d2l\?ou=\d+'))
>>> for link in links:
...     print(link.text)
...
link1
link2
link3
link4
link5

For whatever reason, this doesn't work for me. I think it is because the links are inside a tag, and that tag sits at the bottom of the sub classes.

@JacksonBlankenship OK, then please provide the HTML you are working with, or a link to the web page you are parsing. Thanks.

@JacksonBlankenship OK, thanks. But what is wrong with the current code? I ran it against the HTML you provided and got 8 different links, from ouit-Tech Boot Camp to PSY-1113-001-Elements of Psychology.

Hmmm... OK, I am probably doing something wrong. Thanks for your help.

@JacksonBlankenship OK, thanks. Let me know if you need help debugging the code.
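On the nesting concern raised in the comments: find_all() searches all descendants by default, so how deeply a link is nested should not hide it. The sketch below is one way to check this. The markup, the course names and the relative hrefs are hypothetical stand-ins, not the asker's actual page. It also lists every href the parser found, which can help when a filter returns nothing.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: links buried several levels deep.
html = """
<div class="outer">
  <table><tr><td>
    <ul>
      <li><a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234567">Course A</a></li>
      <li><a href="/d2l/lp/ouHome/home.d2l?ou=1234561">Course B</a></li>
      <li><a href="/d2l/help">Help</a></li>
    </ul>
  </td></tr></table>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Debugging aid: every href the parser actually sees.
all_hrefs = [a["href"] for a in soup.find_all("a", href=True)]

# Same pattern as in the answer, with a capture group around the digits.
pattern = re.compile(r"/d2l/lp/ouHome/home\.d2l\?ou=(\d+)")
course_links = soup.find_all("a", href=pattern)
course_ids = [pattern.search(a["href"]).group(1) for a in course_links]

print(all_hrefs)
print([a.get_text() for a in course_links])
print(course_ids)
```

If the expected links are missing from all_hrefs, the HTML being parsed probably differs from what the browser shows, and the filter itself is not the problem.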