
Python BeautifulSoup: finding links that match a condition


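I'm trying to collect all the links on a web page that I have parsed with Beautiful Soup, keeping only those whose URL contains
/d2l/lp/ouHome/home.d2l?ou=

The actual links look like this: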

"http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234567"
"http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234561"
"http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234564"
"http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234562"
"http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234563"
You can pass a compiled regular expression as the href argument value to find_all().

Demo:

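>>> import re
>>> from bs4 import BeautifulSoup
>>>
>>> data = """
... <div>
...     <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234567">link1</a>
...     <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234561">link2</a>
...     <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234564">link3</a>
...     <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234562">link4</a>
...     <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234563">link5</a>
... </div>
... """
>>>
>>> soup = BeautifulSoup(data, "html.parser")
>>> links = soup.find_all('a', href=re.compile(r'/d2l/lp/ouHome/home\.d2l\?ou=\d+'))
>>> for link in links:
...     print(link.text)
...
link1
link2
link3
link4
link5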
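
Since the question asks for the links themselves rather than their text, here is a minimal sketch of the same approach that reads the href attribute of each match. The sample HTML, the "unrelated" link, and the capture group used to pull out the ou number are assumptions added for illustration; they are not part of the original answer.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup: two course links and one link that should be skipped.
html = """
<div>
    <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234567">link1</a>
    <a href="http://learn.ou.edu/d2l/lp/ouHome/home.d2l?ou=1234561">link2</a>
    <a href="http://learn.ou.edu/other/page">unrelated</a>
</div>
"""

# Same pattern as in the answer, with a capture group around the digits.
pattern = re.compile(r"/d2l/lp/ouHome/home\.d2l\?ou=(\d+)")

soup = BeautifulSoup(html, "html.parser")

# find_all() keeps only <a> tags whose href matches the pattern somewhere in the value.
hrefs = [a["href"] for a in soup.find_all("a", href=pattern)]

# Re-apply the pattern to each URL to extract the numeric ou value.
ou_ids = [pattern.search(href).group(1) for href in hrefs]

print(hrefs)
print(ou_ids)
```

Beautiful Soup applies the compiled pattern with search(), so the expression does not need to match the full URL from its start.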

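For whatever reason, this isn't working for me. I think it's because the links are inside a tag that sits at the bottom of several nested classes.
@JacksonBlankenship OK, then please provide the HTML you are working with, or a link to the web page you are parsing. Thanks.
@JacksonBlankenship OK, thanks. But what is wrong with the current code? I've executed it against the HTML you provided and got 8 different links, from "ouit-Tech Bootcamp" to "PSY-1113-001-Elements of Psychology".
Hmmm... OK, I'm probably doing something wrong. Thanks for your help.
@JacksonBlankenship OK, thanks. Let me know if you need help debugging your code.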
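
On the nesting concern raised in the comments: by default find_all() searches every descendant of the element it is called on, so deeply nested links should still be found. The sketch below illustrates this on made-up markup; the class names and the relative URLs are assumptions, not the asker's actual page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with the target link buried several levels deep.
nested_html = """
<div class="outer">
  <ul class="courses">
    <li class="course"><span><a href="/d2l/lp/ouHome/home.d2l?ou=1234567">link1</a></span></li>
    <li class="course"><span><a href="/d2l/lp/other.d2l?ou=1">other</a></span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(nested_html, "html.parser")
pattern = re.compile(r"/d2l/lp/ouHome/home\.d2l\?ou=\d+")

# Default behaviour: all descendants are searched, however deep.
found = soup.find_all("a", href=pattern)

# With recursive=False only direct children are examined, so nothing matches here.
direct_only = soup.find_all("a", href=pattern, recursive=False)

print([a.text for a in found])
print(len(direct_only))
```

If the default search returns nothing on a real page, the nesting depth is unlikely to be the cause; comparing the pattern against the href values actually present in the parsed HTML is a reasonable next check.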