Python: parsing multiple hrefs within a single parent with BeautifulSoup
I have a line in my program that uses BeautifulSoup's find(). This is the output of that line:
<td class="monsters">
<a href="/m154"><div class="mim mim-154"></div></a>
<a href="/m153"><div class="mim mim-153"></div></a>
<a href="/m152"><div class="mim mim-152"></div></a>
<a href="/m155"><div class="mim mim-155"></div></a>
<a href="/m147"><div class="mim mim-147"></div></a>
</td>
I tried changing find() to find_all(), converting my print line into a for loop, and retrieving the href inside the loop with .a['href']. However, no matter what I try, I only ever get one entry instead of five. Any suggestions for retrieving multiple hrefs? Seeing that find_all() returns a list, does it make sense to call find_all() on the parent of the a tags instead?
What you want to do looks like this:
cell = table.find('td', 'monsters')   # the <td class="monsters"> cell
for a_tag in cell.find_all('a'):      # every <a> inside that cell
    print(a_tag['href'])
Input:
page = """<td class="monsters">
<a href="/m154"><div class="mim mim-154"></div></a>
<a href="/m153"><div class="mim mim-153"></div></a>
<a href="/m152"><div class="mim mim-152"></div></a>
<a href="/m155"><div class="mim mim-155"></div></a>
<a href="/m147"><div class="mim mim-147"></div></a>
</td>"""
from bs4 import BeautifulSoup
soup = BeautifulSoup(page, "html.parser") # your source page parsed as html
links = soup.find_all('a', href=True) # get all links having href attribute
for i in links:
    print(i['href'])
Full code, similar to the post above:
import bs4
HTML= """<html>
<table>
<tr>
<td class="monsters">
<a href="/m154"><div class="mim mim-154"></div></a>
<a href="/m153"><div class="mim mim-153"></div></a>
<a href="/m152"><div class="mim mim-152"></div></a>
<a href="/m155"><div class="mim mim-155"></div></a>
<a href="/m147"><div class="mim mim-147"></div></a>
</td>
</tr>
</table>
</html>
"""
table = bs4.BeautifulSoup(HTML, 'lxml')
anker = table.find('td', 'monsters').find_all('a')
for a in anker:
    print(a['href'])
This answer also works when the HTML contains more than just the table. Just to clarify: the links in your code are simply anything with an a tag, right? Yes, and if you want to narrow that down, you can filter by another tag (a parent or child tag) or by the tag's class/id.
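One way to sketch that narrowing: BeautifulSoup's select() accepts a CSS selector, so you can restrict the match to anchors inside the td.monsters cell in a single call. The snippet below uses the sample HTML from the question, plus one extra link outside the cell to show it gets excluded:

```python
from bs4 import BeautifulSoup

html = """<td class="monsters">
<a href="/m154"><div class="mim mim-154"></div></a>
<a href="/m153"><div class="mim mim-153"></div></a>
<a href="/m152"><div class="mim mim-152"></div></a>
<a href="/m155"><div class="mim mim-155"></div></a>
<a href="/m147"><div class="mim mim-147"></div></a>
</td>
<a href="/elsewhere">not in the cell</a>"""

soup = BeautifulSoup(html, "html.parser")
# CSS selector: only <a> tags with an href attribute inside td.monsters
hrefs = [a["href"] for a in soup.select("td.monsters a[href]")]
print(hrefs)  # ['/m154', '/m153', '/m152', '/m155', '/m147'] -- /elsewhere is excluded
```

This gives the same result as chaining find('td', 'monsters').find_all('a'), just expressed as one selector.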
For reference, running the html.parser snippet above prints:
/m154
/m153
/m152
/m155
/m147