Python: How to get the third link from BeautifulSoup results


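I am using the following code to retrieve a set of links with BeautifulSoup. It returns all of the links, but I want to get the third link, parse the page behind that link, then get the third link from that page, and so on. How can I modify the code below to do this?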

import urllib
from BeautifulSoup import *

url = raw_input('Enter - ')
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)

# Retrieve all of the anchor tags
tags = soup('a')
for tag in tags:
    print tag.get('href', None)
    print tag.contents[0]

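First of all, you should stop using BeautifulSoup version 3. It is very old and no longer maintained. Switch to BeautifulSoup version 4, which you can install with: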

pip install beautifulsoup4

and change the import to:

from bs4 import BeautifulSoup

Then you need to use find_all() and take the third link by index, repeating this on each new page until a page has no third link. Here is one way to do it:

import urllib
from bs4 import BeautifulSoup

url = raw_input('Enter - ')

while True:
    html = urllib.urlopen(url)
    soup = BeautifulSoup(html, "html.parser")

    try:
        url = soup.find_all('a')[2]["href"]
        # if the link is not absolute, you might need `urljoin()` here
    except IndexError:
        break  # could not get the 3rd link - exiting the loop
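The loop above can also be sketched in Python 3 with relative links resolved through urljoin(). This is only a sketch under some assumptions: the function names follow_third_links and fetch_html, the fetch parameter, and the max_hops safety limit are illustrative and not part of the original answer.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page over the network (the default fetcher)."""
    with urlopen(url) as response:
        return response.read()


def follow_third_links(start_url, fetch=fetch_html, max_hops=10):
    """Return the URLs visited by repeatedly following the third <a> tag.

    max_hops is a hypothetical safety limit so that pages linking to
    each other cannot keep the loop running forever.
    """
    visited = [start_url]
    url = start_url
    for _ in range(max_hops):
        soup = BeautifulSoup(fetch(url), "html.parser")
        try:
            href = soup.find_all("a")[2]["href"]
        except (IndexError, KeyError):
            break  # fewer than three anchors, or the third has no href
        url = urljoin(url, href)  # resolve relative links against the current page
        visited.append(url)
    return visited
```

Calling follow_third_links(url) with the default fetcher downloads real pages. Passing a different fetch callable (for example, a lookup in a dict of saved HTML) makes it possible to try the logic without network access.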

Another option is to use the nth-of-type CSS selector to get the third anchor, looping until the CSS select returns None:

import urllib
from bs4 import BeautifulSoup

url = raw_input('Enter - ')
html = urllib.urlopen(url)
soup = BeautifulSoup(html, "html.parser")
a = soup.select_one("a:nth-of-type(3)")
while a:
    html = urllib.urlopen(a["href"])
    soup = BeautifulSoup(html, "html.parser")
    a = soup.select_one("a:nth-of-type(3)")

If you want to find the third anchor that has an href attribute, you can use "a:nth-of-type(3)[href]".
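One caveat may be worth checking before relying on the selector: as far as I understand CSS semantics, a:nth-of-type(3) matches an anchor that is the third a element among its siblings, which is not necessarily the third anchor in the whole document. The Python 3 sketch below uses made-up sample HTML to compare it with indexing into find_all().

```python
from bs4 import BeautifulSoup

# Made-up sample markup: the first anchor deliberately has no href.
html = """
<div><a name="top">no href</a></div>
<div><a href="/1">one</a><a href="/2">two</a></div>
<p><a href="/3">three</a><a href="/4">four</a><a href="/5">five</a></p>
"""
soup = BeautifulSoup(html, "html.parser")

# Third anchor in document order, whether or not it has an href.
third_in_document = soup.find_all("a")[2]

# First anchor that is the third <a> among its own siblings.
third_among_siblings = soup.select_one("a:nth-of-type(3)")

# Third anchor in document order, counting only anchors with an href.
third_with_href = soup.find_all("a", href=True)[2]

print(third_in_document["href"])     # /2
print(third_among_siblings["href"])  # /5
print(third_with_href["href"])       # /3
```

On pages where all anchors share one parent the approaches agree. On nested markup like the sample they can pick different links, so the choice depends on which "third link" is meant.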

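Thanks for the reply. In the code above, tags = soup('a') returns a list. Then, when the print statements run, I get many links. So it seems to give me all of the links without using find_all. What confuses me is why I can't simply print tags[2], which I assumed would be the third link the loop iterates to.

@martinbshp yes, soup() is a shortcut for soup.find_all(). And yes, you need to get the href attribute value as shown in the answer.

Oh! I get it now. Your answer prompted me to go back and rethink this and look at how tags works: the thing I needed to index was the list, not the tag variable in the for loop. Thanks.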
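To illustrate the point settled in the comments, here is a small Python 3 sketch on made-up HTML: the index goes on the list returned by soup('a'), and the link itself comes from the href attribute of that element.

```python
from bs4 import BeautifulSoup

# Made-up sample markup with exactly three anchors.
html = (
    '<a href="/first">First</a>'
    '<a href="/second">Second</a>'
    '<a href="/third">Third</a>'
)
soup = BeautifulSoup(html, "html.parser")

tags = soup("a")  # shortcut for soup.find_all("a"); returns a list-like ResultSet

third = tags[2]                # index the list, not the loop variable
third_href = third.get("href")
third_text = third.get_text()

print(third_href)  # /third
print(third_text)  # Third
```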