Python: How do I find a link on a web page when the link has no class?

python web-crawler

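I am a beginner Python programmer, and I am trying to build a web crawler as an exercise. I have run into a problem that I can't find the right solution for: I am trying to get a link's address from a page where the link has no class, so I don't know how to filter for that specific link. It is probably best to show you.

As you can see, I am trying to get the contents of the href attribute of the "Historical prices" link. Here is my Python code: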

import requests
from bs4 import BeautifulSoup

def find_historicalprices_link(url):
    source = requests.get(url)
    text = source.text
    soup = BeautifulSoup(text, 'html.parser')
    link = soup.find_all('li', 'fjfe-nav-sub')
    href = str(link.get('href'))
    find_spreadsheet(href)

def find_spreadsheet(url):
    source = requests.get(url)
    text = source.text
    soup = BeautifulSoup(text, 'html.parser')
    link = soup.find('a', {'class' : 'nowrap'})
    href = str(link.get('href'))
    download_spreadsheet(href)

def download_spreadsheet(url):
    response = requests.get(url)
    text = response.text
    lines = text.split("\\n")
    filename = r'google.csv'
    file = open(filename, 'w')
    for line in lines:
        file.write(line + "\n")
    file.close()

find_historicalprices_link('https://www.google.com/finance?q=NASDAQ%3AGOOGL&ei=3lowWYGRJNSvsgGPgaywDw')

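In the function find_spreadsheet(url), I can easily filter for the link by looking for the class named "nowrap". Unfortunately, the "Historical prices" link has no such class, and right now my script only gives the following error: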

AttributeError: ResultSet object has no attribute 'get'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

How can I make sure my crawler only gets the href from the "Historical prices" link?
Thanks in advance.

Update:
I found a way to do this. By looking only for the link that carries that specific text, I can find the href I need.
Solution:

soup.find('a', string="Historical prices")
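
A minimal sketch of this approach, for illustration only. The sample markup, the helper name find_link_by_text, and the base URL are assumptions, since the real page is not shown in the question. It also guards against find() returning None and resolves a relative href with urljoin:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page, which the question does not show.
SAMPLE_HTML = """
<ul>
  <li><a href="/finance/news">News</a></li>
  <li><a href="/finance/historical?q=NASDAQ:GOOGL">Historical prices</a></li>
</ul>
"""


def find_link_by_text(html, link_text, base_url):
    """Return the absolute URL of the first <a> whose text equals link_text, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # string= matches tags whose .string is exactly link_text.
    link = soup.find("a", string=link_text)
    if link is None or not link.get("href"):
        return None
    # Resolve a relative href against the page it was found on.
    return urljoin(base_url, link["href"])


historical_url = find_link_by_text(
    SAMPLE_HTML, "Historical prices", "https://www.google.com/finance"
)
print(historical_url)
```

Note that string= only matches when the link contains nothing but that exact text, so extra whitespace or nested tags inside the link would need a looser comparison.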

Does the following code snippet help? I think you can solve your problem with something like this:

from bs4 import BeautifulSoup

html = """<a href='http://www.google.com'>Something else</a>
          <a href='http://www.yahoo.com'>Historical prices</a>"""

soup = BeautifulSoup(html, "html5lib")

urls = soup.find_all("a")

print(urls)

print([a["href"] for a in urls if a.text == "Historical prices"])
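
The comparison above requires the link text to match exactly. If the real page pads the text with whitespace or wraps part of it in another tag, a more tolerant variant might look like the sketch below. The markup and the helper normalized_text are hypothetical and only illustrate the idea:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the link text is padded with whitespace and partly wrapped in <b>,
# and the last <a> has no href at all.
html = """<a href='http://www.google.com'>Something else</a>
          <a href='http://www.yahoo.com'>
              <b>Historical</b> prices
          </a>
          <a>Historical prices</a>"""

soup = BeautifulSoup(html, "html.parser")


def normalized_text(tag):
    # Collapse every run of whitespace so line breaks and indentation do not matter.
    return " ".join(tag.get_text().split())


# href=True skips <a> tags without an href attribute, which avoids a KeyError on a["href"].
matches = [
    a["href"]
    for a in soup.find_all("a", href=True)
    if normalized_text(a) == "Historical prices"
]
print(matches)
```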

Did you read the error? This line is causing the problem: link = soup.find_all('li', 'fjfe-nav-sub') followed by href = str(link.get('href')). link is a list, not an element.

@jarcobi889 OK, so what do I need to do to fix this? I changed find_all() to find(), and now it just returns "None".

No, unfortunately it didn't. But I found a way: I look only for the link with that specific text, using soup.find('a', string="Historical prices"). I just learned how to use this.

Thx, this helped me too. I didn't know about this possibility!
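
The two behaviours discussed in these comments can be reproduced with a small sketch. The markup is hypothetical (the structure of the real page is not shown), so this only illustrates how find_all() and find() behave, not why find() returned None on the actual page:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the class name is taken from the question's code.
html = """<ul>
  <li class="fjfe-nav-sub"><a href="/finance/historical">Historical prices</a></li>
</ul>"""
soup = BeautifulSoup(html, "html.parser")

# find_all() returns a ResultSet (a list of tags), which has no .get() method.
items = soup.find_all("li", "fjfe-nav-sub")
try:
    items.get("href")
    error_raised = False
except AttributeError:
    error_raised = True
print("AttributeError raised:", error_raised)

# find() returns a single tag, or None when nothing matches.
missing = soup.find("li", "no-such-class")
print("No match gives:", missing)

# Even when the <li> is found, the href lives on the <a> inside it, not on the <li>.
first_item = soup.find("li", "fjfe-nav-sub")
li_href = first_item.get("href")
a_href = first_item.find("a").get("href")
print("href on <li>:", li_href)
print("href on <a>:", a_href)
```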