Python BeautifulSoup AttributeError

python html web-scraping

I'm trying to scrape Google Shopping using BeautifulSoup and requests. Here is my code, which is quite simple:

from bs4 import BeautifulSoup
import requests
import lxml
import json

def gshop(q):
    q = q.replace(' ', '+')
    
    headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }
    
    response = requests.get(f'https://www.google.com/search?q={q}&tbm=shop', headers=headers).text

    soup = BeautifulSoup(response, 'lxml')
    data = []

    for container in soup.findAll('div', class_='sh-dgr__content'):
        title = container.find('h4', class_='A2sOrd').text
        price = container.find('span', class_='a8Pemb').text
        supplier = container.find('div', class_='aULzUe IuHnof').text
        buy = 'https://google.com'+(container.find('a', class_='eaGTj mQaFGe shntl')['href'])
        rating = container.find('span', class_='Rsc7Yb').text
        data.append({
            "Title": title,
            "Price": price,
            "Rating": rating,
            "Supplier": supplier,
            "Link": buy
        })

    return json.dumps(data, indent = 2, ensure_ascii = False)

print(gshop('toys'))

This raises an error:

Traceback (most recent call last):
  File "c:/Users/Maanav/Desktop/ValRal/main.py", line 45, in <module>
    print(gshop('toys'))
  File "c:/Users/Maanav/Desktop/ValRal/main.py", line 34, in gshop
    rating = container.find('span', class_='Rsc7Yb').text
AttributeError: 'NoneType' object has no attribute 'text'

Please look at the page source of the Google Shopping URL to better understand my code. What is going wrong?

Solved by @simpleApp in the comments:

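Sometimes a product in the Google Shopping listing has no rating, or the seller has not added a supplier name. In that case container.find() returns None, and reading .text on None raises the AttributeError and stops the program. To prevent this, we have to use exception handling: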

from bs4 import BeautifulSoup
import requests
import lxml
import json

def gshop(q):
    q = q.replace(' ', '+')
    
    headers = {
    "User-Agent":
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
    }
    
    response = requests.get(f'https://www.google.com/search?q={q}&tbm=shop', headers=headers).text

    soup = BeautifulSoup(response, 'lxml')
    data = []

    for container in soup.find_all('div', class_='sh-dgr__content'):
        # find() returns None when an element is missing, so .text raises AttributeError
        try:
            title = container.find('h4', class_='A2sOrd').text
        except AttributeError:
            title = None
        try:
            price = container.find('span', class_='a8Pemb').text
        except AttributeError:
            price = None
        try:
            supplier = container.find('div', class_='aULzUe IuHnof').text
        except AttributeError:
            supplier = None
        try:
            buy = 'https://google.com' + container.find('a', class_='eaGTj mQaFGe shntl')['href']
        except (TypeError, KeyError):
            # TypeError: the link element is missing; KeyError: it has no href attribute
            buy = None
        try:
            rating = container.find('span', class_='Rsc7Yb').text
        except AttributeError:
            rating = None
        data.append({
            "Title": title,
            "Price": price,
            "Rating": rating,
            "Supplier": supplier,
            "Link": buy
        })

    return json.dumps(data, indent = 2, ensure_ascii = False)
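As an alternative to repeating try/except for every field, the None check can be moved into small helpers. The following is only a sketch: the helper names text_or_none and attr_or_none, the sample HTML, and its class names (card, title, rating, buy) are made up for illustration and are not Google's real markup. It uses the built-in html.parser so that it runs without lxml.

```python
from bs4 import BeautifulSoup


def text_or_none(parent, name, class_):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = parent.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else None


def attr_or_none(parent, name, class_, attr):
    """Return an attribute of the first matching tag, or None if tag or attribute is absent."""
    tag = parent.find(name, class_=class_)
    return tag.get(attr) if tag is not None else None


# Hypothetical sample: the second card has no rating and no link.
SAMPLE_HTML = """
<div class="card"><h4 class="title">Toy car</h4><span class="rating">4.5</span><a class="buy" href="/shopping/1">Buy</a></div>
<div class="card"><h4 class="title">Toy train</h4></div>
"""


def parse_cards(html):
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.find_all("div", class_="card"):
        href = attr_or_none(card, "a", "buy", "href")
        rows.append({
            "Title": text_or_none(card, "h4", "title"),
            "Rating": text_or_none(card, "span", "rating"),
            # Only build the absolute link when an href was actually found.
            "Link": ("https://google.com" + href) if href else None,
        })
    return rows


rows = parse_cards(SAMPLE_HTML)
```

In the gshop function above, the same helpers could be called with the real tag names and class strings from the answer, for example text_or_none(container, 'span', 'Rsc7Yb') for the rating.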

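So many views, yet no answers :(

If you try printing response.url, you get https://www.google.com/search?q=toys&tbm=shop, which does not produce any results. I can't reproduce your code because visiting that URL asks you to log in to Google, so soup cannot read the page at all.

@solopiu It doesn't ask me to log in.

Your code assumes that every product has all the values. Some products have no rating, for example, so the program raises an exception. Try adding exception handling around the title and rating. For example: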

try:
    title = container.find('h4', class_='A2sOrd').text
except:
    title = "None"

Another way to debug is to write the response to an HTML file and look at what was actually returned.

with open("r3.html", "w") as f:
    f.write(response)
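The debugging tip above can be made a little more robust. This is a minimal sketch, assuming response is the HTML string returned by requests (the .text value in the question); the function names dump_html and count_marker and the default file name are hypothetical. Writing with an explicit UTF-8 encoding avoids a UnicodeEncodeError on systems whose default file encoding cannot represent every character in the page.

```python
from pathlib import Path


def dump_html(html, path="response_dump.html"):
    """Write the fetched HTML to disk and return the number of characters written."""
    # Explicit UTF-8 so non-ASCII product names do not depend on the OS default encoding.
    return Path(path).write_text(html, encoding="utf-8")


def count_marker(html, marker="sh-dgr__content"):
    """Count how often a class name appears in the raw HTML.

    A count of 0 suggests the server returned a different page
    (for example a consent or login page) than the one seen in the browser.
    """
    return html.count(marker)
```

For example, dump_html(response) followed by count_marker(response) shows quickly whether the product containers are present in what was downloaded, before any parsing is attempted.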

Thank you, 赛尔夫! That solved the problem for me!