Python: cannot use BeautifulSoup's get_text() function, returns AttributeError

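For my computer science class, I'm trying to write a web scraper in Python, using Beautiful Soup, that finds all the games on sale in the PlayStation Store. For now, I'm just trying to get the program to list all the games on the first page, along with their prices and sale percentages (if any). However, for all of the games that are on sale, the terminal returns an AttributeError: 'NoneType' object has no attribute 'get_text'. Here is my code: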

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://store.playstation.com/en-ca/category/85448d87-aa7b-4318-9997-7d25f4d275a4/1'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all("section",{"class":"ems-sdk-product-tile__details"})



for container in containers:

    title = container.span.get_text()

    salePercentContainer = container.find("span",{"class":"psw-body-2 discount-badge discount-badge--undefined"})
    salePercent = salePercentContainer.get_text()
    if salePercent is None:
        salePercent = 'none'

    priceContainer = container.strike
    price = priceContainer#.text
    if price is None:
        Rprice = container.find_all("span",{"class":"price"})
        price = Rprice[0].text

    print("title: " + title)
    print("sale percent: " + str(salePercent))
    print("price: " + str(price))

Put in a try/except so it doesn't fail on the elements that don't have what you're looking for:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://store.playstation.com/en-ca/category/85448d87-aa7b-4318-9997-7d25f4d275a4/1'

uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all("section",{"class":"ems-sdk-product-tile__details"})



for container in containers:
    try:
        title = container.span.get_text()
        print(title)

        salePercentContainer = container.find("span",{"class":"psw-body-2 discount-badge discount-badge--undefined"})
        salePercent = salePercentContainer.get_text()
        print(salePercent)
        if salePercent is None:
            salePercent = 'none'
    except AttributeError:
        # salePercentContainer is None for games without a discount badge
        pass

priceContainer = container.strike
print(priceContainer)
price = priceContainer  # .text
if price is None:
    Rprice = container.find_all("span", {"class": "price"})
    price = Rprice[0].text
    print(price)

print("title: " + title)
print("sale percent: " + str(salePercent))
print("price: " + str(price))
Output:

Just Cause 4
Rocket Arena
Vigor
Rocket League®
Fortnite
Days Gone
-50%
God of War
Genshin Impact
Mortal Kombat X
Rogue Company
Crash Bandicoot™ N. Sane Trilogy
eFootball PES 2021 LITE
Apex Legends™
Fallout 4
-70%
Stranded Deep
MONSTER HUNTER: WORLD™
RESIDENT EVIL 7 biohazard
-50%
The Last Guardian
Bloodborne™
Horizon Zero Dawn: Complete Edition
Persona 5
Battlefield™ 1
NHL® 21
-52%
Wreckfest
-30%
The Last Of Us™ Remastered 
Until Dawn
inFAMOUS Second Son
Detroit: Become Human
Red Dead Online
SHAREfactory™
Brawlhalla
Hyper Scape
Rec Room
Bless Unleashed
RACING BROS
F1 2020
-50%
NBA 2K21
-50%
Spellbreak
SMITE
Grand Theft Auto V
Injustice™ 2
-75%
UFC® 4
-50%
SPIDER-MAN: FAR FROM HOME VIRTUAL REALITY EXPERIENCE
Dead Island Definitive Edition
MX vs ATV All Out
Hello Neighbor
NARUTO TO BORUTO: SHINOBI STRIKER
Tomb Raider: Definitive Edition
None
$26.99
title: Tomb Raider: Definitive Edition
sale percent: -50%
price: $26.99
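An alternative to wrapping the lookups in try/except is to test for None explicitly, because find() returns None when nothing matches and calling None.get_text() is exactly what raises the error. The sketch below uses a hypothetical helper, text_or_default, which is not part of BeautifulSoup; it assumes only that the object passed in has a get_text(strip=...) method, which bs4's Tag does.

```python
from typing import Optional, Protocol


class HasText(Protocol):
    """Anything with a BeautifulSoup-style get_text(); bs4's Tag satisfies this."""

    def get_text(self, separator: str = "", strip: bool = False) -> str: ...


def text_or_default(tag: Optional[HasText], default: str = "none") -> str:
    """Return the tag's stripped text, or `default` when find() returned None.

    find() returns None when no element matches, and None.get_text() is what
    raises AttributeError: 'NoneType' object has no attribute 'get_text'.
    """
    if tag is None:
        return default
    return tag.get_text(strip=True)
```

In the question's loop this might look like salePercent = text_or_default(container.find("span", class_="discount-badge")). Passing class_="discount-badge" matches any span whose class list contains discount-badge, whereas the full string "psw-body-2 discount-badge discount-badge--undefined" only matches that exact attribute value; whether discount-badge alone is specific enough on the live page is an assumption. Unlike except Exception: pass, the explicit check does not silently skip the remaining lines for that game.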

The data is embedded in the HTML source as JSON. You can pull it out and parse it.

Then just filter the dataframe to show what you want:

import json

import pandas as pd
import requests
from bs4 import BeautifulSoup

rows = []
for page in range(1, 11):
    url = 'https://store.playstation.com/en-ca/category/85448d87-aa7b-4318-9997-7d25f4d275a4/%s' % page
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')

    # The page embeds its data in <script type="application/json"> tags;
    # in this answer the third one holds the Apollo state (this index may
    # break if the site layout changes)
    jsonStr = soup.find_all('script', {'type': 'application/json'})[2].text
    jsonData = json.loads(jsonStr)

    state = jsonData['props']['apolloState']

    print('Page: %s' % page)
    for k, v in state.items():
        # Price entries are the keys containing both "Product:" and ".price"
        if 'Product:' in k and '.price' in k:
            skuId = k.split('.price')[0][1:]  # drop ".price" and the leading "$"
            title = state[skuId]['name']
            v.update({'title': title})
            rows.append(v)

df = pd.DataFrame(rows)
Output:

print (df)
    basePrice discountedPrice  ... __typename                         title
0       $0.00        Included  ...   SkuPrice                  Just Cause 4
1       $6.99           $6.99  ...   SkuPrice                  Rocket Arena
2        Free            Free  ...   SkuPrice                         Vigor
3        Free            Free  ...   SkuPrice                Rocket League®
4        Free            Free  ...   SkuPrice                      Fortnite
..        ...             ...  ...        ...                           ...
472    $39.99           $9.99  ...   SkuPrice  MX vs. ATV Supercross Encore
473    $29.99          $29.99  ...   SkuPrice      ASTRO BOT Rescue Mission
474    $33.49          $33.49  ...   SkuPrice                    Descenders
475    $54.99          $21.99  ...   SkuPrice                    Fallout 76
476    $24.99          $24.99  ...   SkuPrice         LEGO® Jurassic World™

[477 rows x 10 columns]
To show only the discounted games:

discounted_df = df[~df['discountText'].isnull()]
print(discounted_df.head(5).to_string())
   basePrice discountedPrice discountText  isFree  isExclusive               serviceBranding                  upsellServiceBranding    upsellText __typename                      title
5     $49.99          $24.99         -50%   False        False  {'type': 'json', 'json': []}   {'type': 'json', 'json': ['PS_NOW']}      Included   SkuPrice                  Days Gone
13    $39.99          $11.99         -70%   False        False  {'type': 'json', 'json': []}                                   None          None   SkuPrice                  Fallout 4
16    $26.99          $13.49         -50%   False        False  {'type': 'json', 'json': []}                                   None          None   SkuPrice  RESIDENT EVIL 7 biohazard
22    $79.99          $38.39         -52%   False        False  {'type': 'json', 'json': []}                                   None          None   SkuPrice                    NHL® 21
23    $39.99          $27.99         -30%   False        False  {'type': 'json', 'json': []}  {'type': 'json', 'json': ['PS_PLUS']}  Save 5% more   SkuPrice                  Wreckfest
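The same extraction can be sketched with only the standard library. Here html.parser is swapped in for BeautifulSoup to collect every <script type="application/json"> block, the sketch picks the block that actually contains props.apolloState instead of relying on position [2], and plain lists stand in for pandas. The function names and SAMPLE_HTML are hypothetical, and the key layout ("$Product:<id>.price" next to "Product:<id>" with a "name" field) is taken from the answer above and may not match the live store today.

```python
import json
from html.parser import HTMLParser


class JSONScriptCollector(HTMLParser):
    """Collect the raw text of every <script type="application/json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.blobs: list = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/json":
            self._inside = True
            self.blobs.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so append
        if self._inside:
            self.blobs[-1] += data


def find_apollo_state(html: str) -> dict:
    """Return props.apolloState from the first JSON script block that has it."""
    collector = JSONScriptCollector()
    collector.feed(html)
    collector.close()
    for blob in collector.blobs:
        try:
            data = json.loads(blob)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            state = data.get("props", {}).get("apolloState")
            if state:
                return state
    raise ValueError("no JSON script block with props.apolloState found")


def extract_prices(state: dict) -> list:
    """Mirror the answer's loop: one row per '$Product:<id>.price' entry."""
    rows = []
    for key, price in state.items():
        if "Product:" in key and ".price" in key:
            product_id = key.split(".price")[0][1:]  # drop ".price" and leading "$"
            row = dict(price)  # copy instead of mutating the parsed state
            row["title"] = state.get(product_id, {}).get("name")
            rows.append(row)
    return rows


def discounted(rows: list) -> list:
    """Rough equivalent of df[~df['discountText'].isnull()]."""
    return [row for row in rows if row.get("discountText") is not None]


# Hypothetical, heavily trimmed page used only to exercise the functions
SAMPLE_HTML = """
<html><body>
<script type="application/json">{"unrelated": true}</script>
<script type="application/json">{"props": {"apolloState": {
  "Product:A": {"name": "Days Gone"},
  "$Product:A.price": {"basePrice": "$49.99", "discountedPrice": "$24.99", "discountText": "-50%"},
  "Product:B": {"name": "Vigor"},
  "$Product:B.price": {"basePrice": "Free", "discountedPrice": "Free", "discountText": null}
}}}</script>
</body></html>
"""

if __name__ == "__main__":
    for row in discounted(extract_prices(find_apollo_state(SAMPLE_HTML))):
        print(row["title"], row["basePrice"], "->", row["discountedPrice"], row["discountText"])
```

Against a real page you would pass response.text instead of SAMPLE_HTML; selecting the block by content rather than by index is the main design difference from the answer's find_all(...)[2].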

Without more details, I can tell you with certainty that one of the objects you're calling .get_text() on is not what you think it is. It's actually null (NoneType in Python). I suggest inspecting container, salePercentContainer, etc. One of them isn't resolving to anything. It's probably one of the containers; you may need to check whether it is null before attempting the get_text operation.

NoneType means None, and it means it couldn't find the element on the page, so you are attempting None.get_text(). The page may use JavaScript to add elements, but BeautifulSoup/requests can't run JavaScript. You may need to control a real web browser that can run JavaScript. BTW: turn off JavaScript in your web browser and load the page again to see what BeautifulSoup can get from the server. If the page works without JavaScript, you should check what you get in page_html, i.e. use print() or save it to a file and open it in a web browser. Maybe the server recognized that you are using a script and sent HTML with a bot warning or a CAPTCHA.

@DoloMike It turned out salePercentContainer was null, thanks.
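Following the suggestion to look at what page_html actually contains, here is a small sketch: it writes the raw bytes to a file you can open in a browser and checks whether a class name you expect (for example ems-sdk-product-tile__details from the question) appears at all. dump_and_check and the default file name page.html are hypothetical; a False result only tells you the marker is missing, not whether that is due to JavaScript rendering or a bot-check page.

```python
from pathlib import Path


def dump_and_check(page_html: bytes, marker: str, path: str = "page.html") -> bool:
    """Save the raw response so it can be opened in a browser, and report
    whether `marker` (e.g. a class name you expect) occurs in it at all."""
    Path(path).write_bytes(page_html)
    return marker.encode("utf-8") in page_html
```

With the question's code, dump_and_check(page_html, "ems-sdk-product-tile__details") can be called right after uClient.read(), since read() returns bytes.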