Python: cannot use BeautifulSoup's get_text() function, it returns an AttributeError
For my computer science class, I am trying to write a web scraper script in Python with Beautiful Soup that finds all the games on sale in the PlayStation Store. For now, I just want the program to list every game on the first page, together with its price and sale percentage (if any). However, for all the games that are on sale, the terminal returns an attribute error: 'NoneType' object has no attribute 'get_text'. Here is my code:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://store.playstation.com/en-ca/category/85448d87-aa7b-4318-9997-7d25f4d275a4/1'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all("section",{"class":"ems-sdk-product-tile__details"})

for container in containers:
    title = container.span.get_text()
    salePercentContainer = container.find("span",{"class":"psw-body-2 discount-badge discount-badge--undefined"})
    salePercent = salePercentContainer.get_text()
    if salePercent is None:
        salePercent = 'none'
    priceContainer = container.strike
    price = priceContainer#.text
    if price is None:
        Rprice = container.find_all("span",{"class":"price"})
        price = Rprice[0].text
    print("title: " + title)
    print("sale percent: " + str(salePercent))
    print("price: " + str(price))
Put in a try/except so that it does not fail on the elements that do not have what you want:
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://store.playstation.com/en-ca/category/85448d87-aa7b-4318-9997-7d25f4d275a4/1'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all("section",{"class":"ems-sdk-product-tile__details"})

for container in containers:
    try:
        title = container.span.get_text()
        print(title)
        salePercentContainer = container.find("span",{"class":"psw-body-2 discount-badge discount-badge--undefined"})
        # Raises AttributeError when the tile has no discount badge (find() returned None)
        salePercent = salePercentContainer.get_text()
        print(salePercent)
        if salePercent is None:
            salePercent = 'none'
    except Exception as e:
        pass

# Judging by the output below, the remaining lines run once, after the loop,
# so they only report on the last container.
priceContainer = container.strike
print(priceContainer)
price = priceContainer  # .text
if price is None:
    Rprice = container.find_all("span", {"class": "price"})
    price = Rprice[0].text
print(price)
print("title: " + title)
print("sale percent: " + str(salePercent))
print("price: " + str(price))
Output:
Just Cause 4
Rocket Arena
Vigor
Rocket League®
Fortnite
Days Gone
-50%
God of War
Genshin Impact
Mortal Kombat X
Rogue Company
Crash Bandicoot™ N. Sane Trilogy
eFootball PES 2021 LITE
Apex Legends™
Fallout 4
-70%
Stranded Deep
MONSTER HUNTER: WORLD™
RESIDENT EVIL 7 biohazard
-50%
The Last Guardian
Bloodborne™
Horizon Zero Dawn: Complete Edition
Persona 5
Battlefield™ 1
NHL® 21
-52%
Wreckfest
-30%
The Last Of Us™ Remastered
Until Dawn
inFAMOUS Second Son
Detroit: Become Human
Red Dead Online
SHAREfactory™
Brawlhalla
Hyper Scape
Rec Room
Bless Unleashed
RACING BROS
F1 2020
-50%
NBA 2K21
-50%
Spellbreak
SMITE
Grand Theft Auto V
Injustice™ 2
-75%
UFC® 4
-50%
SPIDER-MAN: FAR FROM HOME VIRTUAL REALITY EXPERIENCE
Dead Island Definitive Edition
MX vs ATV All Out
Hello Neighbor
NARUTO TO BORUTO: SHINOBI STRIKER
Tomb Raider: Definitive Edition
None
$26.99
title: Tomb Raider: Definitive Edition
sale percent: -50%
price: $26.99
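One thing to note about the try/except version: in the output above, the last title has no discount line of its own, yet the summary still shows "sale percent: -50%", which looks like a value left over from an earlier tile. A minimal sketch of the alternative, checking each lookup for None before calling get_text(), is shown here. It runs against a small inline HTML sample rather than the live store; the markup in SAMPLE_HTML and the helper names text_or_default and parse_tiles are made up for illustration, and only the class names are taken from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names used in the question.
SAMPLE_HTML = """
<section class="ems-sdk-product-tile__details">
  <span class="psw-body-2">Days Gone</span>
  <span class="psw-body-2 discount-badge discount-badge--undefined">-50%</span>
  <span class="price">$24.99</span>
</section>
<section class="ems-sdk-product-tile__details">
  <span class="psw-body-2">Fortnite</span>
  <span class="price">Free</span>
</section>
"""


def text_or_default(tag, default="none"):
    """Return the tag's text, or a default when find() returned None."""
    return default if tag is None else tag.get_text(strip=True)


def parse_tiles(html):
    """Collect title, sale percent and price for every product tile."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tile in soup.find_all("section", class_="ems-sdk-product-tile__details"):
        # class_="discount-badge" matches any span that carries that class
        badge = tile.find("span", class_="discount-badge")
        price = tile.find("span", class_="price")
        results.append({
            "title": text_or_default(tile.span),
            "sale_percent": text_or_default(badge),
            "price": text_or_default(price),
        })
    return results


for row in parse_tiles(SAMPLE_HTML):
    print(row)
```

Because every value is computed inside the loop and falls back to a default, a tile without a badge reports "none" instead of reusing the previous tile's percentage.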
The data is in the HTML source as JSON. You can pull it out and parse it, then simply filter the dataframe to show what you want:
import requests
import pandas as pd
import json
from bs4 import BeautifulSoup

rows = []
for page in range(1, 11):
    url = 'https://store.playstation.com/en-ca/category/85448d87-aa7b-4318-9997-7d25f4d275a4/%s' % page
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    # The third application/json <script> tag holds the page state
    jsonStr = soup.find_all('script', {'type': 'application/json'})[2].text
    jsonData = json.loads(jsonStr)
    state = jsonData['props']['apolloState']
    print('Page: %s' % page)
    for k, v in state.items():
        # Price entries have keys containing both 'Product:' and '.price'
        if 'Product:' in k and '.price' in k:
            # Strip the '.price' suffix and the first character to get the product key
            skuId = k.split('.price')[0][1:]
            title = jsonData['props']['apolloState'][skuId]['name']
            v.update({'title': title})
            rows.append(v)

df = pd.DataFrame(rows)
Output:
print(df)
    basePrice  discountedPrice  ...  __typename                         title
0       $0.00         Included  ...    SkuPrice                  Just Cause 4
1       $6.99            $6.99  ...    SkuPrice                  Rocket Arena
2        Free             Free  ...    SkuPrice                         Vigor
3        Free             Free  ...    SkuPrice                Rocket League®
4        Free             Free  ...    SkuPrice                      Fortnite
..        ...              ...  ...         ...                           ...
472    $39.99            $9.99  ...    SkuPrice  MX vs. ATV Supercross Encore
473    $29.99           $29.99  ...    SkuPrice      ASTRO BOT Rescue Mission
474    $33.49           $33.49  ...    SkuPrice                    Descenders
475    $54.99           $21.99  ...    SkuPrice                    Fallout 76
476    $24.99           $24.99  ...    SkuPrice         LEGO® Jurassic World™

[477 rows x 10 columns]
To show only the discounted games:
discounted_df = df[~df['discountText'].isnull()]
print(discounted_df.head(5).to_string())
    basePrice  discountedPrice  discountText  isFree  isExclusive               serviceBranding                  upsellServiceBranding    upsellText  __typename                      title
 5     $49.99           $24.99          -50%   False        False  {'type': 'json', 'json': []}   {'type': 'json', 'json': ['PS_NOW']}      Included    SkuPrice                  Days Gone
13     $39.99           $11.99          -70%   False        False  {'type': 'json', 'json': []}                                   None          None    SkuPrice                  Fallout 4
16     $26.99           $13.49          -50%   False        False  {'type': 'json', 'json': []}                                   None          None    SkuPrice  RESIDENT EVIL 7 biohazard
22     $79.99           $38.39          -52%   False        False  {'type': 'json', 'json': []}                                   None          None    SkuPrice                    NHL® 21
23     $39.99           $27.99          -30%   False        False  {'type': 'json', 'json': []}  {'type': 'json', 'json': ['PS_PLUS']}  Save 5% more    SkuPrice                  Wreckfest
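The key-matching step in the answer above can be tried without the network or pandas. The following is a rough sketch using only the standard library on a tiny hand-written JSON sample. The sample keys (including the leading "$" that the [1:] slice is assumed to strip), the product ids AAA and BBB, and the function name extract_rows are all hypothetical; the real page state may be shaped differently.

```python
import json

# Hypothetical stand-in for the JSON embedded in the store page.
SAMPLE_JSON = """
{
  "props": {
    "apolloState": {
      "Product:AAA": {"name": "Days Gone"},
      "$Product:AAA.price": {"basePrice": "$49.99", "discountedPrice": "$24.99", "discountText": "-50%"},
      "Product:BBB": {"name": "Vigor"},
      "$Product:BBB.price": {"basePrice": "Free", "discountedPrice": "Free", "discountText": null}
    }
  }
}
"""


def extract_rows(json_str):
    """Pair every '.price' entry with the name of the product it belongs to."""
    state = json.loads(json_str)["props"]["apolloState"]
    rows = []
    for key, value in state.items():
        if "Product:" in key and ".price" in key:
            # Drop the '.price' suffix and the first character (assumed to be '$')
            product_key = key.split(".price")[0][1:]
            product = state.get(product_key)
            if product is None:
                continue  # skip price entries whose product record is missing
            rows.append({**value, "title": product.get("name")})
    return rows


rows = extract_rows(SAMPLE_JSON)
# Same filter as df[~df['discountText'].isnull()], written for plain dicts
discounted = [row for row in rows if row.get("discountText") is not None]
for row in discounted:
    print(row["title"], row["basePrice"], "->", row["discountedPrice"], row["discountText"])
```

Using state.get() instead of direct indexing means a price entry without a matching product record is skipped rather than raising a KeyError.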
Without more details, I can tell you for sure that one of the objects you are calling .get_text() on is not what you think it is. It is actually null (NoneType in Python). I would suggest inspecting container, salePercentContainer, and so on; one of them is not resolving to anything. It is probably one of the containers, and you may need to check whether it is null before attempting the get_text operation.

NoneType means None, which means the element could not be found on the page, so you are effectively calling None.get_text(). The page may use JavaScript to add elements, but BeautifulSoup/requests cannot run JavaScript. You may need to drive a real web browser that can run JavaScript. By the way: turn off JavaScript in your web browser and load the page again to see what BeautifulSoup can get from the server. If the page works without JavaScript, check what you actually received in page_html, for example with print(), or by saving it to a file and opening it in a web browser. The server may have recognized that you are using a script and sent HTML with a bot warning or a CAPTCHA.

@DoloMike It turned out that salePercentContainer was null, thanks.
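The suggestion to inspect what the server actually returned could be sketched roughly as follows. The helper name save_and_check, the output file name, and the sample bytes are illustrative assumptions; in the real script, the bytes would be the page_html value read from uClient.

```python
import tempfile
from pathlib import Path


def save_and_check(page_html, out_path, markers):
    """Write the raw HTML to disk and report which marker strings it contains."""
    out_path.write_bytes(page_html)
    text = page_html.decode("utf-8", errors="replace")
    return {marker: marker in text for marker in markers}


# Stand-in bytes for what uClient.read() returned; replace with the real page_html.
sample_html = b'<section class="ems-sdk-product-tile__details"><span>Vigor</span></section>'

out_file = Path(tempfile.gettempdir()) / "page_dump.html"
report = save_and_check(
    sample_html,
    out_file,
    ["ems-sdk-product-tile__details", "discount-badge"],
)
print(report)
print("Saved a copy to", out_file)
```

If a class name you search for is missing from the saved file, find() will return None for it regardless of how the lookup is written, which points at the server response (or JavaScript rendering) rather than at the parsing code.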