Python Web Scraping - BS4 only finds the tag


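I wrote a script that automatically compares game prices across different sites (Instant Gaming, G2A, etc.). The script below works for some sites but not for others. Here is the code: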

import bs4
import requests

# Fetch the G2A search results page for the game
res1 = requests.get('https://www.g2a.com/?search=dead%20by%20daylight')
res1.raise_for_status()  # raise on HTTP 4xx/5xx responses
soup = bs4.BeautifulSoup(res1.text, 'html.parser')
# Drill down: landing container -> first product info block -> minimum-price tag
elems = soup.find('div', {'id': 'content-landing'})
children = elems.find('div', {'class': 'mp-product-info'})
price = children.find('strong', {'class': 'mp-pi-price-min'})
print(price.text.strip())  # prints an empty string: the tag is found, but it has no text
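The problem is that the price variable contains the correct tag

<strong class="mp-pi-price-min"></strong>

but not the price itself. According to the browser, the tag should look like this:

<strong class="mp-pi-price-min">10,16€</strong>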

Doing the same thing with a CSS selector instead returns the same result.
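For reference, the find() chain above can be collapsed into a single CSS selector with BeautifulSoup's select_one(). The sketch below runs against a hypothetical, trimmed-down stand-in for the server's HTML (SERVER_HTML is not the real G2A page) in which the price tag exists but is empty, which reproduces the symptom without a network call. The helper name min_price_text is made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML the server sends: the price tag is
# present but empty, because the page fills it in later with JavaScript.
SERVER_HTML = """
<div id="content-landing">
  <div class="mp-product-info">
    <strong class="mp-pi-price-min"></strong>
  </div>
</div>
"""


def min_price_text(html):
    """Return the stripped text of the first minimum-price tag, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # Same path as the find() chain, expressed as one CSS selector
    tag = soup.select_one("#content-landing .mp-product-info strong.mp-pi-price-min")
    return None if tag is None else tag.get_text(strip=True)


print(repr(min_price_text(SERVER_HTML)))  # '' (tag found, but it holds no text)
```

Both approaches see exactly what the server sent, so neither can recover a value that only appears after JavaScript runs.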

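If you open Chrome Developer Tools or Firebug, you will see that when you request the page, it calls a service via XHR that returns the games and their prices. The prices are filled in by JavaScript after the page loads, which is why the HTML that requests downloads contains an empty tag.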

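You need to strip the unwanted content (the JSONP callback wrapper around the data), parse the rest as JSON, and read the results from it.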
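The stripping step can be sketched as a small standalone helper. This is a minimal version assuming the response has the usual JSONP shape callbackName({...}) with an optional trailing semicolon and a callback name without parentheses; parse_jsonp and the sample string are hypothetical, shaped like the G2A response used below.

```python
import json
import re

# Matches "callbackName( ... )" with an optional trailing semicolon.
# Assumption: the callback name contains only word characters, '$' or '.'.
_JSONP_RE = re.compile(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", re.DOTALL)


def parse_jsonp(text):
    """Strip a JSONP callback wrapper and parse the payload as JSON."""
    match = _JSONP_RE.match(text)
    if match is None:
        raise ValueError("response is not wrapped in a JSONP callback")
    # The greedy (.*) stops at the last ')', so nested parentheses in the data are kept
    return json.loads(match.group(1))


# Hypothetical sample shaped like the G2A response (a "docs" list with name/minPrice)
sample = ('jQuery111002521088376353553_1491736907010('
          '{"docs": [{"name": "Dead by Daylight STEAM CD-KEY GLOBAL", "minPrice": 10.16}]});')
for game in parse_jsonp(sample)["docs"]:
    print("Name: %s (Price: %s)" % (game["name"], game["minPrice"]))
```

Matching the wrapper with a regex avoids hard-coding the callback name and the number of trailing characters to cut off, which both change from request to request.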

Here is an example of that call:

import json
import re

import requests

# XHR endpoint the search page calls (copied from the browser's network tab)
response = requests.get('https://www.g2a.com/lucene/search/filter?jsoncallback=jQuery111002521088376353553_1491736907010&skip=28837%2C28838%2C28847%2C28849%2C28852%2C28856%2C28857%2C28858%2C28859%2C28860%2C28861%2C28862%2C28863%2C28867%2C28868%2C28869%2C29472%2C29473%2C29474%2C29475%2C29476%2C29482%2C29486%2C33104&minPrice=0.00&maxPrice=640.00&cn=&kr=&stock=all&event=&platform=0&search=dead+by+daylight&genre=0&cat=0&sortOrder=popularity+desc&start=0&rows=12&steam_app_id=&steam_category=&steam_prod_type=&includeOutOfStock=false&includeFreeGames=false&_=1491736907012')
response.raise_for_status()

# The body is JSONP: callbackName({...}). Keep only what is between the outer parentheses.
match = re.search(r'\((.*)\)\s*;?\s*$', response.text, re.DOTALL)
# Assumption: any JavaScript-style \' escapes (invalid in JSON) become plain apostrophes
payload = match.group(1).replace("\\'", "'")
json_object = json.loads(payload)
for game in json_object["docs"]:
    print("Name: %s (Price: %s)" % (game["name"], game["minPrice"]))
It will print:

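Name: Dead by Daylight STEAM CD-KEY GLOBAL (Price: 10.16)
Name: Dead by Daylight STEAM CD-KEY LATAM (Price: 5)
Name: Dead by Daylight - Of Flesh and Mud DLC STEAM CD-KEY GLOBAL (Price: 4.99)
Name: Dead by Daylight Deluxe Edition STEAM CD-KEY GLOBAL (Price: 13.99)
Name: Dead by Daylight STEAM CD-KEY RU/CIS (Price: 4.95)
Name: Dead by Daylight - The 80's Suitcase DLC STEAM CD-KEY GLOBAL (Price: 2.99)
Name: Dead by Daylight STEAM CD-KEY SEA (Price: 6)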

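Also note that you need to change the &search=… part (URL-encoded) to the game you want to search for, and the &_=… part to the current Unix timestamp in milliseconds.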
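Rather than editing the long URL by hand, the two parameters can be rewritten programmatically with the standard library. This is a sketch: with_search_and_timestamp is a hypothetical helper, it assumes each query parameter appears only once in the template, and whether the endpoint accepts the rebuilt URL is untested.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_search_and_timestamp(template_url, game, now_ms=None):
    """Return template_url with its search= and _= parameters replaced.

    Hypothetical helper: assumes every query parameter appears only once.
    """
    parts = urlsplit(template_url)
    # keep_blank_values=True preserves empty parameters such as cn= and kr=
    params = dict(parse_qsl(parts.query, keep_blank_values=True))
    params["search"] = game
    # The _ parameter is a cache-busting Unix timestamp in milliseconds
    params["_"] = str(now_ms if now_ms is not None else int(time.time() * 1000))
    # urlencode quotes with quote_plus by default, so spaces become '+' as in the original
    return urlunsplit(parts._replace(query=urlencode(params)))


url = with_search_and_timestamp(
    "https://www.g2a.com/lucene/search/filter?cn=&search=dead+by+daylight&rows=12&_=1491736907012",
    "Hollow Knight",
    now_ms=1700000000000,
)
print(url)
```

Because dictionaries keep insertion order and both keys already exist, the rebuilt URL keeps the original parameter order; pass the full URL from the answer as template_url to keep every other filter unchanged.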