(Python) Can no longer scrape data from my target site using re, requests and json
I've run into a problem. I was able to scrape data from a website using the JavaScript path. I'm trying to scrape Rocket League Tracker. Here is my code:
```python
import requests
import re
import json
import math

def rankGetter():
    trackerLink = 'https://rocketleague.tracker.network/rocket-league/profile/epic/DirectPanda/overview'
    # now we have the tracker link we're going to scrape the website
    # all the HTML of the site is now in result
    result = requests.get(trackerLink)
    # checker to make sure the user used the correct information
    if result.status_code == 400:
        print('profile not found')
    else:
        # Extract everything needed to render the current page. Data is stored as Json in the
        # JavaScript variable: window.__INITIAL_STATE__={"route":{"path":"\u0 ... }};
        json_string = re.search(r"window.__INITIAL_STATE__\s?=\s?(\{.*?\});", result.text).group(1)
        # convert text string to structured json data
        rocketleague = json.loads(json_string)
        # Save structured json data to a text file that helps you orient yourself and pick
        # the parts you are interested in.
        with open('rocketleague_json_data.txt', 'w') as outfile:
            outfile.write(json.dumps(rocketleague, indent=4, sort_keys=True))
```
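A slightly more defensive sketch of the same extraction step, assuming the page still embeds `window.__INITIAL_STATE__` as a single JSON object terminated by `};`. It escapes the dots in the pattern so they only match a literal period, and returns `None` instead of raising `AttributeError` when the variable is missing. The name `extract_initial_state` is hypothetical.

```python
import json
import re
from typing import Optional

# Dots escaped so "." matches only a literal period; the lazy group stops at the first "};".
INITIAL_STATE_RE = re.compile(r"window\.__INITIAL_STATE__\s?=\s?(\{.*?\});")


def extract_initial_state(html: str) -> Optional[dict]:
    """Return the embedded initial-state JSON as a dict, or None if it is absent.

    Hypothetical helper: assumes the JSON ends at the first '};' in the page.
    """
    match = INITIAL_STATE_RE.search(html)
    if match is None:
        # The page layout changed or the script tag is missing.
        return None
    return json.loads(match.group(1))


if __name__ == "__main__":
    sample = '<script>window.__INITIAL_STATE__={"route":{"path":"/x"}};</script>'
    print(extract_initial_state(sample))  # {'route': {'path': '/x'}}
```

Because the group is lazy, a string value containing `};` would cut the match short; for this page that was evidently not a problem, but it is a limitation of the regex approach.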
The problem is that the generated text file no longer contains the ranks I want:

```
    "stats": {
        "standardLeaderboardLeaders": {},
        "standardLeaderboards": [],
        "standardPlayers": {},
        "standardTitles": {}
    },
    "stats-v2": {
        "segments": {},
        "standardProfileMatches": {},
        "standardProfileSummaries": {},
        "standardProfiles": {},
        "standardProfilesHistory": {},
        "standardSessions": {},
        "subscriptions": {}
    },
    "titles": {
        "currentTitle": {
            "name": "Rocket League",
            "platforms": [
```
The ranks should be under `stats-v2`, but as you can see it is now empty.
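One way to detect this condition in code, assuming the parsed state has the shape shown above (a top-level `"stats-v2"` object whose children are dicts). If every child is empty, the HTML was served without the profile data, which suggests the page now fetches it separately after loading. `stats_v2_is_empty` is a hypothetical name.

```python
from typing import Any, Mapping


def stats_v2_is_empty(state: Mapping[str, Any]) -> bool:
    """True if every sub-object under state["stats-v2"] is empty, or the key is missing."""
    stats_v2 = state.get("stats-v2") or {}
    # Empty dicts/lists are falsy, so this is True only when nothing was populated.
    return all(not value for value in stats_v2.values())


if __name__ == "__main__":
    empty_state = {"stats-v2": {"segments": {}, "standardProfiles": {}}}
    # Hypothetical populated entry, for illustration only.
    filled_state = {"stats-v2": {"segments": {}, "standardProfiles": {"example-profile": {"rank": 1}}}}
    print(stats_v2_is_empty(empty_state))   # True
    print(stats_v2_is_empty(filled_state))  # False
```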
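What happened, and how can I fix it? I was able to get the ranks for a week, but today it suddenly stopped working.

The data seems to be loaded from an external URL: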
```python
import json
import requests

url = "https://api.tracker.gg/api/v2/rocket-league/standard/profile/epic/DirectPanda"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:87.0) Gecko/20100101 Firefox/87.0"
}

data = requests.get(url, headers=headers).json()
print(json.dumps(data, indent=4))
```
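To query other players, the URL can presumably be parameterised by platform and handle. This is a sketch assuming the `.../standard/profile/{platform}/{handle}` pattern from the URL above holds for other platforms and names, which the answer does not confirm; `build_profile_url` is a hypothetical helper. Percent-encoding each segment keeps handles with spaces or slashes from breaking the path.

```python
from urllib.parse import quote

# Base path copied from the URL above; assumed (not confirmed) to work for other platforms/handles.
API_BASE = "https://api.tracker.gg/api/v2/rocket-league/standard/profile"


def build_profile_url(platform: str, handle: str) -> str:
    """Build the profile API URL, percent-encoding both path segments."""
    # safe="" also encodes "/" so a handle cannot add extra path segments.
    return f"{API_BASE}/{quote(platform, safe='')}/{quote(handle, safe='')}"


if __name__ == "__main__":
    print(build_profile_url("epic", "DirectPanda"))
    # https://api.tracker.gg/api/v2/rocket-league/standard/profile/epic/DirectPanda
```

The resulting string can be passed to `requests.get(url, headers=headers)` exactly as in the code above.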
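Prints:

```
{
    "data": {
        "platformInfo": {
            "platformSlug": "epic",
            "platformUserId": null,
            "platformUserHandle": "DirectPanda",
            "platformUserIdentifier": "DirectPanda",
            "avatarUrl": null,
            "additionalParameters": null
        },
        "userInfo": {
            "userId": null,
            "isPremium": false,
            "isVerified": false,
            "isInfluencer": false,
            "isPartner": false,
            "countryCode": null,
            "customAvatarUrl": null,
            "customHeroUrl": null,
            "socialAccounts": [],
            "pageviews": 592,
            "isSuspicious": null
        },
        "metadata": {
            "lastUpdated": {
                "value": "2021-04-22T17:39:42.277-04:00",
                "displayValue": "2021-04-22T21:39:42.2770000+00:00"
            },
            "playerId": 16603481,
            "currentSeason": 17
        },
        "segments": [
            {
                "type": "overview",
                "attributes": {},
                "metadata": {
                    "name": "Lifetime"
                },
                "expiryDate": "0001-01-01T00:00:00+00:00",
                "stats": {
                    "wins": {
                        "rank": 30357,
                        "percentile": 98.3,
                        "displayName": "Wins",
                        "displayCategory": "Performance",
                        "category": "performance",
                        "metadata": {},
                        "value": 4985,
                        "displayValue": "4985",
                        "displayType": "Number"
                    },
                    "goals": {
                        "rank": 23698,
                        "percentile": 98.7,
                        "displayName": "Goals",
                        "displayCategory": "Performance",
                        "category": "performance",
                        "metadata": {},
                        "value": 14363,
                        "displayValue": "14363",
                        "displayType": "Number"
                    },
                    "mVPs": {
                        "rank": 35646,
                        "percentile": 98.0,
                        "displayName": "MVPs",
                        "displayCategory": "Performance",
                        "category": "performance",
                        "metadata": {},
                        "value": 2093,
                        "displayValue": "2093",
                        "displayType": "Number"
                    },
                    "saves": {
                        "rank": 30864,
                        "percentile": 98.3,
                        "displayName": "Saves",
                        "displayCategory": "Performance",
                        "category": "performance",
                        "metadata": {},
                        "value": 9231,
                        "displayValue": "9231",
                        "displayType": "Number"
                    },
                    "assists": {
                        "rank": 29228,
                        "percentile": 98.4,
                        "displayName": "Assists",
                        "displayCategory": "Performance",
                        "category": "performance",
                        "metadata": {},
                        "value": 4763,
                        "displayValue": "4763",
                        "displayType": "Number"
                    },
                    "shots": {
                        "rank": 24596,
                        "percentile": 98.6,
                        "displayName": "Shots",
                        "displayCategory": "Performance",
                        "category": "performance",
                        "metadata": {},
                        "value": 29139,
                        "displayValue": "29139",
                        "displayType": "Number"
                    },
                    "goalShotRatio": {
                        "rank": 1409320,
                        "percentile": 15.0,
                        "displayName": "Goal Shot Ratio",
                        "displayCategory": "Performance",
                        "category": "performance",
                        "metadata": {},
                        "value": 49.29132777377398,
                        "displayValue": "49.3",
                        "displayType": "NumberPrecision1"
                    },
                    "score": {
                        "rank": 28260,
                        "percentile": 98.4,
                        "displayName": "TRN Score",
                        "displayCategory": "General",
                        "category": "general",
                        "metadata": {},
                        "value": 2398222.83,
                        "displayValue": "2398222.8",
                        "displayType": "NumberPrecision1"
                    },
                    "seasonRewardLevel": {
                        "rank": null,
                        "percentile": 85.0,
                        "displayName": "Season Reward Level",
                        "displayCategory": "General",
                        "category": "general",
                        "metadata": {
```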
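The totals and ranks live under `data.segments`. A minimal sketch for pulling the `"overview"` segment's stats out as a `{stat_name: value}` dict, assuming the structure printed above; the rest of the response is truncated here, so filtering on `"type"` is the only assumption made about other segments. `overview_stats` is a hypothetical name.

```python
from typing import Any, Dict


def overview_stats(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Map each stat key in the "overview" segment to its raw "value"."""
    for segment in payload.get("data", {}).get("segments", []):
        if segment.get("type") == "overview":
            # Each stat is an object with "value", "displayValue", "rank", "percentile", ...
            return {name: stat.get("value") for name, stat in segment.get("stats", {}).items()}
    # No overview segment found (e.g. the response shape changed).
    return {}


if __name__ == "__main__":
    sample = {
        "data": {
            "segments": [
                {"type": "overview", "stats": {"wins": {"value": 4985}, "goals": {"value": 14363}}}
            ]
        }
    }
    print(overview_stats(sample))  # {'wins': 4985, 'goals': 14363}
```

With the live response this would be called as `overview_stats(data)`, where `data` is the `.json()` result from the answer's code; swapping `"value"` for `"rank"` or `"percentile"` gives the other columns.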