Python BeautifulSoup: finding data inside a variable
I am trying to use BeautifulSoup to get some data from a website. The data is returned as follows:
window._sharedData = {
"config": {
"csrf_token": "DMjhhPBY0i6ZyMKYQPjMjxJhRD0gkRVQ",
"viewer": null,
"viewerId": null
},
"country_code": "IN",
"language_code": "en",
"locale": "en_US"
}
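How can I load this with json.loads so that I can extract the data?
You first need to turn it into valid JSON by removing the variable name, then parse the resulting string: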
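import json
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
# Assumes the first <script> tag is the one that holds window._sharedData
text = soup.find('script').string
# Drop the variable assignment and any trailing semicolon so only JSON remains
text = text.replace('window._sharedData = ', '').strip().rstrip(';')
data = json.loads(text)
country_code = data['country_code']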
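The string replacement above depends on the exact spacing around the equals sign and on nothing following the object. A minimal sketch of a more tolerant variant is shown below, assuming the script text contains the variable name followed by a JSON object. The helper name parse_assigned_json and the sample string are hypothetical, introduced only for illustration. It relies on json.JSONDecoder().raw_decode, which parses one JSON value starting at a given index and ignores whatever comes after it.

```python
import json


def parse_assigned_json(script_text, var_name="window._sharedData"):
    """Return the JSON object assigned to var_name, or None if not found.

    Hypothetical helper: assumes the assignment looks like
    `var_name = { ... }` with valid JSON on the right-hand side.
    """
    start = script_text.find(var_name)
    if start == -1:
        return None
    # raw_decode does not skip leading whitespace, so locate the opening brace.
    brace = script_text.find("{", start)
    if brace == -1:
        return None
    # Parses a single JSON value at `brace`; a trailing ';' is simply ignored.
    obj, _end = json.JSONDecoder().raw_decode(script_text, brace)
    return obj


# Shortened sample in the shape shown in the question.
sample = 'window._sharedData = {"config": {"viewer": null}, "country_code": "IN"};'
data = parse_assigned_json(sample)
print(data["country_code"])  # IN
```

With BeautifulSoup, one way to use this would be to loop over soup.find_all('script') and pass each tag's .string (when it is not None) to the helper until it returns a result, rather than assuming the first script tag is the right one.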
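Alternatively, you can convert it to a Python dictionary with eval. To do that, replace the JSON literals with their Python equivalents and evaluate the text as a dictionary: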
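from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
text = soup.find('script').string
# str.replace needs a string as the replacement: 'None', not the None object
text = text.replace('null', 'None')
text = text.replace('window._sharedData = ', '').strip().rstrip(';')
# eval executes arbitrary code, so only use it on content you trust
data = eval(text)
country_code = data['country_code']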
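Because eval runs whatever code the page contains, a safer stand-in for the same idea is ast.literal_eval, which only accepts Python literals. The sketch below is one possible variant, not the answer's own method. The helper name parse_js_object_literal and the sample string are hypothetical. It also maps true and false, which a plain replacement of null would miss.

```python
import ast
import re

# JSON literals and their Python equivalents.
_JSON_TO_PYTHON = {"null": "None", "true": "True", "false": "False"}


def parse_js_object_literal(text):
    """Parse a JSON-style object into a dict without calling eval.

    Hypothetical helper. Caveat: the substitution is purely textual, so a
    string value containing the bare word null, true, or false is rewritten too.
    """
    python_text = re.sub(
        r"\b(null|true|false)\b",
        lambda match: _JSON_TO_PYTHON[match.group(1)],
        text,
    )
    # literal_eval accepts only literals (dicts, lists, strings, numbers, ...).
    return ast.literal_eval(python_text)


sample = '{"config": {"viewer": null, "active": true}, "country_code": "IN"}'
data = parse_js_object_literal(sample)
print(data["country_code"])  # IN
```

Given the caveat about string values, json.loads remains the more dependable choice whenever the text is valid JSON.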
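So sharedData is a dictionary? What data do you want to extract?
Is this data loaded via AJAX, or is it part of the HTML you would get with, say, curl?
@iamkhush It's in the scripts of the HTML. @danimesjo I want to extract country_code, or if I can get the whole JSON, I can work with that. I shortened the JSON for the question.
window._sharedData["country_code"]?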