Python 使用BeautifulSoup获取信息并使其可提取_Python_Web Scraping_Beautifulsoup

Python 使用BeautifulSoup获取信息并使其可提取

python web-scraping

Python 使用BeautifulSoup获取信息并使其可提取,python,web-scraping,beautifulsoup,Python,Web Scraping,Beautifulsoup,我对BeautifulSoup的网站浏览还不熟悉，我想从zalando.de中提取一些信息我已经在那一行找到了我需要的信息（价格、商品编号等）。是否可以将此行保存为可访问的数据类型（如字典），以按其键提取信息 from bs4 import BeautifulSoup import requests source = requests.get("https://en.zalando.de/carhartt-wip-hooded-chase-sweatshirt-c1422s02x-g13.h

我对BeautifulSoup的网站浏览还不熟悉，我想从zalando.de中提取一些信息

我已经在那一行找到了我需要的信息（价格、商品编号等）。是否可以将此行保存为可访问的数据类型（如字典），以按其键提取信息

from bs4 import BeautifulSoup
import requests

source = requests.get("https://en.zalando.de/carhartt-wip-hooded-chase-sweatshirt-c1422s02x-g13.html?_rfl=de").text
soup = BeautifulSoup(source, "lxml")
scr = soup.find("script", id = "z-vegas-pdp-props").text

是的，您可以将其保存为字典（确切地说是JSON）。您可以使用模块将字符串转换为json

首先需要将文本转换为有效的json。您可以通过删除无效部分来完成此操作

from bs4 import BeautifulSoup
import requests
import json

source = requests.get("https://en.zalando.de/carhartt-wip-hooded-chase-sweatshirt-c1422s02x-g13.html?_rfl=de").text
soup = BeautifulSoup(source, "lxml")
scr = soup.find("script", id = "z-vegas-pdp-props").text

data = json.loads(scr.lstrip('<![CDATA').rstrip(']>'))
print(data['layout'])
# cover

从bs4导入美化组
导入请求
导入json
source=请求。get（“https://en.zalando.de/carhartt-wip-hooded-chase-sweatshirt-c1422s02x-g13.html?_rfl=de）。文本
汤=美汤（来源，“lxml”）
scr=soup.find（“脚本”，id=“z-vegas-pdp-props”）.text
data=json.load（scr.lstrip（“”））
打印（数据['layout']）
#掩护

改进答案。下面的代码为您提供了所需的字典，您可以从中访问问题中提供的所需信息，这比依赖原始嵌套dict更容易

from bs4 import BeautifulSoup
import requests
import json

source = requests.get("https://en.zalando.de/carhartt-wip-hooded-chase-sweatshirt-c1422s02x-g13.html?_rfl=de").text
soup = BeautifulSoup(source, "lxml")
scr = soup.find("script", id = "z-vegas-pdp-props").text

data = json.loads(scr.lstrip('<![CDATA').rstrip(']>'))
desired_data = dict(data['model']['articleInfo'])
print(desired_data)

您可以使用jsonify再次对输出进行jsonify

json_output = json.dumps(desired_data)

无法访问给定页面。请提供实际页面和所需输出。谢谢。对不起，我修复了链接。因为输出很长，所以我没有发布它。

json_output = json.dumps(desired_data)