Python: I can't scrape the src image links using BeautifulSoup
So, I'm trying to get the food image links from restaurants on Postmates. For example, I'm trying this restaurant: https://postmates.com/merchant/fruitive-washington-96807
I'm after the src attribute that holds the image link, but extracting it has turned out to be quite difficult for me. I've tried everything I know, without success. I always get back one of these:
[]
or an "IndexError: list index out of range"
or a NoneType error
and general errors.
The page's code looks like this:
<div id="" class="e1tw3vxs2 css-aktk0j e1qfcze90">
  <div>
    <img alt="Spring Pesto from Fruitive. Order online." src="https://raster-static.postmates.com/?url=https%3A%2F%2Fitems-static.postmates.com%2Fuploads%2Fmedia%2F7b289988-5d19-4cfc-80a6-ce88a7a05f41%2Foriginal.jpg%3Fv%3D63784935843&quality=85&w=320&h=0&mode=auto&format=webp&v=4"
         class="css-1hyfx7x e1qfcze94">
    <div title="Spring Pesto from Fruitive. Order online." class="css-1ggm7mr e1qfcze91"
         style="background-image: url(&quot;https://raster-static.postmates.com/?url=https%3A%2F%2Fitems-static.postmates.com%2Fuploads%2Fmedia%2F7b289988-5d19-4cfc-80a6-ce88a7a05f41%2Foriginal.jpg%3Fv%3D63784935843&quality=85&w=320&h=0&mode=auto&format=webp&v=4&quot;); opacity: 1;"></div>
  </div>
  <div class="css-f85l49 e1qfcze92"></div>
</div>
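As an aside on the markup above: the src value is not the image file itself but what looks like an image-resizing proxy URL (note the quality, w, h and format parameters), with the original image URL percent-encoded in its url query parameter. A minimal standard-library sketch for recovering it, assuming the proxy URLs keep this shape; the helper name original_image_url is hypothetical:

```python
from urllib.parse import urlsplit, parse_qs


def original_image_url(proxy_src: str) -> str:
    """Return the image URL wrapped inside a raster-static proxy src.

    Assumption: the wrapped URL is carried in the "url" query parameter,
    as in the markup shown in the question.
    """
    query = urlsplit(proxy_src).query
    # parse_qs splits on "&" and percent-decodes each value,
    # so "%3A%2F%2F" becomes "://" and "%3F" becomes "?".
    params = parse_qs(query)
    return params["url"][0]


src = (
    "https://raster-static.postmates.com/?url=https%3A%2F%2Fitems-static.postmates.com"
    "%2Fuploads%2Fmedia%2F7b289988-5d19-4cfc-80a6-ce88a7a05f41%2Foriginal.jpg"
    "%3Fv%3D63784935843&quality=85&w=320&h=0&mode=auto&format=webp&v=4"
)
print(original_image_url(src))
```

This only helps once you actually have the src string (for example, copied from the browser's developer tools); it does not change what requests downloads.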
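Does anyone here have a solution?
The information you see on the page is rendered dynamically: the data is embedded in the page in JSON format, and the image tags are generated from it in the browser, so they are not in the raw HTML that requests.get() returns. That is why BeautifulSoup finds nothing. You can load the embedded JSON with the re and json modules instead, for example: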
import re
import json
import requests

url = "https://postmates.com/merchant/fruitive-washington-96807"
html_doc = requests.get(url).text

# the menu data sits in a <script> tag as: window.__PRELOADED_STATE__ = {...};
match = re.search(r"window\.__PRELOADED_STATE__ = ({.*?});", html_doc)
if match is None:
    raise ValueError("window.__PRELOADED_STATE__ not found in the page")
data = json.loads(match.group(1))

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for cat in data["cart"]["categories"]:
    for product in cat["products"]:
        # print only products with image:
        if "img" in product:
            print(
                "{:<30} {}".format(
                    product["name"], product["img"]["originalUrl"]
                )
            )
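One fragility in the regex approach: the non-greedy ({.*?}); can stop at the first "};" that appears inside a JSON string value, and without re.DOTALL it will not match across line breaks. A sketch of a sturdier variant, run offline against a made-up page (SAMPLE_HTML and the helper names are hypothetical, and the real page's structure may differ): it uses the regex only to find where the object starts and lets json.JSONDecoder().raw_decode() parse exactly one JSON value from that position.

```python
import json
import re

# Hypothetical page for illustration only; note the "};" inside a string value.
SAMPLE_HTML = """
<html><body><div id="root"></div>
<script>window.__PRELOADED_STATE__ = {"cart": {"categories": [
  {"products": [
    {"name": "Spring Pesto", "img": {"originalUrl": "https://example.com/pesto.jpg"}},
    {"name": "Note with }; inside", "desc": "no image"}
  ]}
]}};</script>
</body></html>
"""


def extract_preloaded_state(html: str) -> dict:
    """Decode the JSON object assigned to window.__PRELOADED_STATE__."""
    match = re.search(r"window\.__PRELOADED_STATE__\s*=\s*", html)
    if match is None:
        raise ValueError("__PRELOADED_STATE__ not found; the page layout may have changed")
    # raw_decode parses one JSON value starting exactly at match.end()
    # and ignores everything after it (here: ";</script>...").
    state, _end = json.JSONDecoder().raw_decode(html, match.end())
    return state


def image_urls(state: dict) -> list:
    """Collect (name, originalUrl) pairs for products that have an image."""
    return [
        (product["name"], product["img"]["originalUrl"])
        for cat in state["cart"]["categories"]
        for product in cat["products"]
        if "img" in product
    ]


for name, link in image_urls(extract_preloaded_state(SAMPLE_HTML)):
    print("{:<30} {}".format(name, link))
```

With the live page you would pass requests.get(url).text instead of SAMPLE_HTML; the "cart" / "categories" / "products" keys are taken from the answer above and may change if the site does.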
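It worked! Thanks, man! One question: what does "{:<30} {}" mean?
@bilakos That's just string formatting. {:<30} means the first item (the product name) is left-aligned and padded to a fixed width of 30 characters.
Thanks again, man!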
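To make the format string from the comments concrete, here is a small check with made-up values (the name and URL are placeholders, not data from the site):

```python
name = "Spring Pesto"
link = "https://example.com/pesto.jpg"  # placeholder URL for illustration

# Format spec "<30": "<" left-aligns, "30" is the minimum field width,
# so the name is padded with spaces on the right up to 30 characters.
row = "{:<30} {}".format(name, link)
print(row)

# The same formatting written as an f-string (Python 3.6+).
assert row == f"{name:<30} {link}"

# The width is a minimum, not a maximum: longer names are not truncated.
too_long = "{:<30}|".format("A" * 35)
print(too_long)
```

If you want long names cut off so the columns always line up, add a precision to the spec, e.g. "{:<30.30}", which truncates strings to at most 30 characters.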