Scraping data with Python returns a different HTML tree than DevTools

python web-scraping

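I'm trying to scrape data from zara.com. I've already figured out how to parse the parent element that holds a list of items, but I want to dig deeper: open each item's link and get additional information about it.

So I used this code: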

import time

import requests
from bs4 import BeautifulSoup

# In this example only one item
ListWithRequests = ['https://www.zara.com/nl/en/plain-shirt-p06608389.html']

for item in ListWithRequests:
    # verify=False disables TLS certificate checks, as in my original attempt
    response = requests.get(item, verify=False)
    soup2 = BeautifulSoup(response.text, "html.parser")
    # Save the raw HTML returned by the server so it can be inspected
    with open("demo.html", "w", encoding="utf-8") as f:
        f.write(response.text)
    time.sleep(1)

For example, I want to get the item's price, which DevTools shows in this block:

<span class="main_price">25.95 EUR</span>

25.95 EUR

or another item detail, such as the color name:

<div class="product-info-wrapper _product-info">
  <p class="product-color">
    <span class="_colorName">White</span>
  </p>
</div>

White

But in the demo.html file I get a completely different tree, and I can't find any of the elements I need.

Please advise what I'm doing wrong.

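The page is loaded via JavaScript, so bs4 cannot render it. For cases like this you can use selenium, but I noticed that the data you are looking for is actually present in a script tag. You can load it easily with JSON, or, for a quick capture, use re as I did here: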

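import re

import requests


def main(url):
    r = requests.get(url)
    # Capture the value of the "price" key embedded in the page's script data
    price = re.search(r'"price":"(.*?)"', r.text).group(1)
    print(price)


main("https://www.zara.com/nl/en/plain-shirt-p06608389.html")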

Output:

25.95
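
If you need more than one field, the JSON route mentioned above may be easier to maintain than a separate regex per value. Here is a minimal sketch of that idea using only the standard library. It runs against a made-up HTML sample instead of the live site, so the `window.pageData` variable name, the key names, and the `extract_embedded_json` helper are all assumptions for illustration; check demo.html to see how the real page embeds its data.

```python
import json

# Hypothetical stand-in for response.text; the real page's markup may differ.
SAMPLE_HTML = """
<html><head>
<script type="text/javascript">
window.pageData = {"product": {"name": "PLAIN SHIRT", "price": "25.95", "color": "White"}};
</script>
</head><body><div id="app"></div></body></html>
"""


def extract_embedded_json(html, marker):
    """Return the JSON object that follows `marker` in `html`, or None."""
    start = html.find(marker)
    if start == -1:
        return None
    # Find the opening brace of the object assigned after the marker.
    brace = html.find("{", start + len(marker))
    if brace == -1:
        return None
    try:
        # raw_decode parses one JSON value starting at `brace` and
        # ignores whatever text follows it (here, the trailing ";").
        data, _ = json.JSONDecoder().raw_decode(html, brace)
    except json.JSONDecodeError:
        return None
    return data


data = extract_embedded_json(SAMPLE_HTML, "window.pageData")
if data is not None:
    product = data["product"]
    print(product["price"], product["color"])
```

Returning None instead of raising keeps the loop over many item URLs going when one page has a different layout.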