Python: scraping content inside a form with BeautifulSoup


python web-scraping


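I am trying to scrape a page using BeautifulSoup and Python 3.5. Specifically, I am interested in the number of sizes. On this particular page the number of sizes is 3 (S, M, L). This information can be found inside a form in the HTML code.

The code I have tried is: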

import requests
from bs4 import BeautifulSoup

page = requests.get('http://www.bendonlingerie.com.au/pleasure-state-d-arcy-delatour-soft-cup-bra-jester-red-p21-2346w')
soup=BeautifulSoup(page.content,'html.parser')
right = soup.find("div", class_="product-shop")
sizes = right.find("div", id="sizes")
sizes = sizes.find("ul", class_="button-size-list combo-list")
sizes = sizes.find_all("li")
nu_of_sizes = len(sizes)
print(nu_of_sizes)

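This code prints "0". The correct output would be "3", because there are 3 sizes (S, M, L). I do not want to use selenium or similar packages. Is there a way to "capture" this data using BeautifulSoup?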

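If you inspect the page source carefully, you will notice that the data you are interested in is in JSON format (right-click the page, choose "View page source", and search for productJson). So you can check where that fragment starts and ends in the source and use json.loads() to deserialize it into a Python object.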
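The approach described above could be sketched roughly as follows. This is not the answer's original code, which is missing here; it is a minimal sketch under stated assumptions. The helper `extract_json_after` and the `sample_html` string are hypothetical, and the JSON layout (a top-level `sizes` list with `label` keys) is invented for illustration, so the keys on the real page will likely differ. Instead of finding the end of the fragment by hand and calling `json.loads()`, the sketch uses `json.JSONDecoder.raw_decode()`, which parses one JSON value from a given start index and ignores whatever follows it.

```python
import json


def extract_json_after(html, marker):
    """Return the first JSON object that appears after `marker` in `html`.

    Hypothetical helper: it assumes the object starts at the first "{"
    following the marker, which has to be verified against the real page.
    """
    marker_pos = html.find(marker)
    if marker_pos == -1:
        raise ValueError("marker %r not found" % marker)
    start = html.find("{", marker_pos)
    if start == -1:
        raise ValueError("no JSON object found after %r" % marker)
    # raw_decode parses a single JSON value beginning at `start` and returns
    # the parsed object together with the index where parsing stopped.
    obj, _end = json.JSONDecoder().raw_decode(html, start)
    return obj


# Hypothetical stand-in for page.text from requests.get(...).
# The real productJson structure is an assumption and may look different.
sample_html = """
<script>
var productJson = {"name": "Soft Cup Bra", "sizes": [{"label": "S"}, {"label": "M"}, {"label": "L"}]};
</script>
"""

product = extract_json_after(sample_html, "productJson")
size_labels = [size["label"] for size in product["sizes"]]
print(len(size_labels), size_labels)  # prints: 3 ['S', 'M', 'L']
```

Against the live page you would pass `page.text` instead of `sample_html`, then print the parsed object once to see which keys actually hold the size options.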

Is this an SFW link?

Thank you. How can I check where it starts and where it ends?

@nesi: Well, you have to inspect the page source manually to find out.