Python BeautifulSoup can't find more than 24 classes with find_all
I'm trying to scrape data from a page where all the items are stored like this
There are hundreds of them, but when I try to add them to an array, only 24 are saved.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import re
import lxml
my_url = 'https://www.alza.co.uk/tablets/18852388.htm'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "lxml")
classname = "box browsingitem"
containers = page_soup.find_all("div", {"class":re.compile(classname)})
#len(containers) will be equal to 24
for container in containers:
    title_container = container.find_all("a",{"class":"name browsinglink"})
    product_name = title_container[0].text
    print("product_name: " + product_name)
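Is this a problem with re.compile? How else could I search for these classes?
Thanks for the help.
In this case, only 24 items are loaded into the DOM when you visit the page. Two options come to mind: 1) use a headless browser to click the "load more" button and load more items into the DOM; 2) build a simple pagination scheme and loop through the pages. Here is an example of the second option: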
import requests
from bs4 import BeautifulSoup as soup

for page in range(0, 10):
    print("Trying page # {}".format(page))
    # Page 0 has no suffix; later pages use a -p{n} suffix.
    if page == 0:
        my_url = 'https://www.alza.co.uk/tablets/18852388.html'
    else:
        my_url = 'https://www.alza.co.uk/tablets/18852388-p{}.html'.format(page)
    page_html = requests.get(my_url)
    page_soup = soup(page_html.content, "lxml")
    items = page_soup.find_all('div', {"class": "browsingitem"})
    print("Found a total of {}".format(len(items)))
    for item in items:
        # Search within the current item, not the whole page.
        title = item.find('a', 'browsinglink')
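You can see that the pagination is built into the URL, so all you need to do is decide how many pages to scrape and then save all of that information. Here is the output: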
Trying page # 0
Found a total of 24
Trying page # 1
Found a total of 24
Trying page # 2
Found a total of 24
Trying page # 3
Found a total of 24
Trying page # 4
Found a total of 24
Trying page # 5
Found a total of 24
Trying page # 6
Found a total of 24
Trying page # 7
Found a total of 24
Trying page # 8
Found a total of 17
Trying page # 9
Found a total of 0
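The output above drops to 0 items once the pages run out, so rather than hard-coding range(0, 10) the loop could stop at the first empty page. The following is only a sketch under assumptions: page_url, scrape_all, the fetch callback and max_pages are hypothetical names introduced here, and the URL pattern is the one from the answer, not verified against the live site.

```python
from bs4 import BeautifulSoup


def page_url(page):
    # URL pattern taken from the answer above (an assumption about the site).
    base = "https://www.alza.co.uk/tablets/18852388"
    return f"{base}.html" if page == 0 else f"{base}-p{page}.html"


def scrape_all(fetch, max_pages=50):
    """Collect product names page by page.

    fetch is any callable that takes a URL and returns the page HTML.
    The loop stops at the first page with no items, or after max_pages.
    """
    names = []
    for page in range(max_pages):
        page_soup = BeautifulSoup(fetch(page_url(page)), "html.parser")
        items = page_soup.find_all("div", {"class": "browsingitem"})
        if not items:
            break  # an empty page means we ran past the last one
        for item in items:
            link = item.find("a", class_="browsinglink")
            if link is not None:
                names.append(link.get_text(strip=True))
    return names
```

With requests installed, it could be called as scrape_all(lambda url: requests.get(url, timeout=10).content). Passing the fetcher in as an argument keeps the parsing logic testable without network access.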
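Is it possible that these hundreds of items are loaded on scroll?
If all of these items contain the class name box browsingitem, why not just run page_soup.find_all('div', 'box browsingitem')? It should retrieve every item of that class loaded in the DOM.
@taras Yes, but it loads 24 items even when there are, for example, 18... really weird.
@Steven For some reason it doesn't work; it loads 0.
Can you provide the link you're trying to fetch? @Bryce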
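On the comment that find_all('div', 'box browsingitem') returned 0: one possible explanation, not confirmed against the live page, is that BeautifulSoup compares a multi-word class string against the exact value of the class attribute, so any extra or reordered class on the div breaks the match. The HTML below is made up for illustration (the canBuy class is hypothetical); it compares that call with matching on a single class and with a CSS selector.

```python
from bs4 import BeautifulSoup

# Made-up markup: same two classes, but with an extra class or a different order.
html = """
<div class="box browsingitem"><a class="name browsinglink">Tablet A</a></div>
<div class="box browsingitem canBuy"><a class="name browsinglink">Tablet B</a></div>
<div class="browsingitem box"><a class="name browsinglink">Tablet C</a></div>
"""
page_soup = BeautifulSoup(html, "html.parser")

# Multi-word string: only matches when the class attribute is exactly this string.
exact = page_soup.find_all("div", class_="box browsingitem")

# Single class: matches any div that carries that class among others.
by_token = page_soup.find_all("div", class_="browsingitem")

# CSS selector: requires both classes, in any order, extras allowed.
by_css = page_soup.select("div.box.browsingitem")

print(len(exact), len(by_token), len(by_css))  # 1 3 3
```

If the real page adds further classes to each item, the single-class search or the CSS selector would be the more forgiving choice.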