BeautifulSoup returns nothing for Sainsbury's
Similar to another question, but for a different site. I tried running:
import os
from selenium import webdriver
from bs4 import BeautifulSoup as soup

url = 'https://www.sainsburys.co.uk/gol-ui/SearchDisplayView?filters[keyword]=banana'
# configure driver
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--headless")
chrome_driver = os.getcwd() + "\\chromedriver.exe" # IF NOT IN SAME FOLDER CHANGE THIS PATH
driver = webdriver.Chrome(options=chrome_options, executable_path=chrome_driver)
driver.get(url)
page = driver.page_source
page_soup = soup(page, 'html.parser')
container_tag1 = 'pt__content'
containers = page_soup.findAll("div", {"class": container_tag1})
# print(containers)
print(len(containers))
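but to no avail.
I also tried without Selenium, but that failed as well.
Any suggestions?

You have to wait for the page to fully render before passing the HTML to BeautifulSoup. One option is to use the sleep method from the built-in time module.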
from time import sleep
from selenium import webdriver
from bs4 import BeautifulSoup
URL = "https://www.sainsburys.co.uk/gol-ui/SearchDisplayView?filters[keyword]=banana"
driver = webdriver.Chrome(r"c:\path\to\chromedriver.exe")
driver.get(URL)
sleep(5) # <-- Wait for the page to fully render
soup = BeautifulSoup(driver.page_source, "html.parser")
print(soup.find_all("div", {"class": "pt__content"}))

A comment on this answer: I also tried a 15-second sleep, but it still did not work for me.
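A fixed sleep is a guess about how long rendering takes. A possible alternative is to poll until the elements show up or a timeout expires, which is the same idea behind Selenium's own WebDriverWait class. The sketch below is not part of the original answer: `wait_for`, `find_containers` and `scrape` are hypothetical helper names, the 20-second timeout is arbitrary, and calling `webdriver.Chrome()` with no arguments assumes a Selenium version that can locate the driver itself (or a chromedriver on PATH). Polling will not help if the elements never appear, for example if the site serves different markup to automated browsers, which has not been verified here.

```python
import time


def wait_for(fetch, timeout=20.0, interval=0.5):
    """Call fetch() repeatedly until it returns a truthy value.

    Raises TimeoutError if nothing truthy is returned within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no result after {timeout} seconds")
        time.sleep(interval)


def find_containers(driver, css_class="pt__content"):
    """Parse the driver's current HTML and return the matching <div> tags."""
    # Imported here so wait_for() can be used without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(driver.page_source, "html.parser")
    return soup.find_all("div", {"class": css_class})


def scrape(url, timeout=20.0):
    """Open `url` in Chrome and return the containers once they are rendered."""
    # Imported here so wait_for() can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Chrome()  # assumption: the driver can be located automatically
    try:
        driver.get(url)
        # Re-parse the page on every poll until at least one container exists.
        return wait_for(lambda: find_containers(driver), timeout=timeout)
    finally:
        driver.quit()  # always close the browser, even on timeout
```

Calling `scrape(URL)` with the URL from the answer would return the list of matching tags, or raise TimeoutError if none appear in time, which at least distinguishes "still rendering" from "never rendered".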