How do I use BeautifulSoup in Python to scrape all results from a website that shows all of its results on a single page?


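I'm trying to get all of the search results from the site requested in the code below. If you visit it, you will see a button at the bottom of the results that shows more results, and this continues until there are no more results. I don't know how to extract the data from all of the results and then check whether I'm done. My code below works for what is initially displayed on the results page.

Thanks for your help.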

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36',
    'Upgrade-Insecure-Requests': '1',
    'DNT': '1',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate',
}
html = requests.get("https://www.carmax.com/cars/all",headers=headers)
soup = BeautifulSoup(html.content, 'html.parser')

tiles = soup.find_all('div', class_='car-tile')

# enumerate() supplies the running tile index
for n, tile in enumerate(tiles):
    yearmake = tile.find('span', class_='year-make').text.strip()
    modeltrim = tile.find('span', class_='model-trim').text.strip()
    print(f'TILE {n}: ym={yearmake}, mt={modeltrim}')

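I don't have enough reputation to comment yet, but you can use this post as a guide:

Note: this approach requires a tool that can drive a real browser, because it relies on clicking a button (the original link to the tool is missing here).

The main idea is that you can click the button to load more results, so find the button's id and click it. Also, as the post I linked suggests, you can add some sleep calls to give the new results time to load and the "See More Matches" option time to reappear.

Your while loop condition can check whether the number of car-tile divs is changing. Once the count no longer changes after a click, you can assume that you have collected all of the results.

It may also be that the button no longer appears once all of the results have been fetched; in that case, that could be another exit condition for the while loop.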
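The loop described above could be sketched roughly as follows. This is only an illustration of the two exit conditions, not code from the linked post: `load_all_results`, `click_more`, and `count_tiles` are hypothetical names, and the two callbacks stand in for whatever your browser-automation tool uses to click the button and count the car-tile divs.

```python
import time
from typing import Callable


def load_all_results(
    click_more: Callable[[], bool],
    count_tiles: Callable[[], int],
    pause: float = 2.0,
    max_clicks: int = 1000,
) -> int:
    """Click "see more" until the tile count stops growing or the button is gone.

    click_more  -- hypothetical callback: clicks the button, returns False if it is missing
    count_tiles -- hypothetical callback: returns the current number of car-tile divs
    Returns the number of tiles seen when the loop stops.
    """
    previous = count_tiles()
    for _ in range(max_clicks):      # hard cap so a misbehaving page cannot loop forever
        if not click_more():         # exit condition: the button no longer appears
            break
        time.sleep(pause)            # give the new results time to load
        current = count_tiles()
        if current == previous:      # exit condition: the click added nothing
            break
        previous = current
    return previous
```

With Selenium, for example, `click_more` might wrap a `driver.find_element(...).click()` call in a `try`/`except NoSuchElementException` and return `False` on failure, while `count_tiles` might return `len(driver.find_elements(By.CSS_SELECTOR, "div.car-tile"))`. Both locators are assumptions about the page, so check them against the real markup.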

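I would use selenium to load everything first and then scrape it. This is what I tried; it reached the bottom of the page, so the complete data is now available to scrape. Sorry for my poor code, I'm new to this too.

The Chrome driver can be obtained here: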

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.common.exceptions import (
    ElementClickInterceptedException,
    NoSuchElementException,
    TimeoutException,
)
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from time import sleep

chromeOptions = Options()
chromeOptions.add_argument("--kiosk")
# Selenium 4 style: the driver path goes through Service, the options through options=
driver = webdriver.Chrome(service=Service("YOUR DRIVER PATH HERE"), options=chromeOptions)
driver.get("https://www.carmax.com/cars/all")

wait = WebDriverWait(driver, timeout=10)
button_locator = (By.XPATH, '//*[@id="see-more"]/div/a')

for i in range(100):
    try:
        # scroll the "see more" container into view
        see_more_cars = driver.find_element(By.XPATH, '//*[@id="see-more"]/div')
        driver.execute_script("arguments[0].scrollIntoView();", see_more_cars)
        # if the button is available, click it to load more cars
        wait.until(EC.visibility_of_element_located(button_locator)).click()
        sleep(10)
    except ElementClickInterceptedException:
        # something is covering the button: wait, then retry on the next iteration
        sleep(10)
    except (NoSuchElementException, TimeoutException):
        # the button is gone or never became visible in time: assume everything is loaded
        break
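Once the loop above has finished, the fully loaded page still has to be parsed. A minimal sketch of that last step, reusing the class names from the question's code: `parse_tiles` is a hypothetical helper, and the sample markup is made up purely to mimic those class names, so the real page may differ.

```python
from bs4 import BeautifulSoup


def parse_tiles(html):
    """Return (year_make, model_trim) pairs for every car-tile div in html."""
    soup = BeautifulSoup(html, "html.parser")
    cars = []
    for tile in soup.find_all("div", class_="car-tile"):
        year_make = tile.find("span", class_="year-make")
        model_trim = tile.find("span", class_="model-trim")
        if year_make is None or model_trim is None:
            continue  # skip tiles that lack either span
        cars.append((year_make.text.strip(), model_trim.text.strip()))
    return cars


# Made-up markup that only mimics the class names used in the question.
sample = """
<div class="car-tile"><span class="year-make"> 2015 Honda </span><span class="model-trim">Civic LX</span></div>
<div class="car-tile"><span class="year-make">2017 Ford</span></div>
"""
print(parse_tiles(sample))
```

In the Selenium script, the same helper could be called as `parse_tiles(driver.page_source)` after the loop, since `page_source` holds the HTML of the page as the browser currently renders it.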