Web scraping with Selenium using Python filtering and the webdriver.find functions


python html selenium web-scraping


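I'm scraping the fixed-income products listed on this website:

Basically, the page shows a number of cards, and I want to know how many cards appear on each page. For example, with cdb as the type and 3 months, it shows 16 cards, but with a different number of months or another product type it may show fewer.

So far I know how to find the number of pages by looking at "investmentCardContainer__footer", which is a class, but the number of cards looks like it is a style, and I don't know how to look it up with the selenium webdriver.find functions.

Here is a hint of what I want:

The idea is to get that number of cards and use it in a loop to collect the card information, aggregated in a list.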

    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By

    # Convert() is a helper defined elsewhere in the script (not shown);
    # it turns an element's text into a list.
    CHROMEDRIVER = "C:\\Users\\yourpath\\Desktop\\PYTHON\\chromedriver.exe"

    vetor = ["cdb", "lca", "lci"]
    dataset_boxes = []
    now = time.time()
    for i in vetor:
        options = Options()
        options.add_argument('--headless')
        # Adjacent string literals keep stray whitespace out of the URL.
        url = ('https://yubb.com.br/investimentos/renda-fixa?investment_type={}&months=12'
               '&principal=1000000.0&sort_by=net_return').format(i)
        # Selenium 3 style call; Selenium 4 expects service=Service(CHROMEDRIVER).
        driver = webdriver.Chrome(CHROMEDRIVER, options=options)
        driver.get(url)
        time.sleep(1)
        # The footer text is used to work out the number of pages.
        num_pages = driver.find_element(By.CLASS_NAME, "investmentCardContainer__footer").text
        list_pages = Convert(num_pages)
        last_page = int(list_pages[len(list_pages)-3])
        driver.quit()
        # Visit every result page for this investment type.
        for j in range(1, last_page + 1):
            url2 = ('https://yubb.com.br/investimentos/renda-fixa?collection_page={}'
                    '&investment_type={}&months=12'
                    '&principal=1000000.0&sort_by=net_return').format(j, i)
            driver = webdriver.Chrome(CHROMEDRIVER, options=options)
            driver.get(url2)
            # The body text contains all cards shown on this page.
            num_boxes = driver.find_element(By.CLASS_NAME, "investmentCardContainer__body").text
            list_boxes = Convert(num_boxes)
            dataset_boxes.append(list_boxes)
            driver.quit()
    print('idk')
    later = time.time()
    difference = int(later - now)
    print('Processo finalizado em {} segundos.'.format(difference))
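The lookup of the third-from-last item in the footer list depends on the exact token layout of the pagination text. A less position-dependent sketch, assuming the footer text contains the page numbers as plain integers (the sample text in the comment is made up, and last_page_from_footer is a hypothetical helper name), is to take the largest number found:

```python
import re


def last_page_from_footer(footer_text):
    """Return the largest integer found in the pagination text, or 1 if there is none.

    Assumption: the page numbers appear as plain digits in the footer text,
    e.g. something like "1 2 3 4 Next" (hypothetical sample).
    """
    numbers = [int(n) for n in re.findall(r"\d+", footer_text)]
    return max(numbers) if numbers else 1
```

In the loop above this could stand in for the two lines that compute list_pages and last_page, for example last_page = last_page_from_footer(num_pages).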

Use WebDriverWait with the following XPath to get the page count:

print(WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,'(//span[@class="page"]//a)[last()]'))).text)

You need the following imports to run the code above:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

For this link:

https://yubb.com.br/investimentos/renda-fixa?investment_type=cdb&months=3&principal=10000000.0&sort_by=minimum_investment

It should return:

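Thanks for the answer, but as I said, I already know the number of pages. What I don't know is the number of cards on each page. Taking your link as an example, that page has 16 cards.

Did you try this code on the second link? I mean the inner for loop. Please inspect the HTML of the second link.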
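For the number of cards per page, one possible approach is to ask for all matching elements with find_elements (plural) and take the length of the returned list. The sketch below assumes each card is matched by a CSS selector such as ".investmentCard", which is hypothetical and has to be checked against the page's HTML. The helper names count_cards and collect_card_texts are illustrative as well. The functions only rely on the driver offering find_elements(by, value), and "css selector" is the string behind By.CSS_SELECTOR.

```python
def count_cards(driver, card_selector):
    """Return how many elements on the current page match card_selector.

    driver is any object exposing Selenium's find_elements(by, value);
    "css selector" is the string value of By.CSS_SELECTOR.
    """
    return len(driver.find_elements("css selector", card_selector))


def collect_card_texts(driver, card_selector):
    """Return the visible text of every matching card as a list of strings."""
    return [card.text for card in driver.find_elements("css selector", card_selector)]


# Usage with a real browser (the selector is hypothetical):
#   driver.get(url2)
#   n_cards = count_cards(driver, ".investmentCard")
#   dataset_boxes.extend(collect_card_texts(driver, ".investmentCard"))
```

Unlike find_element, find_elements returns an empty list when nothing matches, so a page without cards gives a count of 0 instead of raising an exception.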