How to print the category links on a page using ChromeDriver, Chrome and Selenium through Python?




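Using Python 3, I am trying to get Chrome WebDriver and Selenium to identify the various "Classifieds" categories on the page www.jtinsight.com and print out their titles. So far, with the code below, all I can get it to print are the first two: 'All Categories' and 'Cars (Private)'. I have noticed that the HTML of these two differs from the others, and I have tried many different lines of code (listed in the commented-out code below), but I cannot identify the right tag/class/XPath. Any help would be greatly appreciated.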

from selenium import webdriver
from selenium.webdriver.common.by import By

# Creating the WebDriver object using the ChromeDriver
driver = webdriver.Chrome()

# Directing the driver to the defined url
driver.get("https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main")

# Locate the categories

# Each of these lines runs, but only returns the first two categories
# categories = driver.find_elements_by_xpath('//div[@class="col-md-3 col-sm-4 col-xs-6"]')
# categories = driver.find_elements_by_xpath('//div[@class="mainCatEntry"]')
# categories = driver.find_elements_by_xpath('//div[@class="Description"]')

# These run without errors (exit code 0), but find nothing
# categories = driver.find_elements_by_xpath('//*[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_xpath('//div[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_partial_link_text('//href[@class="divLink"]')
# categories = driver.find_elements_by_tag_name('bindonce')
# categories = driver.find_elements_by_xpath('//div[@class="divLink"]')

# These raise an error before the script finishes
# categories = driver.find_elements(By.CLASS_NAME, "col-md-3 col-sm-4 col-xs-6 ng-scope")
# categories = driver.find_elements(By.XPATH, '//div bindonce[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_class_name('//div bindonce[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')

# Print out all categories on current page
num_page_items = len(categories)
print(num_page_items)
for i in range(num_page_items):
    print(categories[i].text)

# Clean up: quit() ends the whole WebDriver session, not just the current window
driver.quit()
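Two details of the commented-out attempts are worth isolating from the timing problem. An XPath test such as [@class="col-md-3 col-sm-4 col-xs-6"] compares the entire class attribute string, so it stops matching as soon as Angular adds another class such as ng-scope to the element, and By.CLASS_NAME expects a single class name rather than a space-separated list. The sketch below builds the two usual alternatives: a CSS selector that chains every class, and an XPath that matches one class token regardless of what else is in the attribute. The function names css_for_classes and xpath_has_class are hypothetical helpers, not part of Selenium, and they only fix selector syntax; they do not make the driver wait for content that has not rendered yet.

```python
def css_for_classes(tag, class_attr):
    """Build a CSS selector that requires every class token in class_attr.

    "col-md-3 col-sm-4 col-xs-6 ng-scope" becomes
    "div.col-md-3.col-sm-4.col-xs-6.ng-scope" for use with By.CSS_SELECTOR.
    """
    tokens = class_attr.split()  # split() also collapses repeated whitespace
    return tag + "".join("." + token for token in tokens)


def xpath_has_class(tag, class_name):
    """Build an XPath matching elements whose class list contains class_name.

    Unlike [@class="..."], this still matches when extra classes are present.
    """
    return ("//{tag}[contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')]"
            .format(tag=tag, cls=class_name))


print(css_for_classes("div", "col-md-3 col-sm-4 col-xs-6 ng-scope"))
# div.col-md-3.col-sm-4.col-xs-6.ng-scope
print(xpath_has_class("div", "mainCatEntry"))
# //div[contains(concat(' ', normalize-space(@class), ' '), ' mainCatEntry ')]
```

With Selenium 4 these would be passed as driver.find_elements(By.CSS_SELECTOR, ...) or driver.find_elements(By.XPATH, ...), since the find_elements_by_* helpers used in the question were removed in Selenium 4.3.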

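It was indeed a timing issue. If I add a sleep(5) before collecting the categories, it finds all 24. Interestingly, when I used WebDriverWait instead, it still only pulled 2 items. So, to force the driver to do more work, I extended the XPath. The following worked for me: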

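from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
categories = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="mainCatEntry"]/div[@class="Description"]')))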
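A plausible reading of the "only 2 items" result is that visibility_of_all_elements_located succeeds as soon as every element matched at that moment is visible, so if only two entries have been rendered on the first poll, the wait returns those two. WebDriverWait.until accepts any callable that takes the driver and returns a truthy value when done, so one alternative is a custom condition that waits for a minimum count. The class at_least_n_elements_located below is a hypothetical helper, not part of Selenium, and the count of 24 in the usage comment is taken from the sleep(5) observation above, which would make it brittle if the site adds or removes categories.

```python
class at_least_n_elements_located:
    """Hypothetical WebDriverWait condition (not part of Selenium).

    Returns the matching elements once at least `count` are present,
    otherwise False so that WebDriverWait keeps polling.
    """

    def __init__(self, locator, count):
        self.locator = locator  # a (by, value) tuple; By.XPATH is the string "xpath"
        self.count = count

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        return elements if len(elements) >= self.count else False


# Usage with a live driver (By and WebDriverWait imported as in the answer above):
# categories = WebDriverWait(driver, 10).until(
#     at_least_n_elements_located((By.XPATH, '//div[@class="mainCatEntry"]'), 24))
```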

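To identify the various Classifieds categories on the page

https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main

and print the category titles (All Categories, Cars (Private), and so on), you need to scroll down a little and induce WebDriverWait for visibility_of_all_elements_located(). You can use the following solution:

Code block: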

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument('disable-infobars')
# Recent Selenium 4 releases removed the chrome_options/executable_path keywords; use options= and a Service
driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'), options=options)
driver.get("https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main")
# Wait for the "Classifieds" heading, then scroll it into view
classifieds = WebDriverWait(driver, 30).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='ng-scope' and text()='Classifieds']")))
driver.execute_script("arguments[0].scrollIntoView(true);", classifieds)
# Wait until the category descriptions are visible, then print their inner HTML
descriptions = WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='mainCatEntry']//div[@class='Description']")))
print([elem.get_attribute("innerHTML") for elem in descriptions])
driver.quit()
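If the exact number of categories is not known in advance, another option is to wait until the number of matches stops changing. The sketch below is a hypothetical condition class, elements_count_stable, not part of Selenium: it returns the elements once the same non-zero count has been seen on polls_required consecutive polls. With WebDriverWait's default polling interval of 0.5 seconds, polls_required=3 means roughly one second without change. The trade-off is that a page that pauses mid-render for longer than that window would still be captured early, and because the class keeps state between polls, a fresh instance is needed for each wait.

```python
class elements_count_stable:
    """Hypothetical WebDriverWait condition (not part of Selenium).

    Returns the matching elements once the number of matches is non-zero
    and unchanged across `polls_required` consecutive polls.
    """

    def __init__(self, locator, polls_required=3):
        self.locator = locator  # a (by, value) tuple, e.g. (By.XPATH, "...")
        self.polls_required = polls_required
        self._last_count = None
        self._repeats = 0  # how many polls in a row repeated the previous count

    def __call__(self, driver):
        elements = driver.find_elements(*self.locator)
        count = len(elements)
        if count and count == self._last_count:
            self._repeats += 1
        else:
            self._repeats = 0
        self._last_count = count
        if count and self._repeats >= self.polls_required - 1:
            return elements
        return False


# Usage with a live driver:
# descriptions = WebDriverWait(driver, 30).until(elements_count_stable(
#     (By.XPATH, "//div[@class='mainCatEntry']//div[@class='Description']")))
```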

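'//div[@class="mainCatEntry"]' seems to give the result you want. What happened when you tried it?

When I run it as written in the script, it only returns the first two results, not all the others.

What are you asking the driver to do before specifying '//div[@class="mainCatEntry"]'?

Thanks for the information about waits/sleeps. As soon as I added a delay, I could get all the headings back. Now I am trying to work out how to extract the URLs and images that appear with these headings, but the dynamic HTML is not making that easy.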
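For the follow-up about URLs and images, one approach is to locate each category entry first and then search inside it with relative XPaths (those starting with "."), so each link and image stays paired with its title. This is a sketch only: collect_category_details is a hypothetical function, and the relative selectors ".//a[@href]" and ".//img[@src]" are assumptions about the page's markup that have not been checked against the site. If the link wraps the entry rather than sitting inside it, an ancestor axis such as "./ancestor::a" would be needed instead. It should be called only after the entries have finished rendering.

```python
def collect_category_details(driver, entry_xpath='//div[@class="mainCatEntry"]'):
    """Return a (title, link, image) tuple per category entry (hypothetical sketch).

    The relative selectors below are assumptions about the markup; any part
    that is missing for an entry is reported as None.
    """
    details = []
    for entry in driver.find_elements("xpath", entry_xpath):  # "xpath" == By.XPATH
        # Relative XPaths (leading ".") only search inside this entry
        titles = entry.find_elements("xpath", ".//div[@class='Description']")
        links = entry.find_elements("xpath", ".//a[@href]")
        images = entry.find_elements("xpath", ".//img[@src]")
        details.append((
            titles[0].text if titles else None,
            links[0].get_attribute("href") if links else None,
            images[0].get_attribute("src") if images else None,
        ))
    return details
```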