Python: Unable to scrape a class on TripAdvisor using Selenium

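I am trying to scrape all the images from a particular TripAdvisor page, but when I use Selenium's find_elements_by_class_name function, it returns nothing. I'm confused, because this is the exact class name of the elements I want to iterate over and append to a list. My code is below. Any help would be greatly appreciated.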

# importing dependencies
import re
import selenium
import io
import pandas as pd
import urllib.request
import urllib.parse
import requests
from bs4 import BeautifulSoup
import pandas as pd
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
import time
from _datetime import datetime
from selenium.webdriver.common.keys import Keys


#setup opening url window of website to be scraped
options = webdriver.ChromeOptions()
options.headless=False
prefs = {"profile.default_content_setting_values.notifications" : 2} 
options.add_experimental_option("prefs", prefs)
driver = webdriver.Chrome("/Users/rishi/Downloads/chromedriver 3") #possible issue by not including the file extension
driver.maximize_window()
time.sleep(5)
driver.get("""https://www.tripadvisor.com/""") #get the information from the page

#automate searching for hotels in specific city
driver.find_element_by_xpath('/html/body/div[2]/div/div[6]/div[1]/div/div/div/div/span[1]/div/div/div/a').click() #clicks on hotels option
driver.implicitly_wait(12) #allows xpath to be found
driver.find_element_by_xpath('//*[@id="BODY_BLOCK_JQUERY_REFLOW"]/div[12]/div/div/div[1]/div[1]/div/input').send_keys("Washington D.C.", Keys.ENTER) #change string to get certain city
time.sleep(8)

#now get current url
url = driver.current_url

response = requests.get(url)
response = response.text
data = BeautifulSoup(response, 'html.parser')

#get list of all hotels
hotels = driver.find_elements_by_class_name("prw_rup prw_meta_hsx_responsive_listing ui_section listItem")

print("Total Number of Hotels: ", len(hotels))

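I would suggest that if you are using Selenium, you don't use BeautifulSoup alongside it, because you can get everything you need with Selenium alone.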

You can achieve what you want simply like this:

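# Imports this snippet relies on (Selenium 3 API, as in the question).
# Note: the find_element_by_* helpers were removed in Selenium 4.3;
# there, use driver.find_element(By.XPATH, ...) instead.
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome("/Users/rishi/Downloads/chromedriver 3")
driver.maximize_window()

# Go straight to the hotel search page instead of clicking through the home page
driver.get("https://www.tripadvisor.ca/Hotels")

time.sleep(1)

driver.implicitly_wait(12)  # wait up to 12 seconds for elements to appear
# Type the city into the search box and submit it
driver.find_element_by_xpath('//*[@class="typeahead_input"]').send_keys("Washington D.C.", Keys.ENTER)
time.sleep(1)
# Match each hotel listing by its full class attribute
hotels = driver.find_elements_by_xpath('//*[@class="listing collapsed"]')

print("Total Number of Hotels: ", len(hotels))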
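
The fixed time.sleep(1) calls in the code above are a guess at how long the page needs, so the lookup may run before the results page has finished loading. As a rough sketch of an alternative, the helper below polls a condition until it returns something truthy or a timeout expires. The name wait_until and its parameters are made up for this illustration and are not part of Selenium; Selenium ships its own WebDriverWait class in selenium.webdriver.support.ui that covers the same idea with more features.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    (Hypothetical helper for illustration; not a Selenium API.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition was not met within %.1f seconds" % timeout)
        time.sleep(poll)
```

With the driver from the answer, this could be used as hotels = wait_until(lambda: driver.find_elements_by_xpath('//*[@class="listing collapsed"]')), which returns as soon as at least one listing is found instead of sleeping for a fixed time.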

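Note that this code only gets you the first 30 hotels (i.e. the first page). To get all of them, you need to loop through every page of hotel results for the given city.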
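
A minimal sketch of that page-by-page loop, under some assumptions: collect_all_pages and make_next_page_clicker are names invented here, next_xpath is a hypothetical locator for the "next page" link that you would have to look up in Developer Tools, and the Selenium 3 find_elements_by_xpath call matches the code above. The loop itself knows nothing about Selenium; it only needs one callable that returns the items on the current page and one that moves to the next page.

```python
def collect_all_pages(get_items, go_to_next_page, max_pages=100):
    """Gather items from the current page, then keep advancing.

    get_items()       -> list of items on the current page; return plain
                         data (text, URLs) rather than WebElements, which
                         go stale after navigation
    go_to_next_page() -> True if it moved to another page, False if not
    max_pages         -> safety cap in case the "next" link never disappears
    """
    collected = []
    for _ in range(max_pages):
        collected.extend(get_items())
        if not go_to_next_page():
            break
    return collected


def make_next_page_clicker(driver, next_xpath):
    """Build a go_to_next_page callable for a Selenium driver.

    next_xpath is a hypothetical locator for the "next" link; it is an
    assumption of this sketch, not something taken from TripAdvisor.
    """
    def go_to_next_page():
        links = driver.find_elements_by_xpath(next_xpath)  # Selenium 3 API
        if not links:
            return False  # no "next" link found: treat as the last page
        links[0].click()
        return True
    return go_to_next_page
```

In practice get_items would also need to wait for the new page's listings to load after each click before reading them.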

Hope this helps.

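I looked at the page in a web browser but couldn't find that class anywhere on it. Maybe it uses a different name on every request, or for different users. Also, the page adds elements with JavaScript, and BeautifulSoup can't run JavaScript, so you may need Selenium to control a web browser that can.

Thank you so much! I have a quick question about the code: how did you get the xpath? I tried going to the div where you found it, but when I copied it, I didn't get the same result.

Great! You can use "Developer Tools" to get any detail of the HTML tags or class names in a web page. You can find it in the browser menu, or open it with "CTRL+SHIFT+i" in Google Chrome or "F12" in Microsoft Edge.

I tried doing that and then went to the div containing "i see".

Just select the class name and use it to find the element in your code. You don't need all those detailed xpaths.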
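
One caveat when copying a class name from Developer Tools: a class attribute such as "listing collapsed" holds several space-separated class names, and Selenium's class-name locator generally expects a single one, which is a likely reason the call in the question finds nothing. A small sketch of one workaround is to turn the attribute value into a CSS selector. The helper name class_attr_to_css is made up for this example.

```python
def class_attr_to_css(class_attr, tag=""):
    """Convert a class attribute value into a CSS selector.

    "listing collapsed" becomes ".listing.collapsed", which matches any
    element carrying both classes. An optional tag name narrows the match.
    (Hypothetical helper; class names with special characters would also
    need CSS escaping, which this sketch does not do.)
    """
    names = class_attr.split()
    if not names:
        raise ValueError("class attribute is empty")
    return tag + "".join("." + name for name in names)
```

The result can be passed to driver.find_elements_by_css_selector(...) in Selenium 3, or to driver.find_elements(By.CSS_SELECTOR, ...) in Selenium 4. Unlike the xpath test @class="listing collapsed", which requires the attribute to be exactly that string, the CSS form also matches elements that list the classes in another order or carry extra ones.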