How to get the HTML source code of a scrolled-down web page using Selenium (Python web scraping)

javascript python selenium web-scraping

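I am trying to get all the hotels, but even though I run the scroll-down script, my page_source only contains the HTML for 11 hotels, i.e. the content that was loaded initially.

How can I get the full page source after scrolling down through all the hotels?

If driver.execute_script is loading the whole page, how do I store the source of the whole page in a variable?

P.S. This is for educational purposes only.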

from selenium import webdriver
import re
import pandas as pd
import time
chrome_path = r"C:\Users\ajite\Desktop\web scraping\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.get("https://www.makemytrip.com/mmthtl/site/hotels/search?checkin=02252018&checkout=02262018&roomStayQualifier=1e0e&city=GOI&searchText=Goa,%20India&country=IN")
driver.implicitly_wait(3)
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
time.sleep(5)
two_hotels = driver.find_elements_by_xpath('//*[@id="hotel_card_list"]/div')

Your scroll is not actually being carried out. Instead of:

driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

you should try:

for i in range(0,25): # here you will need to tune to see exactly how many scrolls you need
  driver.execute_script('window.scrollBy(0, 400)')
  time.sleep(1)

The code I tried:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get("https://www.makemytrip.com/mmthtl/site/hotels/search?checkin=02252018&checkout=02262018&roomStayQualifier=1e0e&city=GOI&searchText=Goa,%20India&country=IN")
driver.implicitly_wait(3)

# Scroll down in small steps so the hotel cards further down get a chance to load.
for i in range(0, 25):  # here you will need to tune to see exactly how many scrolls you need
  driver.execute_script('window.scrollBy(0, 400)')
  time.sleep(1)

time.sleep(10)  # more time so the cards will load

# find_elements(By.XPATH, ...) is used here because recent Selenium 4 releases
# no longer provide find_elements_by_xpath.
two_hotels = driver.find_elements(By.XPATH, '//*[@id="hotel_card_list"]/div')

two_hotels

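two_hotels now contains more entries. How many hotels you get depends on the value passed to range, so I think you will need to tune that value a little to obtain all the hotels you need.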
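Rather than tuning the number of scrolls by hand, one option is to keep scrolling until the browser reports that it has stayed at the bottom of the page for a few checks in a row. The sketch below is only an illustration: the helper name scroll_until_bottom and its parameters are made up for this example, and it assumes the page grows document.body.scrollHeight when more cards are loaded, which has not been verified for this site.

```python
import time


def scroll_until_bottom(driver, step=400, pause=1.0, max_scrolls=500, settle_rounds=3):
    """Scroll down in small steps until the page stops growing.

    `driver` is a Selenium WebDriver (anything with execute_script works).
    Returns the number of scroll steps that were performed.
    """
    scrolls = 0
    rounds_at_bottom = 0
    while scrolls < max_scrolls and rounds_at_bottom < settle_rounds:
        # Same incremental scroll as in the answer, with the step passed as an argument.
        driver.execute_script("window.scrollBy(0, arguments[0]);", step)
        time.sleep(pause)  # give newly requested cards time to load
        scrolls += 1
        at_bottom = driver.execute_script(
            "return window.innerHeight + window.pageYOffset"
            " >= document.body.scrollHeight - 1;"
        )
        # If new content made the page taller, we are no longer at the bottom,
        # so the counter resets and scrolling continues.
        rounds_at_bottom = rounds_at_bottom + 1 if at_bottom else 0
    return scrolls
```

Calling scroll_until_bottom(driver) in place of the fixed for loop should stop on its own once no more content appears; max_scrolls is a safety cap in case the page keeps growing indefinitely.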

"Trying to get all the hotels" is not the same thing as page_source; what you probably want is the list of hotels. Let me know if I am right.
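To make the difference concrete, here is a minimal sketch that captures both things after scrolling: the full HTML as a string (which is what the question asks to store in a variable) and the list of hotel card elements. The helper name snapshot_hotels is hypothetical, and the XPath is simply the one used in the question; whether each card's text is useful as-is depends on the page markup.

```python
def snapshot_hotels(driver, xpath='//*[@id="hotel_card_list"]/div'):
    """Return (html, card_texts) for the page in its current, scrolled state."""
    # page_source is the serialized DOM at the moment of the call, so it has to
    # be read after scrolling, once the extra cards have been loaded.
    html = driver.page_source

    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH; the literal
    # is used only to keep this sketch free of imports.
    cards = driver.find_elements("xpath", xpath)
    card_texts = [card.text for card in cards]
    return html, card_texts
```

For example, html, hotels = snapshot_hotels(driver) leaves the whole page source in html and one text entry per hotel card in hotels.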