
Python: scraping a JavaScript-generated web page with Selenium does not work properly


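I'm trying to scrape some data from a flight search web page. The content is probably generated with JavaScript. I tried several approaches and none of them worked, so I decided to try Selenium.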

from selenium import webdriver

driver = webdriver.Firefox()
driver.get('https://www.pelikan.sk/sk/flights/list?dfc=CVIE%20BUD%20BTS&dtc=CMAD&rfc=CMAD&rtc=CVIE%20BUD%20BTS&dd=2015-07-09&rd=2015-07-14&px=1000&ns=0&prc=&rng=1&rbd=0&ct=0')
print(driver.page_source)
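I expected this to return the final JavaScript-generated HTML, but the strings I see when I open the page in a browser are not in the output.

What is the problem? What should I do to get those flights?

Edit: I forgot to mention that the page keeps loading new flights. When you open it in a browser, it shows some flights while it is still loading others.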

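The page is quite dynamic, so you need to wait explicitly for it to finish loading before reading the source. Pick something that indicates the page and the search results have loaded. For example, wait for the loading image (the one with the pelican) to become invisible.

Here we wait for two pelicans to fly away and disappear, a bigger one and a smaller one: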

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get("https://www.pelikan.sk/sk/flights/list?dfc=CVIE%20BUD%20BTS&dtc=CMAD&rfc=CMAD&rtc=CVIE%20BUD%20BTS&dd=2015-07-09&rd=2015-07-14&px=1000&ns=0&prc=&rng=1&rbd=0&ct=0")

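# Allow up to 60 seconds for each condition below.
wait = WebDriverWait(driver, 60)
# First pelican: any image whose src contains "loading".
wait.until(EC.invisibility_of_element_located((By.XPATH, '//img[contains(@src, "loading")]')))
# Second pelican: the image just before the "Please be patient, we are looking for more flights for you" message.
wait.until(EC.invisibility_of_element_located((By.XPATH, u'//div[. = "Poprosíme o trpezlivosť, hľadáme pre Vás ešte viac letov"]/preceding-sibling::img')))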

print(driver.page_source)
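
If the loading images ever change, a possible fallback is to poll until the number of results stops growing, since the question notes that the page keeps adding flights. The sketch below is only an illustration: `wait_until_stable` is a hypothetical helper, not part of Selenium, and the way the count is read (for example with `driver.find_elements` and a selector for the result rows) is an assumption that depends on the page markup.

```python
import time


def wait_until_stable(read_count, timeout=60.0, poll=0.5, settle_polls=3):
    """Poll read_count() until it returns an unchanged value settle_polls times in a row.

    Hypothetical helper, not part of Selenium. Returns the last value read,
    or raises TimeoutError if the value is still changing when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    last = read_count()
    unchanged = 0
    while time.monotonic() < deadline:
        time.sleep(poll)
        current = read_count()
        if current == last:
            unchanged += 1
            if unchanged >= settle_polls:
                return current
        else:
            # The value moved, so start counting quiet polls again.
            last, unchanged = current, 0
    raise TimeoutError("value was still changing after %.1f s" % timeout)


# Stand-in for a page whose result count grows and then settles.
# With Selenium, read_count could be something like (selector is an assumption):
#   lambda: len(driver.find_elements(By.CSS_SELECTOR, "<result row selector>"))
samples = iter([2, 5, 9, 9, 9, 9])
final_count = wait_until_stable(lambda: next(samples), timeout=5, poll=0.01)
print(final_count)
```

Waiting for a known "loading" indicator to disappear, as in the answer, is usually more reliable. A stability check like this one can return too early if the page pauses between batches, so `settle_polls` and `poll` would need tuning for the actual site.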