PhantomJS returns an empty web page (Python, Selenium)

python selenium selenium-webdriver phantomjs

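I'm trying to screen-scrape a website (using Selenium) from a Python script without having to launch an actual browser instance. I can do this with Chrome or Firefox (I've tried both and they work fine), but I want to use PhantomJS so that it runs headless.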

The code looks like this:

import sys
import traceback
import time

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

dcap = dict(DesiredCapabilities.PHANTOMJS)
dcap["phantomjs.page.settings.userAgent"] = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/53 "
    "(KHTML, like Gecko) Chrome/15.0.87"
)

try:
    # Choose our browser
    browser = webdriver.PhantomJS(desired_capabilities=dcap)
    #browser = webdriver.PhantomJS()
    #browser = webdriver.Firefox()
    #browser = webdriver.Chrome(executable_path="/usr/local/bin/chromedriver")

    # Go to the login page
    browser.get("https://www.whatever.com")

    # For debug, see what we got back
    html_source = browser.page_source
    with open('out.html', 'w') as f:
        f.write(html_source)

    # PROCESS THE PAGE (code removed)

except Exception:
    browser.save_screenshot('screenshot.png')
    traceback.print_exc(file=sys.stdout)

finally:
    browser.close()

The output is just:

<html><head></head><body></body></html>
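When debugging this, it can help to detect the empty skeleton programmatically instead of opening out.html by hand. The following is a minimal sketch, not part of the original question; the helper name is_blank_page is hypothetical, and it only recognizes a page whose head and body are completely empty.

```python
import re

# Matches a document whose <head> and <body> contain nothing but whitespace.
_BLANK_PAGE = re.compile(
    r"^\s*<html[^>]*>\s*<head[^>]*>\s*</head>\s*"
    r"<body[^>]*>\s*</body>\s*</html>\s*$",
    re.IGNORECASE,
)


def is_blank_page(html_source):
    """Return True if html_source is an empty html/head/body skeleton."""
    return bool(_BLANK_PAGE.match(html_source))
```

Under these assumptions, calling is_blank_page(browser.page_source) right after browser.get(...) tells you whether PhantomJS rendered anything at all.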

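You need to wait for the page to load. Usually this is done with an explicit wait that waits for a key element to appear or become visible on the page. For example: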

from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

# ...
browser.get("https://www.whatever.com")

# Wait up to 10 seconds for the content element to become visible
wait = WebDriverWait(browser, 10)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.content")))

html_source = browser.page_source
# ...

Here we wait up to 10 seconds for the div element with class="content" to become visible before getting the page source.
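Conceptually, an explicit wait is a polling loop: it keeps evaluating a condition until the condition is truthy or the timeout expires. The sketch below illustrates that idea in plain Python 3. It is not Selenium's implementation, and the names wait_until and WaitTimeout are hypothetical.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it is truthy or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)
```

This also shows why a wait cannot help when the page never loads at all: the condition is checked again and again, but it stays false until the timeout is raised.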

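In addition, you may need to ignore SSL errors by passing service_args=['--ignore-ssl-errors=true'] to webdriver.PhantomJS.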

That said, I'm fairly sure this is related to a redirect problem in PhantomJS. There is an open issue about it in the phantomjs bug tracker.

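I faced the same problem, and no amount of code making the driver wait helped.
The problem is the SSL encryption on https websites; ignoring the SSL errors does the trick.
Invoke the PhantomJS driver as: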

driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true', '--ssl-protocol=TLSv1'])

This solved the problem for me.
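If you want to toggle these options while experimenting, the flag list can be built by a small helper. This is only a sketch; the function name phantomjs_service_args and its parameters are hypothetical, while the two flags themselves are the ones used in the call above.

```python
def phantomjs_service_args(ignore_ssl_errors=True, ssl_protocol="TLSv1"):
    """Build the command-line flag list passed to PhantomJS via service_args."""
    args = []
    if ignore_ssl_errors:
        # Do not abort page loads on certificate problems.
        args.append("--ignore-ssl-errors=true")
    if ssl_protocol:
        # Force a specific SSL/TLS protocol for the handshake.
        args.append("--ssl-protocol=" + ssl_protocol)
    return args
```

It would be used as driver = webdriver.PhantomJS(service_args=phantomjs_service_args()). A likely reason the protocol flag matters is that older PhantomJS builds defaulted to SSLv3, which many servers no longer accept; passing ssl_protocol="any" may be worth trying if TLSv1 alone does not work.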
driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true', '--ssl-protocol=TLSv1'])

This worked for me.
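OK, I'll try that... but if the "get" command doesn't wait for the page load to finish before returning, how useful is it? It seems like that should be built in. Is there a non-timed wait command that waits for a "page loaded" event (or whatever it's called)?

@cbp2 No, selenium does not wait for pending asynchronous requests or asynchronous code execution in the browser. Using an explicit wait should solve the problem.

We're getting close, but still no cigar. I added the wait, but waiting for the ID to appear times out, even though I know the ID should be there. The code output and the screenshot are still empty.

Traceback (most recent call last):
  File "scrape_CS.py", line 35, in <module>
    element = wait.until(EC.element_to_be_clickable((By.ID, 'loginField')))
  File "/Users/carey/anaconda/lib/python2.7/site-packages/selenium/webdriver/support/wait.py", line 75, in until
    raise TimeoutException(message, screen, stacktrace)
TimeoutException: Message: Screenshot: available via screen

@cbp2 OK, thanks for trying. I've updated the answer, please check.

Unfortunately, the result is the same. By the way, do I need all the "dcap" stuff? If not, I'll remove it.

Can you explain why you think ignoring SSL errors is the problem? I do see a warning about that in Chrome, but it still works there. It doesn't work at all in PhantomJS.

This worked for me. Unlike the other answer, it includes the '--ssl-protocol=TLSv1' part. Do you know why that makes the difference?

I ran into this problem today too. My page stopped working and started returning an empty page. --ssl-protocol=TLSv1 solved it. Amazing find.

browser = webdriver.PhantomJS(desired_capabilities=dcap, service_args=['--ignore-ssl-errors=true'])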
