Python: How to get the raw JSON response of an HTTP request from 'driver.page_source' in Selenium WebDriver Firefox


If I browse to the URL used in the code below, I expect to get the following JSON response:

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", 
    "Accept-Encoding": "gzip, deflate, br", 
    "Accept-Language": "en-US,en;q=0.5", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "Upgrade-Insecure-Requests": "1", 
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
  }
}
However, if I use Selenium:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.headless = True
driver = webdriver.Firefox(options=options)

url = 'https://httpbin.org/headers'
driver.get(url)
print(driver.page_source)
driver.close()
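I get the response wrapped in HTML (only its text content is shown here):

{
  "headers": {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "close",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:64.0) Gecko/20100101 Firefox/64.0"
  }
}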

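Where do the HTML tags come from? How can I get the raw JSON response of the HTTP request from driver.page_source?

In addition to the raw JSON response, driver.page_source also contains the HTML that "pretty-prints" the response in the browser. You get the same result if you view the source of the JSON response in the browser with the Firefox DOM and Style Inspector.

To get the raw JSON response, you can navigate the HTML elements as usual: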

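# find_element_by_xpath was removed in newer Selenium 4 releases; find_element("xpath", ...) works in Selenium 3 and 4
print(driver.find_element("xpath", "//div[@id='json']").text)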
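If you want a Python object rather than a string, the element lookup above can be combined with json.loads. This is only a minimal sketch: the helper name json_from_viewer is made up for illustration, and it assumes the driver has already loaded the JSON URL and that Firefox's JSON viewer exposes the raw text in the div with id "json", as the XPath above implies.

```python
import json


def json_from_viewer(driver):
    """Return the JSON shown by Firefox's JSON viewer as a Python object.

    `driver` is assumed to be a Selenium WebDriver that has already
    loaded the JSON URL (hypothetical helper, not part of Selenium).
    """
    # "xpath" is the string value of By.XPATH.
    element = driver.find_element("xpath", "//div[@id='json']")
    return json.loads(element.text)
```

After driver.get('https://httpbin.org/headers'), calling json_from_viewer(driver)['headers'] should give the headers dict. Selenium's .text only returns visible text, so if it comes back empty, reading element.get_attribute('textContent') instead may be worth trying.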
Use the "view-source:" prefix in the URL.

Simple mode, for example:

url = 'view-source:https://httpbin.org/headers'
driver.get(url)
content = driver.page_source
print(content)
Output:

'<html><head><meta name="viewport" content="width=device-width"><title>https://httpbin.org/headers</title><link rel="stylesheet" type="text/css" href="resource://content-accessible/viewsource.css"></head><body id="viewsource" class="highlight" style="-moz-tab-size: 4"><pre>{\n  "headers": {\n    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", \n    "Accept-Encoding": "gzip, deflate, br", \n    "Accept-Language": "en-US,en;q=0.5", \n    "Host": "httpbin.org", \n    "Upgrade-Insecure-Requests": "1", \n    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0"\n  }\n}\n</pre></body></html>'

Parsing the text inside the pre element as JSON gives a Python dict:

{'headers': {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Encoding': 'gzip, deflate, br',
  'Accept-Language': 'en-US,en;q=0.5',
  'Host': 'httpbin.org',
  'Upgrade-Insecure-Requests': '1',
  'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:67.0) Gecko/20100101 Firefox/67.0'}}

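In the view-source output above, the JSON sits inside the pre element, so it can be pulled out of the page_source string without any further browser calls. The following is a rough sketch using only the standard library; the names PreTextExtractor and json_from_view_source are illustrative, and the sample string is a shortened stand-in for the real page_source.

```python
import json
from html.parser import HTMLParser


class PreTextExtractor(HTMLParser):
    """Collect the text found inside <pre> elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._depth = 0   # greater than 0 while inside a <pre>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        # Text of nested tags (e.g. highlighting spans) is kept as well.
        if self._depth:
            self.chunks.append(data)


def json_from_view_source(page_source):
    """Parse the JSON held in the <pre> of a view-source page."""
    parser = PreTextExtractor()
    parser.feed(page_source)
    parser.close()
    return json.loads("".join(parser.chunks))


# Shortened stand-in for driver.page_source from the output above.
sample = (
    '<html><head><title>https://httpbin.org/headers</title></head>'
    '<body id="viewsource"><pre>{\n  "headers": {\n'
    '    "Host": "httpbin.org"\n  }\n}\n</pre></body></html>'
)
print(json_from_view_source(sample)["headers"]["Host"])  # httpbin.org
```

With a live session you would pass driver.page_source instead of sample. Going through an HTML parser rather than a regular expression means entities such as &amp; are decoded before json.loads sees the text.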
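This article helped me solve the problem with Firefox:

I added this preference to my driver factory:

from selenium.webdriver.firefox.options import Options as FirefoxOptions


class DriverFactory:  # illustrative name; the original shows only the method
    @staticmethod
    def get_firefox_options(headless):
        options = FirefoxOptions()
        # Turn off Firefox's built-in JSON viewer so the response is not
        # rendered through the viewer's HTML.
        options.set_preference('devtools.jsonview.enabled', False)

        if headless:
            options.headless = True

        return options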


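If you press F12 and switch to the Inspector tab, you will see HTML, not JSON. It has nothing to do with Selenium.

You can use a tool such as requests to get the raw source without any DOM processing, e.g. requests.get('https://httpbin.org/headers').json()

@Andersson Thinking about it, I am fairly sure httpbin.org serves different content depending on the Accept header of the request. Python's requests sends "*/*" by default, so the server falls back to application/json; the result of the test is therefore not out of the ordinary (if the response were an HTML string, .json() would raise an exception).

@Todomnakov, depending on the response type, .text or .content could be used, but since the question is "how to get the raw JSON response...", I used .json().

I meant something different (not "which attribute of the response object should be used"); I probably did not explain myself clearly. httpbin.org returns different data to different clients, based on the "Accept" header. To a browser client it returns HTML in which one node holds the JSON as its value; to clients that can accept it (such as the requests library) it returns the JSON response directly. So the code in your comment works out of the box: the server did not return "masked HTML", and the payload of the response is proper JSON.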
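The disagreement in the comments above (does the server vary its response by Accept header, or does the browser add the HTML?) can be checked by sending the same request with two different Accept headers and comparing the Content-Type that comes back. Below is a small sketch using the standard library's urllib instead of requests, so that it has no third-party dependency; the function names are made up for illustration, and nothing is fetched until compare_accept_headers is called.

```python
import urllib.request

URL = "https://httpbin.org/headers"


def build_request(url, accept):
    """Build a GET request that sends the given Accept header."""
    return urllib.request.Request(url, headers={"Accept": accept})


def content_type_for(url, accept, timeout=10):
    """Fetch url and return the Content-Type the server answered with."""
    request = build_request(url, accept)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Type")


def compare_accept_headers(url=URL):
    """Map each Accept value to the Content-Type returned for it."""
    accepts = (
        "text/html,application/xhtml+xml",  # browser-like
        "*/*",                              # what requests sends by default
    )
    return {accept: content_type_for(url, accept) for accept in accepts}
```

Calling compare_accept_headers() needs network access. If both entries report a JSON content type, the HTML seen in driver.page_source was produced by the browser rather than by the server.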