Why do I have two different user agents at the same time when using PhantomJS in Python? - Fatal编程技术网



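The code below sets the user agent for a PhantomJS instance and prints it. It then scrapes a website that detects the user agent a second time. The two results differ. How is that possible? I have not been able to reproduce the obvious solutions.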

1 Set a user agent

serviceDefaults = ["--ignore-ssl-errors=yes"]
desiredDefaults = {
    # Adjacent literals are joined into one string (a plain "..." literal cannot span lines)
    "phantomjs.page.settings.userAgent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"
    )
}
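A plain double-quoted Python literal cannot continue onto a second physical line. If a long user-agent value is broken that way, Python raises a SyntaxError before the driver is ever created. Below is a minimal, Selenium-free sketch of building the capabilities dict. The constant names are illustrative. The capability key is the one used in the question. Adjacent literals inside parentheses are concatenated at compile time into a single string with no embedded newline.

```python
# Adjacent string literals inside parentheses are concatenated by the compiler,
# so a long user agent can be split for readability without adding a newline.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/60.0.3112.113 Safari/537.36"
)

# Same capability key and service argument as in the question.
desired_defaults = {"phantomjs.page.settings.userAgent": CHROME_UA}
service_defaults = ["--ignore-ssl-errors=yes"]
```

Note the trailing space inside each fragment. Without it, the tokens would run together (for example `x64)AppleWebKit`), which would produce a user agent that no longer matches the intended one.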
2 Set up the driver and print the user agent

from selenium import webdriver  # Selenium 3.x; PhantomJS support was removed in Selenium 4

def create_phantomJS():
    # Start PhantomJS with the custom user agent from step 1
    driver = webdriver.PhantomJS("phantomjs.exe", desired_capabilities=desiredDefaults, service_args=serviceDefaults)
    # Register GhostDriver's endpoint for running scripts in the PhantomJS context
    phantom_exc_uri = '/session/$sessionId/phantom/execute'
    driver.command_executor._commands['executePhantomScript'] = ('POST', phantom_exc_uri)
    initScript = """
    this.onInitialized = function() {
        var page = this;
        if (page.navigator == page.settings.userAgent) { return };
        page.settings.navigator = page.settings.userAgent;
    }
    """
    driver.execute('executePhantomScript', {'script': initScript, 'args': []})
    # Read the user agent that page scripts see
    agent = driver.execute_script("return navigator.userAgent")
    print("rawUa: " + agent)
    return driver
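3 Scrape the website, determine the user agent, and print it

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def use_driver(driver, URL):
    driver.get(URL)  # get() returns None, so there is nothing to keep
    # Wait up to 1 s for the element in which the page shows the detected user agent
    element = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.ID, "rawUa")))
    return element.text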
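Step 3 depends on the scraped page echoing the user agent inside an element with id `rawUa`. You can check that extraction logic offline with the stdlib-only Python 3 sketch below. `extract_text_by_id` is a hypothetical helper, not part of Selenium. The sample markup in the usage is an assumed stand-in for the real page, and whitespace handling is only a rough approximation of Selenium's `.text`.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class ElementTextById(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while the parser is inside the target element
        self.done = False   # True once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1                      # nested tag inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1                       # entering the target element

    def handle_startendtag(self, tag, attrs):
        pass                                     # <br/>, <img/>: no text, no depth change

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_text_by_id(html, element_id):
    """Rough offline stand-in for reading .text of the element with element_id."""
    parser = ElementTextById(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()
```

With a live driver, `driver.page_source` could be passed in as `html`. For example, `extract_text_by_id('<div id="rawUa">Mozilla/5.0 <b>Test</b></div>', "rawUa")` yields `"Mozilla/5.0 Test"`.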
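4 Compare the results

driver = create_phantomJS()
text = use_driver(driver, URL)  # URL: the page that shows the user agent in the element with id "rawUa"
print(text)

The output shows two different user agents.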
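Step 4 only prints the two strings side by side. A small comparison helper makes the mismatch explicit. `compare_user_agents` is a hypothetical name. The strings in the usage are illustrative placeholders, not output from the question.

```python
def compare_user_agents(expected, observed):
    """Return (match, detail) after collapsing whitespace in both strings."""
    norm_expected = " ".join(expected.split())
    norm_observed = " ".join(observed.split())
    if norm_expected == norm_observed:
        return True, "user agents match"
    # List the space-separated tokens found in only one of the two strings.
    # (Identical tokens in a different order give two empty lists.)
    only_expected = sorted(set(norm_expected.split()) - set(norm_observed.split()))
    only_observed = sorted(set(norm_observed.split()) - set(norm_expected.split()))
    return False, "only in expected: %s; only in observed: %s" % (only_expected, only_observed)
```

Usage: `compare_user_agents("Mozilla/5.0 Chrome/60.0", "Mozilla/5.0 PhantomJS/2.1.1")` returns `(False, "only in expected: ['Chrome/60.0']; only in observed: ['PhantomJS/2.1.1']")`. Whitespace is normalized first, so a user agent that differs only by stray spaces or a line break still counts as a match.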


How can I make the user agents match in this scenario in Python?

Improving initScript might work:

initScript = """
this.onInitialized = function() {
    console.log("[INFO] TESTING NAVIGATOR VALUE");
    if (navigator.userAgent == this.settings.userAgent) { return };
    navigator = {"User-Agent": this.settings.userAgent};
}.bind(this);
"""
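navigator must be set to a new object. Printing right after the driver is created does not give a valid test result, because the onInitialized handler is called after the page is created and before the URL is requested.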
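The timing argument above can be modelled without PhantomJS. The toy classes below are hypothetical stand-ins, not the PhantomJS or Selenium API. They assume, as the answer states, that a hook registered like `onInitialized` runs when a page is initialized for a navigation, before the URL is requested. Under that assumption, a value read right after driver creation still reflects the built-in user agent.

```python
class FakePage:
    """Toy model of a page lifecycle; NOT the PhantomJS WebPage API."""

    def __init__(self, settings_ua, builtin_ua):
        self.settings_ua = settings_ua   # what the driver was configured with
        self.builtin_ua = builtin_ua     # the browser's own default
        self.navigator_ua = builtin_ua   # what page scripts would read
        self.on_initialized = None       # hook, analogous to onInitialized
        self.url = None

    def open(self, url):
        # Assumed lifecycle: a fresh script context starts from the built-in
        # value, then the hook runs, then the URL is requested.
        self.navigator_ua = self.builtin_ua
        if self.on_initialized is not None:
            self.on_initialized(self)
        self.url = url


def override_navigator(page):
    # Mirrors the answer's idea: replace navigator with the configured UA.
    page.navigator_ua = page.settings_ua


page = FakePage(settings_ua="Chrome-like UA", builtin_ua="PhantomJS-like UA")
page.on_initialized = override_navigator
before = page.navigator_ua           # read right after "driver creation"
page.open("http://example.com/")
after = page.navigator_ua            # read after a navigation
```

In this model `before` is still the built-in value and `after` is the configured one. That is why the answer suggests testing after `driver.get(URL)` rather than immediately after `create_phantomJS()`.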