Python 3.x Selenium: How to avoid the Access Denied page?

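I am trying to navigate a website with Selenium, but when I try to get the next page I receive an error: Access Denied. You don't have permission to access "" on this server.

My code is as follows: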

import os
import time
from selenium import webdriver

os.environ['MOZ_HEADLESS'] = '1'
petshop_url = 'https://www.blah.com/Filtro=D37608&ordenacao=_maisvendidos&nid=202059'
browser = webdriver.Firefox(executable_path = './geckodriver')


browser.get(petshop_url)
next_button = browser.find_element_by_id('ctl00_Conteudo_ctl02_divBuscaResultadoInferior').find_element_by_class_name('next')
time.sleep(1)
next_button.click()
time.sleep(1)
html_source = browser.page_source
print(html_source)
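
I have already tried clearing the cache and removing the proxy, as suggested here:

I also added and removed the sleep calls, tried it with Chrome, and removed the headless option, but nothing worked. Any idea what my mistake is?

Here is the log from when the browser closes: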

1572780171083   Marionette  TRACE   [16] Received DOM event pageshow for https://www.blah.com/?Filtro=D37608&Ordenacao=_maisvendidos&paginaAtual=3&ComparacaoProdutos=&AdicionaListaCasamento=
1572780171086   Marionette  DEBUG   0 <- [1,6,null,{"value":null}]
1572780171093   webdriver::server   DEBUG   <- 200 OK {"value":null}
1572780172095   webdriver::server   DEBUG   -> GET /session/1ea63780-133a-4649-ba1b-5732a2fed59c/source 
1572780172098   Marionette  DEBUG   0 -> [0,7,"WebDriver:GetPageSource",{}]
1572780172099   Marionette  DEBUG   0 <- [1,7,null,{"value":"<html><head>\n<title>Access Denied</title>\n</head><body>\n<h1>Access Denied</h1>\n \nYou don't have perm ... ccess \"http://www.blah.com/?\” on this server.<p>\nReference #18.debc1002.1572780170.31119482\n\n\n</p></body></html>"}]
1572780172102   webdriver::server   DEBUG   <- 200 OK {"value":"<html><head>\n<title>Access Denied</title>\n</head><body>\n<h1>Access Denied</h1>\n \nYou don't have permission to access \"http://www.blah.com/?\” on this server.<p>\nReference #18.debc1002.1572780170.31119482\n\n\n</p></body></html>"}
1572780172103   webdriver::server   DEBUG   -> DELETE /session/1ea63780-133a-4649-ba1b-5732a2fed59c 
1572780172106   Marionette  DEBUG   0 -> [0,8,"Marionette:Quit",{"flags":["eForceQuit"]}]
1572780172106   Marionette  INFO    Stopped listening on port 56193
1572780172149   Marionette  TRACE   Received observer notification quit-application
1572780172164   Marionette  DEBUG   0 <- [1,8,null,{"cause":"shutdown"}]
1572780172202   webdriver::server   DEBUG   Deleting session
1572780172221   Marionette  DEBUG   0 -> [0,9,"Marionette:Quit",{"flags":["eForceQuit"]}]
1572780172222   Marionette  DEBUG   0 <- [1,9,{"error":"invalid session id","message":"Tried to run command without establishing a connection","stacktrace":"WebDriver ... t@chrome://marionette/content/server.js:249:9\n_onJSONObjectReady/<@chrome://marionette/content/transport.js:501:20\n"},null]
1572780172222   Marionette  DEBUG   Closed connection 0
1572780176394   Marionette  TRACE   Received observer notification xpcom-will-shutdown

If possible, could you share the URL so that I can check it?

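Most likely, the website needs cookies or a session that is generated on the main home page.

Here are some tips:

Add a referrer (again, this depends on the website).

Try the following code: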

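from selenium import webdriver
from selenium.webdriver.common.by import By


def get_dynamic_website_content(url, first_refer='https://www.google.com'):
    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')

    # Pass the referrer as a Chrome preference (key kept from the original
    # answer; whether Chrome honors it is not verified here).
    chrome_prefs = {'chrome.page.customHeaders.referrer': first_refer}
    options.add_experimental_option('prefs', chrome_prefs)

    # Live environment: chromedriver is found on PATH.
    wd = webdriver.Chrome(options=options)
    # Desktop environment: point Selenium at a local driver binary instead.
    # wd = webdriver.Chrome(executable_path="/chromedriver.exe", options=options)

    try:
        wd.get(url)
        # Collect every link target before the browser is closed.
        elems = wd.find_elements(By.XPATH, "//a[@href]")
        links = [elem.get_attribute("href") for elem in elems]
    finally:
        wd.quit()

    for link in links:
        # Load recursively, passing the current page as the referrer.
        # Note: there is no depth limit or visited set, so this can run forever.
        get_dynamic_website_content(link, url)

    # Add custom return logic etc.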
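
It is unclear whether Chrome honors the `chrome.page.customHeaders.referrer` key used above. As an alternative, here is a minimal sketch that lets the browser produce the Referer header itself: load the referring page first, then navigate to the target from inside that page. The helper name `open_with_referrer` is hypothetical, and `driver` is assumed to be any Selenium WebDriver instance.

```python
def open_with_referrer(driver, referrer_url, target_url):
    """Visit referrer_url, then navigate to target_url from inside that page."""
    driver.get(referrer_url)
    # A navigation started by page script is treated like following a link, so
    # the browser attaches its own Referer header (subject to the referring
    # page's referrer policy).
    driver.execute_script("window.location.href = arguments[0];", target_url)
    # The script call may return before the new page has finished loading;
    # callers may need an explicit wait before reading driver.page_source.


# Usage sketch (assumes Selenium and a matching browser driver are installed):
#   from selenium import webdriver
#   browser = webdriver.Firefox()
#   open_with_referrer(browser, "https://www.google.com", petshop_url)
```

Whether this helps depends on the site: if the block is based on something other than the Referer header, the result will be the same.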
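
Also, inspect the page's network traffic to see how the next page is actually loaded.

Check and experiment with sessions, cookies, custom headers and so on, and make sure they are added to or removed from Chromium as needed.

If you need to test the site without cookies: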

browser.get(petshop_url)
browser.delete_all_cookies()
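
While experimenting with cookies and headers, it can help to detect the block page automatically instead of reading the printed HTML each time. The sketch below is based only on the Access Denied markup shown in the question's log; the function names are hypothetical, and other sites may return a differently shaped block page.

```python
import re


def is_access_denied(html):
    """Return True if html looks like the Access Denied page from the log."""
    return re.search(r"<title>\s*Access Denied\s*</title>", html, re.IGNORECASE) is not None


def reference_id(html):
    """Return the 'Reference #...' value from the block page, or None if absent."""
    match = re.search(r"Reference\s+#([0-9A-Za-z.]+)", html)
    return match.group(1) if match else None


# Usage sketch with the browser object from the question:
#   if is_access_denied(browser.page_source):
#       print("Blocked, reference:", reference_id(browser.page_source))
```

Logging the reference value for each attempt makes it easier to tell which change (cookies, headers, timing) affects the outcome.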

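Thank you very much for your help! I tried adding a referrer with Firefox (with Chrome I get the error message "unable to find a matching set of capabilities"; I am on macOS), but it did not work. The next page seems to load, but I get the Access Denied page immediately afterwards. The URL I am accessing is petshop_url = ''

Ah, I just tested another thing. Accessing the direct link after visiting the main link does not work either (same error: Access Denied). To work around this, I quit the browser and reopen it with the new link. It is not pretty, but it works. In any case, the two problems seem to be related.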
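
The workaround described in the last comment (quit the browser and reopen it for each link) could be sketched roughly as follows. The names `make_firefox` and `fetch_each_in_fresh_browser` are hypothetical, and the default factory assumes Selenium with a geckodriver that it can locate.

```python
from typing import Callable, Dict, Iterable


def make_firefox():
    """Hypothetical default factory: start a headless Firefox session."""
    # Imported lazily so the helper below can be used with any driver factory.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    options.add_argument("-headless")
    return webdriver.Firefox(options=options)


def fetch_each_in_fresh_browser(
    urls: Iterable[str], make_driver: Callable = make_firefox
) -> Dict[str, str]:
    """Load every URL in its own browser session and return {url: page_source}."""
    pages = {}
    for url in urls:
        driver = make_driver()  # new session, so no cookies carry over
        try:
            driver.get(url)
            pages[url] = driver.page_source
        finally:
            driver.quit()  # always close the browser, even if loading fails
    return pages
```

Starting a browser per page is slow, so this only suits a small number of pages; it trades speed for a clean session on every request.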