Python: Download a file to a specific path using Selenium WebDriver

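I need to download a file to a given location on a non-local machine. This is the normal flow in the web browser, for which I would do the following:

  • Go to the website
  • Click the button to download the file (it is a form that generates the file, not a download link)
  • The website prompts an alert window, "Do you want to download this file?", etc.

I want to be able to bypass that prompt and do something like this:

>>> path_to_download_path = PATH
>>> button = driver.find_element_by_css("...")
>>> button.click()

--> And the file is automatically downloaded to my PATH (or wherever I choose)

Or is there an easier way than clicking to automatically download the contents of the file?

How would I do this?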

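You would first have to inspect the JavaScript on the website to understand how it works before you could override it to do something like that. Even then, browser security will still pop up a dialog asking you to confirm the download. That leaves you with two options (as far as I can see):

  • Confirm the alert dialog
  • Determine the location of the file on the remote server and use a GET to download it

I can't help with the details of either, as I don't know Python, but hopefully this helps...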
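The second option could look roughly like the sketch below, which uses only the standard library. It assumes you have already worked out the file's URL (the file_url parameter is hypothetical, since the question's form generates the file) and that the server accepts a plain GET as long as the browser's session cookies are sent along. The helper names cookie_header and download_with_get are illustrative; the cookie list is in the format returned by Selenium's driver.get_cookies().

```python
import shutil
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from a list of dicts with 'name' and 'value' keys."""
    return "; ".join("{}={}".format(c["name"], c["value"]) for c in selenium_cookies)


def download_with_get(file_url, to_path, selenium_cookies=(), timeout_seconds=30):
    """Fetch file_url with a plain GET and stream the response body to to_path."""
    request = urllib.request.Request(file_url)
    header = cookie_header(selenium_cookies)
    if header:
        # Reuse the browser session so the server sees the same login state
        request.add_header("Cookie", header)
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response, \
            open(to_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return to_path
```

With a live session this might be called as download_with_get(file_url, to_path, driver.get_cookies()). If the site only produces the file in response to a form POST, this approach would need to be adapted.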

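Using Selenium WebDriver

Use a Firefox profile to download your file. This profile skips Firefox's download dialog. In the line:

   pro.setPreference("browser.download.folderList", 0);

the value of browser.download.folderList can be set to 0, 1, or 2. When set to 0, Firefox saves all files downloaded via the browser on the user's desktop. When set to 1, downloads are stored in the Downloads folder. When set to 2, the location specified for the most recent download is used again.

The Firefox profile code you need to implement:

        FirefoxProfile pro = new FirefoxProfile();
        // Preference names are case-sensitive
        pro.setPreference("browser.download.folderList", 0);
        // MIME types listed here are saved to disk without asking
        pro.setPreference("browser.helperApps.neverAsk.saveToDisk", "application/zip");
        WebDriver driver = new FirefoxDriver(pro);
        driver.get("http://selenium-release.storage.googleapis.com/2.47/selenium-java-2.47.1.zip");

Hope it helps :)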

When initializing the driver, make sure to set the download preferences.

For Firefox:

ff_prof.set_preference( "browser.download.manager.showWhenStarting", False )
ff_prof.set_preference( "browser.download.folderList", 2 )
ff_prof.set_preference( "browser.download.useDownloadDir", True )
ff_prof.set_preference( "browser.download.dir", self.driver_settings['download_folder'] )

##
# if FF still shows the download dialog, make sure that the filetype is included below
# filetype string options can be found in '~/.mozilla/$USER_PROFILE/mimeTypes.rdf'
##
mime_types = ("application/pdf", "text/html")

ff_prof.set_preference( "browser.helperApps.neverAsk.saveToDisk", (", ".join( mime_types )) )
ff_prof.set_preference( "browser.helperApps.neverAsk.openFile", (", ".join( mime_types )) )
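For context, here is a minimal sketch of how the Firefox preferences above might be wired into a driver. It assumes Selenium 4 with geckodriver available; the function names firefox_download_prefs and make_firefox_driver are illustrative and not part of the original answer.

```python
def firefox_download_prefs(download_folder, mime_types):
    """Return the Firefox preferences that save downloads to download_folder without a dialog."""
    joined = ", ".join(mime_types)
    return {
        "browser.download.manager.showWhenStarting": False,
        "browser.download.folderList": 2,  # 2 = custom folder given by browser.download.dir
        "browser.download.useDownloadDir": True,
        "browser.download.dir": download_folder,
        "browser.helperApps.neverAsk.saveToDisk": joined,
        "browser.helperApps.neverAsk.openFile": joined,
    }


def make_firefox_driver(download_folder, mime_types):
    """Start Firefox with the preferences above (assumes Selenium 4 and geckodriver)."""
    # Imported lazily so the helper above can be used without Selenium installed
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in firefox_download_prefs(download_folder, mime_types).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

Keeping the preferences in a plain dict makes them easy to inspect or log before the browser starts.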
For Chrome:

capabilities['chromeOptions']['prefs']['download.prompt_for_download'] = False
capabilities['chromeOptions']['prefs']['download.default_directory'] = self.driver_settings['download_folder']
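With current Selenium releases, the same two Chrome preferences are usually passed through ChromeOptions rather than a raw capabilities dict. A minimal sketch, assuming Selenium 4 and a matching chromedriver; the function names are illustrative, and the path is made absolute because Chrome generally expects an absolute download directory.

```python
import os


def chrome_download_prefs(download_folder):
    """Return the Chrome preferences that save downloads to download_folder without prompting."""
    return {
        "download.prompt_for_download": False,
        "download.default_directory": os.path.abspath(download_folder),
    }


def make_chrome_driver(download_folder):
    """Start Chrome with the preferences above (assumes Selenium 4 and chromedriver)."""
    # Imported lazily so the helper above can be used without Selenium installed
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(download_folder))
    return webdriver.Chrome(options=options)
```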
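Redirecting the download:

Below is the code I use to redirect the file from self.driver_settings['download_folder'] (set above) to wherever you actually want the file (to_path can be an existing folder or a file path). If you're on Linux, I'd suggest using tmpfs so that /tmp is held in RAM, and then setting self.driver_settings['download_folder'] to "/tmp/driver_downloads/". Note that the function below assumes that self.driver_settings['download_folder'] always starts out as an empty folder (that is how it locates the file being downloaded, since it is the only file in the directory).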
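One way to guarantee the empty-folder assumption is to create a fresh directory for each driver session. The sketch below is not part of the original answer; make_empty_download_folder and its default arguments are hypothetical. It returns the path with a trailing slash because the function below builds the file path by plain string concatenation.

```python
import os
import tempfile


def make_empty_download_folder(parent="/tmp", prefix="driver_downloads_"):
    """Create a new, empty, uniquely named folder under parent and return its path with a trailing slash."""
    os.makedirs(parent, exist_ok=True)
    # mkdtemp always creates a brand-new directory, so it is guaranteed to be empty
    folder = tempfile.mkdtemp(prefix=prefix, dir=parent)
    return folder + "/"
```

The returned path can then be stored as the download folder setting before the driver is created.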

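# Imports needed by the method below. DownloadTimeoutError is not defined in this
# answer; it is assumed to be a custom Exception subclass defined elsewhere.
import os
import platform
import re
import shutil
import subprocess
import time
from datetime import datetime, timedelta

def moveDriverDownload(self, to_path, allowable_extensions, allow_rename_if_exists=False, timeout_seconds=None):
    if timeout_seconds is None:
        timeout_seconds = 30
    wait_delta = timedelta( seconds=timeout_seconds )
    start_download_time = datetime.now()
    hasTimedOut = lambda: datetime.now() - start_download_time > wait_delta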

    assert isinstance(allowable_extensions, list) or isinstance(allowable_extensions, tuple) or isinstance(allowable_extensions, set), "instead of a list, found allowable_extensions type of '{}'".format(type(allowable_extensions))
    allowable_extensions = [ elem.lower().strip() for elem in allowable_extensions ]
    allowable_extensions = [ elem if elem.startswith(".") else "."+elem for elem in allowable_extensions ]

    if not ".part" in allowable_extensions:
        allowable_extensions.append( ".part" )

    re_extension_str = "(?:" + ("$)|(?:".join( re.escape(elem) for elem in allowable_extensions )) + "$)"

    getFiles = lambda: next( os.walk( self.driver_settings['download_folder'] ) )[2]

    while True:
        if hasTimedOut():
            del allowable_extensions[ allowable_extensions.index(".part") ]
            raise DownloadTimeoutError( "timed out after {} seconds while waiting on file download with extension in {}".format(timeout_seconds, allowable_extensions) )

        time.sleep( 0.5 )

        file_list = [ elem for elem in getFiles() if re.search( re_extension_str, elem ) ]
        if len(file_list) > 0:
            break

    file_list = [ re.search( r"(?i)^(.*?)(?:\.part)?$", elem ).groups()[0] for elem in file_list ]

    if len(file_list) > 1:
        if len(file_list) == 2:
            if file_list[0] != file_list[1]:
                raise Exception( "file_list[0] != file_list[1] <==> {} != {}".format(file_list[0], file_list[1]) )
        else:
            raise Exception( "len(file_list) > 1. found {}".format(file_list) )

    file_path = "%s%s" %(self.driver_settings['download_folder'], file_list[0])

    # see if the file is still being downloaded by checking if it's open by any programs
    # (universal_newlines=True makes communicate() return str instead of bytes on Python 3,
    # which the string comparisons below rely on)
    if platform.system() == "Linux":
        openProcess = lambda: subprocess.Popen( 'lsof | grep "%s"' %file_path, shell=True, universal_newlines=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE )
        fileIsFinished = lambda txt: txt.strip() == ""
    elif platform.system() == "Windows":
        # 'handle' program must be in PATH
        # https://technet.microsoft.com/en-us/sysinternals/bb896655
        openProcess = lambda: subprocess.Popen( 'handle "%s"' %file_path.replace("/", "\\"), shell=True, universal_newlines=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE )
        fileIsFinished = lambda txt: bool( re.search("(?i)No matching handles found", txt) )
    else:
        raise Exception( "unrecognised platform.system() of '{}'".format(platform.system()) )

    while True:
        lsof_process = openProcess()
        lsof_result = lsof_process.communicate()

        if len(lsof_result) != 2:
            raise Exception( "len(lsof_result) != 2. found {}".format(lsof_result) )
        if lsof_result[1].strip() != "":
            raise Exception( 'lsof_result[1].strip() != "". found {}'.format(lsof_result) )
        if fileIsFinished( lsof_result[0] ):
            break

        if hasTimedOut():
            raise Exception( "timed out after {} seconds waiting for '{}' to be freed from writing. found lsof/handle of '{}'".format(timeout_seconds, file_path, lsof_result[0]) )

        time.sleep( 0.5 )

    to_path = to_path.replace("\\", "/")
    if os.path.isdir( to_path ):
        if not to_path.endswith("/"):
            to_path += "/"

        to_path += file_list[0]

    i = 2
    while os.path.exists( to_path ):
        if not allow_rename_if_exists:
            raise Exception( "{} already exists".format(to_path) )

        to_path = re.sub( "^(.*/)(.*?)(?:-" + str(i-1) + r")?(|\..*?)?$", r"\1\2-%i\3" %i, to_path )
        i += 1

    shutil.move( file_path, to_path )

    return to_path[ to_path.rindex("/")+1: ]
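
If shelling out to lsof or handle is not an option, a lighter (and less strict) variant is to wait until the browser's temporary download files disappear. This is a different technique from the function above, sketched under the assumption that Firefox writes to a ".part" file and Chrome to a ".crdownload" file while a download is in progress, and that the folder started out empty. The name wait_for_download is illustrative.

```python
import os
import time

# Suffixes browsers use for files that are still being written
TEMP_SUFFIXES = (".part", ".crdownload")


def wait_for_download(download_folder, timeout_seconds=30, poll_seconds=0.5):
    """Return the name of the single finished file in download_folder, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        names = os.listdir(download_folder)
        in_progress = [n for n in names if n.lower().endswith(TEMP_SUFFIXES)]
        finished = [n for n in names if not n.lower().endswith(TEMP_SUFFIXES)]
        # Done only when exactly one real file exists and no temporary file remains
        if len(finished) == 1 and not in_progress:
            return finished[0]
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "timed out after {} seconds; folder contains {}".format(timeout_seconds, names)
            )
        time.sleep(poll_seconds)
```

Unlike the lsof/handle check, this cannot detect a file that is still open for writing without a temporary companion file, so it trades some safety for portability.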