Python: Downloading a file to a specific path using Selenium WebDriver


python selenium


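I need to download a file to a given location on a non-local machine. This is the normal flow in the web browser:

Go to the website
Click the button to download the file (it is a form that generates the file, not a download link)
The website prompts an alert window, "Do you want to download this file?", etc.

I want to be able to bypass that prompt and do something like this:

>>> path_to_download_path = PATH
>>> button = driver.find_element_by_css_selector("...")
>>> button.click()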

--> And the file is automatically downloaded to my PATH (or wherever I choose)

Or is there an easier way to download the file's contents automatically with a click?
How would I do this?
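You would first have to inspect the JavaScript on the website and understand how it works before you could override it to do something like this. Even then, browser security will still pop up a dialog asking you to confirm the download. That leaves you with two options (as far as I know):

Confirm the alert dialog
Determine the file's location on the remote server and download it with a GET request

I can't help with the details of either, because I don't know Python, but I hope this helps...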
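The second option could look roughly like the sketch below. It assumes the file is reachable at a plain URL, which may not hold for a form that generates the file on submit. It also assumes the server only needs the browser's session cookies. `cookie_header` and `download_with_get` are hypothetical helper names; the list-of-dicts cookie shape mirrors what `driver.get_cookies()` returns.

```python
import shutil
import urllib.request


def cookie_header(selenium_cookies):
    """Build a Cookie header value from dicts that carry 'name' and 'value' keys."""
    return "; ".join("{}={}".format(c["name"], c["value"]) for c in selenium_cookies)


def download_with_get(url, dest_path, selenium_cookies=()):
    """Fetch url with a GET request and stream the response body to dest_path."""
    request = urllib.request.Request(url)
    header = cookie_header(selenium_cookies)
    if header:
        # Reuse the browser session so the server sees the same login.
        request.add_header("Cookie", header)
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest_path
```

With a live driver this would be called as `download_with_get(file_url, PATH, driver.get_cookies())`, where `file_url` is whatever address you found for the file.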
Using Selenium WebDriver
Use a Firefox profile to download your files. The profile skips Firefox's download dialog.
In the line:
   pro.setPreference("browser.download.folderList", 0);

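The value of browser.download.folderList can be 0, 1, or 2. With 0, Firefox saves all downloaded files to the user's desktop. With 1, downloads are stored in the Downloads folder. With 2, Firefox reuses the location specified for the most recent download, which is the directory stored in browser.download.dir.
The Firefox profile code you need: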
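        // Needs org.openqa.selenium.WebDriver, org.openqa.selenium.firefox.FirefoxDriver
        // and org.openqa.selenium.firefox.FirefoxProfile.
        FirefoxProfile pro = new FirefoxProfile();
        // 0 = save downloads to the desktop (preference names are case-sensitive)
        pro.setPreference("browser.download.folderList", 0);
        // Save zip files without showing the "open or save" dialog
        pro.setPreference("browser.helperApps.neverAsk.saveToDisk", "application/zip");
        WebDriver driver = new FirefoxDriver(pro);
        driver.get("http://selenium-release.storage.googleapis.com/2.47/selenium-java-2.47.1.zip");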

Hope this helps :)
Make sure you set the download preferences when you initialise the driver.
For Firefox:
ff_prof.set_preference( "browser.download.manager.showWhenStarting", False )
ff_prof.set_preference( "browser.download.folderList", 2 )
ff_prof.set_preference( "browser.download.useDownloadDir", True )
ff_prof.set_preference( "browser.download.dir", self.driver_settings['download_folder'] )

##
# if FF still shows the download dialog, make sure that the filetype is included below
# filetype string options can be found in '~/.mozilla/$USER_PROFILE/mimeTypes.rdf'
##
mime_types = ("application/pdf", "text/html")

ff_prof.set_preference( "browser.helperApps.neverAsk.saveToDisk", (", ".join( mime_types )) )
ff_prof.set_preference( "browser.helperApps.neverAsk.openFile", (", ".join( mime_types )) )
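The fragment above does not show how the profile reaches the driver. Below is a minimal sketch of one way to wire the same preferences in, assuming a reasonably recent Selenium release where `webdriver.FirefoxOptions` and `webdriver.Firefox(options=...)` are available. `firefox_download_prefs` and `make_firefox_driver` are hypothetical names, not part of Selenium.

```python
def firefox_download_prefs(download_dir, mime_types):
    """Return preferences that make Firefox save the given MIME types silently."""
    return {
        "browser.download.folderList": 2,          # 2 = use browser.download.dir
        "browser.download.dir": download_dir,
        "browser.download.useDownloadDir": True,
        # Comma-separated MIME types that are saved without a dialog.
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


def make_firefox_driver(download_dir, mime_types):
    """Start Firefox with the download preferences applied (needs Selenium)."""
    # Imported here so the preference helper works without Selenium installed.
    from selenium import webdriver

    options = webdriver.FirefoxOptions()
    for name, value in firefox_download_prefs(download_dir, mime_types).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

Keeping the preferences in a plain dict makes them easy to inspect or log before a browser is started.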

For Chrome:
capabilities['chromeOptions']['prefs']['download.prompt_for_download'] = False
capabilities['chromeOptions']['prefs']['download.default_directory'] = self.driver_settings['download_folder']
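The same two Chrome preferences can also be set through `ChromeOptions` instead of a raw capabilities dict. This is a sketch under the assumption of a recent Selenium release; `chrome_download_prefs` and `make_chrome_driver` are hypothetical names. The directory is made absolute on the assumption that Chrome expects an absolute download path.

```python
import os


def chrome_download_prefs(download_dir):
    """Return the Chrome preferences that disable the download prompt."""
    return {
        "download.default_directory": os.path.abspath(download_dir),
        "download.prompt_for_download": False,
    }


def make_chrome_driver(download_dir):
    """Start Chrome with the download preferences applied (needs Selenium)."""
    # Imported here so the preference helper works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("prefs", chrome_download_prefs(download_dir))
    return webdriver.Chrome(options=options)
```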

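Forwarding the download:
Below is the code I use to move the file from self.driver_settings['download_folder'] (set as above) to where you actually want it (to_path can be an existing folder or a file path). If you are on Linux, I suggest using tmpfs so that /tmp is held in RAM, and then setting self.driver_settings['download_folder'] to "/tmp/driver_downloads/". Note that the function below assumes that self.driver_settings['download_folder'] always starts out as an empty folder. That is how it locates the file being downloaded: it is the only file in the directory.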
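One way to meet the "starts out empty" assumption is to create a fresh directory for every browser session. This is a small sketch, and `make_empty_download_folder` is a hypothetical helper name. The returned path ends with a separator, matching the "/tmp/driver_downloads/" form mentioned above.

```python
import os
import tempfile


def make_empty_download_folder(parent=None):
    """Create a new, empty directory and return its path with a trailing separator."""
    # mkdtemp always creates a directory that did not exist before, so it is empty.
    folder = tempfile.mkdtemp(prefix="driver_downloads_", dir=parent)
    return os.path.join(folder, "")
```

Passing `parent="/tmp"` would put the folder on tmpfs on systems where /tmp is mounted that way.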
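import os
import platform
import re
import shutil
import subprocess
import time
from datetime import datetime, timedelta


class DownloadTimeoutError(Exception):
    """Raised when no matching file appears in the download folder in time."""


# This is a method of the class that owns self.driver_settings (class body not shown).
def moveDriverDownload(self, to_path, allowable_extensions, allow_rename_if_exists=False, timeout_seconds=None):
    if timeout_seconds is None:
        timeout_seconds = 30
    wait_delta = timedelta( seconds=timeout_seconds )
    start_download_time = datetime.now()
    hasTimedOut = lambda: datetime.now() - start_download_time > wait_delta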

    assert isinstance(allowable_extensions, (list, tuple, set)), "instead of a list, found allowable_extensions type of '{}'".format(type(allowable_extensions))
    allowable_extensions = [ elem.lower().strip() for elem in allowable_extensions ]
    allowable_extensions = [ elem if elem.startswith(".") else "."+elem for elem in allowable_extensions ]

    if ".part" not in allowable_extensions:
        allowable_extensions.append( ".part" )

    re_extension_str = "(?:" + ("$)|(?:".join( re.escape(elem) for elem in allowable_extensions )) + "$)"

    getFiles = lambda: next( os.walk( self.driver_settings['download_folder'] ) )[2]

    while True:
        if hasTimedOut():
            del allowable_extensions[ allowable_extensions.index(".part") ]
            raise DownloadTimeoutError( "timed out after {} seconds while waiting on file download with extension in {}".format(timeout_seconds, allowable_extensions) )

        time.sleep( 0.5 )

        file_list = [ elem for elem in getFiles() if re.search( re_extension_str, elem ) ]
        if len(file_list) > 0:
            break

    file_list = [ re.search( r"(?i)^(.*?)(?:\.part)?$", elem ).groups()[0] for elem in file_list ]

    if len(file_list) > 1:
        if len(file_list) == 2:
            if file_list[0] != file_list[1]:
                raise Exception( "file_list[0] != file_list[1] <==> {} != {}".format(file_list[0], file_list[1]) )
        else:
            raise Exception( "len(file_list) > 1. found {}".format(file_list) )

    # os.path.join also works when download_folder has no trailing separator
    file_path = os.path.join( self.driver_settings['download_folder'], file_list[0] )

    # see if the file is still being downloaded by checking if it's open by any programs
    # universal_newlines=True makes communicate() return str instead of bytes on Python 3
    if platform.system() == "Linux":
        openProcess = lambda: subprocess.Popen( 'lsof | grep "%s"' %file_path, shell=True, universal_newlines=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE )
        fileIsFinished = lambda txt: txt.strip() == ""
    elif platform.system() == "Windows":
        # 'handle' program must be in PATH
        # https://technet.microsoft.com/en-us/sysinternals/bb896655
        openProcess = lambda: subprocess.Popen( 'handle "%s"' %file_path.replace("/", "\\"), shell=True, universal_newlines=True, stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.PIPE )
        fileIsFinished = lambda txt: bool( re.search(r"(?i)No matching handles found", txt) )
    else:
        raise Exception( "unrecognised platform.system() of '{}'".format(platform.system()) )

    while True:
        lsof_process = openProcess()
        lsof_result = lsof_process.communicate()

        if len(lsof_result) != 2:
            raise Exception( "len(lsof_result) != 2. found {}".format(lsof_result) )
        if lsof_result[1].strip() != "":
            raise Exception( 'lsof_result[1].strip() != "". found {}'.format(lsof_result) )
        if fileIsFinished( lsof_result[0] ):
            break

        if hasTimedOut():
            raise Exception( "timed out after {} seconds waiting for '{}' to be freed from writing. found lsof/handle of '{}'".format(timeout_seconds, file_path, lsof_result[0]) )

        time.sleep( 0.5 )

    to_path = to_path.replace("\\", "/")
    if os.path.isdir( to_path ):
        if not to_path.endswith("/"):
            to_path += "/"

        to_path += file_list[0]

    i = 2
    while os.path.exists( to_path ):
        if not allow_rename_if_exists:
            raise Exception( "{} already exists".format(to_path) )

        to_path = re.sub( "^(.*/)(.*?)(?:-" + str(i-1) + r")?(|\..*?)?$", r"\1\2-%i\3" %i, to_path )
        i += 1

    shutil.move( file_path, to_path )

    return to_path[ to_path.rindex("/")+1: ]
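If `lsof` or `handle` is not available, a simpler check can stand in for the open-file test. The sketch below swaps in a different technique from the function above. It polls the folder until exactly one file without a partial-download suffix remains and that file's size has stopped changing between two polls. `wait_for_download` and `PARTIAL_SUFFIXES` are hypothetical names, and the check is a heuristic, so a stalled download could be mistaken for a finished one.

```python
import os
import time

# Suffixes used for in-progress downloads (Firefox: .part, Chrome: .crdownload).
PARTIAL_SUFFIXES = (".part", ".crdownload")


def wait_for_download(folder, timeout_seconds=30, poll_seconds=0.5):
    """Return the name of the single finished file in folder, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    last_sizes = None
    while time.monotonic() < deadline:
        names = sorted(os.listdir(folder))
        partial = [n for n in names if n.endswith(PARTIAL_SUFFIXES)]
        finished = [n for n in names if not n.endswith(PARTIAL_SUFFIXES)]
        try:
            sizes = {n: os.path.getsize(os.path.join(folder, n)) for n in finished}
        except OSError:
            # A file was renamed or removed between listing and stat; poll again.
            sizes = None
        # Done when one complete file exists, no partial file is left,
        # and the size matches the one seen on the previous poll.
        if sizes is not None and len(finished) == 1 and not partial and sizes == last_sizes:
            return finished[0]
        last_sizes = sizes
        time.sleep(poll_seconds)
    raise TimeoutError("no finished download in {} after {} seconds".format(folder, timeout_seconds))
```

Like the original function, this assumes the folder was empty before the download started.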