Python: Download a PDF with Selenium and save it to disk


python selenium web-scraping download


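I'm building an application that downloads PDFs from a website and saves them to disk. I understand that the Requests module can do this, but it doesn't handle the logic behind the download (file size, progress, time remaining, etc.).

So far I've written the program with Selenium, and I'd eventually like to fold it into a Tkinter GUI application.

What is the best way to handle the download, track it, and ultimately drive a progress bar?

Here is my code so far: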

from time import sleep

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

# Local secrets.py holding username/password (note: shadows the stdlib module of the same name)
import secrets


class manual_grabber():
    """ A class creating a manual downloader for the Roger Technology website """

    def __init__(self):
        """ Initialize attributes of manual grabber """
        # Selenium 3 style call; recent Selenium 4 releases take the driver path via a Service object instead
        self.driver = webdriver.Chrome('\\Users\\Joel\\Desktop\\Python\\manual_grabber\\chromedriver.exe')

    def login(self):
        """ Function controlling the login logic """
        self.driver.get('https://rogertechnology.it/en/b2b')

        # Crude wait for the login page to load
        sleep(1)

        # Locate elements and enter login details
        user_in = self.driver.find_element(By.XPATH, '/html/body/div[2]/form/input[6]')
        user_in.send_keys(secrets.username)

        pass_in = self.driver.find_element(By.XPATH, '/html/body/div[2]/form/input[7]')
        pass_in.send_keys(secrets.password)

        enter_button = self.driver.find_element(By.XPATH, '/html/body/div[2]/form/div/input')
        enter_button.click()

        # Click Self Service Area button
        self_service_button = self.driver.find_element(By.XPATH, '//*[@id="bs-example-navbar-collapse-1"]/ul/li[1]/a')
        self_service_button.click()

    def download_file(self):
        """Access file tree and navigate to PDF's and download"""
        # Wait for all elements to load
        sleep(3)

        # Find and switch to iFrame
        frame = self.driver.find_element(By.XPATH, '//*[@id="siteOutFrame"]/iframe')
        self.driver.switch_to.frame(frame)

        # Find and click tech manuals button
        tech_manuals_button = self.driver.find_element(By.XPATH, '//*[@id="fileTree_1"]/ul/li/ul/li[6]/a')
        tech_manuals_button.click()


bot = manual_grabber()
bot.login()
bot.download_file()

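In short, I want this code to download the PDFs from the website, store them in specific directories (named after the parent folder in the jQuery file tree), and track progress (file size, time remaining, etc.).

Here is the DOM:

I hope this is enough information. Let me know if you need more.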

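I'd suggest using the tqdm and requests modules for this.
Here is sample code that does the hard work of downloading the file and updating a progress bar as it goes: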
from tqdm import tqdm
import requests

url = "http://www.ovh.net/files/10Mb.dat" #big file test
# Streaming, so we can iterate over the response.
response = requests.get(url, stream=True)
total_size_in_bytes= int(response.headers.get('content-length', 0))
block_size = 1024 #1 Kibibyte
progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
with open('test.dat', 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data)) #change this to your widget in tkinter
        file.write(data)
progress_bar.close()
if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
    print("ERROR, something went wrong")
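
To connect the loop above to the question's other requirements (one directory per parent folder, and a Tkinter widget instead of tqdm), the write loop could be wrapped in a helper along these lines. This is only a sketch: `save_stream`, its parameters, and the `on_progress` callback are hypothetical names, not part of requests or tqdm, and it assumes the parent folder name has already been read from the file tree.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional


def save_stream(
    chunks: Iterable[bytes],
    base_dir: str,
    parent_folder: str,
    filename: str,
    total_bytes: int = 0,
    on_progress: Optional[Callable[[int, int], None]] = None,
) -> Path:
    """Write chunks to base_dir/parent_folder/filename, reporting progress.

    All names here are illustrative; `chunks` can be any iterable of
    bytes, e.g. response.iter_content(block_size) from the code above.
    """
    # Create the per-folder directory if it does not exist yet.
    target_dir = Path(base_dir) / parent_folder
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename

    written = 0
    with open(target, "wb") as fh:
        for chunk in chunks:
            fh.write(chunk)
            written += len(chunk)
            # Hand (bytes written so far, expected total) to the caller,
            # which decides how to display it.
            if on_progress is not None:
                on_progress(written, total_bytes)
    return target
```

With the streaming response from above, a call might look like `save_stream(response.iter_content(block_size), "manuals", "Technical Manuals", "manual.pdf", total_size_in_bytes, on_progress=update_widget)`, where `update_widget` is whatever function updates the GUI (for instance by setting the `value` of a `ttk.Progressbar`). Because the download blocks while it runs, a Tkinter application would normally run it in a worker thread so the window stays responsive.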

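total_size_in_bytes is the file size (taken from the Content-Length header) and block_size is the size of each chunk read from the response. The time remaining can be estimated from the download rate so far (bytes or iterations per second) and the number of bytes still to be fetched. Here's an alternative -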
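
The remaining-time calculation can be sketched as a small function. This is a minimal illustration that assumes a simple average rate over the whole download so far; `estimate_remaining` is a hypothetical helper, not something provided by requests or tqdm.

```python
from typing import Optional, Tuple


def estimate_remaining(
    bytes_done: int, total_bytes: int, elapsed_seconds: float
) -> Tuple[float, Optional[float]]:
    """Return (bytes_per_second, seconds_remaining).

    seconds_remaining is None when it cannot be estimated, e.g. nothing
    has been downloaded yet or the server sent no Content-Length.
    """
    if elapsed_seconds <= 0 or bytes_done <= 0:
        return 0.0, None
    # Average speed since the download started.
    speed = bytes_done / elapsed_seconds
    if total_bytes <= 0:
        return speed, None
    remaining_bytes = max(total_bytes - bytes_done, 0)
    return speed, remaining_bytes / speed


# 2 MB of a 10 MB file in 4 s: 500 kB/s, so 8 MB left takes 16 s.
print(estimate_remaining(2_000_000, 10_000_000, 4.0))  # (500000.0, 16.0)
```

Inside the download loop, `elapsed_seconds` could be measured with `time.monotonic()` taken once before the loop and again after each chunk. An average over the whole download is steady but slow to react to speed changes; a rate over only the last few seconds is an alternative.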
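Did this solve the problem?
@AzyCrw4282 I haven't had a chance to try it yet, since this is a project for work and I'm not back in the office. I'll give it a try and see how it goes.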