Python - Using Selenium to download PDFs and save them to disk

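I am building an application that downloads PDFs from a website and saves them to disk. I understand that the Requests module can do this, but it does not handle the logic around the download (file size, progress, time remaining, and so on).

So far I have written the program with selenium, and I would eventually like to integrate it into a Tkinter GUI application.

What is the best way to handle the download, track it, and ultimately drive a progress bar?

Here is my code so far: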

from selenium import webdriver
from time import sleep
import requests

# Presumably a local secrets.py holding the credentials; the name shadows
# the standard-library secrets module.
import secrets

# Note: find_element_by_xpath is the Selenium 3 API; Selenium 4 uses
# find_element(By.XPATH, ...) instead.


class manual_grabber():
    """ A class creating a manual downloader for the Roger Technology website """

    def __init__(self):
        """ Initialize attributes of manual grabber """
        self.driver = webdriver.Chrome('\\Users\\Joel\\Desktop\\Python\\manual_grabber\\chromedriver.exe')

    def login(self):
        """ Function controlling the login logic """
        self.driver.get('https://rogertechnology.it/en/b2b')

        sleep(1)

        # Locate elements and enter login details
        user_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[6]')
        user_in.send_keys(secrets.username)

        pass_in = self.driver.find_element_by_xpath('/html/body/div[2]/form/input[7]')
        pass_in.send_keys(secrets.password)

        enter_button = self.driver.find_element_by_xpath('/html/body/div[2]/form/div/input')
        enter_button.click()

        # Click Self Service Area button
        self_service_button = self.driver.find_element_by_xpath('//*[@id="bs-example-navbar-collapse-1"]/ul/li[1]/a')
        self_service_button.click()

    def download_file(self):
        """Access the file tree, navigate to the PDFs and download them"""
        # Wait for all elements to load
        sleep(3)

        # Find and switch to iFrame
        frame = self.driver.find_element_by_xpath('//*[@id="siteOutFrame"]/iframe')
        self.driver.switch_to.frame(frame)

        # Find and click tech manuals button
        tech_manuals_button = self.driver.find_element_by_xpath('//*[@id="fileTree_1"]/ul/li/ul/li[6]/a')
        tech_manuals_button.click()


bot = manual_grabber()
bot.login()
bot.download_file()

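In short, I want this code to download the PDFs from the website, store them in specific directories (named after the parent folder in the jQuery file tree), and track the progress (file size, time remaining, and so on).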
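
For the "directory named after the parent folder" part, a minimal sketch could look like the following. The helper name `build_destination` and its parameters are hypothetical (they do not appear in the code above), and the character set being replaced assumes Windows file-name rules.

```python
import re
from pathlib import Path


def build_destination(base_dir, folder_name, file_name):
    """Return base_dir/<sanitized folder_name>/<file_name>, creating the folder.

    Hypothetical helper: folder_name would be the text of the parent
    folder node read from the file tree.
    """
    # Replace characters that are not allowed in Windows file names.
    safe_folder = re.sub(r'[<>:"/\\|?*]', "_", folder_name).strip() or "unnamed"
    target_dir = Path(base_dir) / safe_folder
    # Create the folder (and any missing parents) if it does not exist yet.
    target_dir.mkdir(parents=True, exist_ok=True)
    return target_dir / file_name
```

The returned path can then be passed to `open(..., 'wb')` when the PDF is written.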

Here is the DOM:

I hope this is enough information. Let me know if you need more.

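I would suggest using the requests and tqdm modules for this.
Here is sample code that does the hard work of downloading the file and updating a progress bar: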

from tqdm import tqdm
import requests

url = "http://www.ovh.net/files/10Mb.dat" #big file test
# Streaming, so we can iterate over the response.
response = requests.get(url, stream=True)
total_size_in_bytes= int(response.headers.get('content-length', 0))
block_size = 1024 #1 Kibibyte
progress_bar = tqdm(total=total_size_in_bytes, unit='iB', unit_scale=True)
with open('test.dat', 'wb') as file:
    for data in response.iter_content(block_size):
        progress_bar.update(len(data)) #change this to your widget in tkinter
        file.write(data)
progress_bar.close()
if total_size_in_bytes != 0 and progress_bar.n != total_size_in_bytes:
    print("ERROR, something went wrong")
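
To swap tqdm for a Tkinter widget, one option is to separate the write loop from the progress display with a callback. The sketch below is an assumption about how that could be structured, not part of the answer's code: `stream_to_file` and `on_progress` are hypothetical names, and `chunks` can be any iterable of bytes, such as `response.iter_content(block_size)`.

```python
def stream_to_file(chunks, dest_path, total_bytes, on_progress):
    """Write an iterable of byte chunks to dest_path, reporting progress.

    on_progress(downloaded, total_bytes) is called after each chunk.
    Returns the number of bytes written.
    """
    downloaded = 0
    with open(dest_path, "wb") as file:
        for chunk in chunks:
            file.write(chunk)
            downloaded += len(chunk)
            # Hand the running total to whatever displays progress.
            on_progress(downloaded, total_bytes)
    return downloaded
```

With the variables from the example above, a call would be `stream_to_file(response.iter_content(block_size), 'test.dat', total_size_in_bytes, callback)`, where `callback` updates the progress widget. In a GUI the download would normally run in a worker thread so the window stays responsive. If the PDFs are only available after login, the cookies returned by Selenium's `driver.get_cookies()` can be copied into a `requests.Session` before making the request.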

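block_size is the size of each chunk read from the response, not the size of the file; the file size comes from the content-length header and is stored in total_size_in_bytes. The time remaining can be estimated from the number of iterations completed per second and the number of blocks still to be downloaded. Here is an alternative -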
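
As a rough illustration of the time-remaining estimate, the sketch below divides the bytes still outstanding by the average speed so far. The function name and parameters are hypothetical, and using the overall average speed is an assumption; tqdm computes its own estimate internally.

```python
def estimate_remaining_seconds(downloaded, total_bytes, elapsed_seconds):
    """Estimate the seconds left, based on the average speed so far.

    Returns None when no estimate is possible (nothing downloaded yet,
    no time elapsed, or an unknown total size).
    """
    if downloaded <= 0 or elapsed_seconds <= 0 or total_bytes <= 0:
        return None
    speed = downloaded / elapsed_seconds  # average bytes per second
    remaining = max(total_bytes - downloaded, 0)
    return remaining / speed
```

Here `elapsed_seconds` would typically be the difference between two `time.monotonic()` readings, one taken before the download loop starts and one inside it.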

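Did this solve your problem?

@AzyCrw4282 I haven't had a chance to try it yet, because this is a work project and I haven't been in the office. I'll give it a try and see how it goes.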