Download a file using DataLakeFileClient with a progress bar


I need to download a large file from Azure using DataLakeFileClient and display a tqdm-style progress bar while it downloads. Below is the code I tried with a smaller test file:

# Download a file (my_conn_str and fs_name are defined earlier)
from tqdm import tqdm
from azure.storage.filedatalake import DataLakeFileClient

test_file = DataLakeFileClient.from_connection_string(my_conn_str, file_system_name=fs_name, file_path="161263.tmp")

download = test_file.download_file()
blocks = download.chunks()
print(f"File Size = {download.size}, Number of blocks = {len(blocks)}")

with open("./newfile.tmp", "wb") as my_file:
    for block in tqdm(blocks):
        my_file.write(block)
The result is shown below; in the Jupyter notebook the number of blocks comes out equal to the file size.


How can I get the number of blocks right and make the progress bar work?

When using chunks(), note that the file is only split into chunks when its size is larger than 32MB (33554432 bytes); in that case the remainder (that is, the total file size minus 32MB) is split into chunks of 4MB each.

For example, if the file size is 39MB, it will be split into 3 chunks: the first chunk is 32MB, the second is 4MB, and the third is 3MB (39MB - 32MB - 4MB).
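
To make that arithmetic concrete, here is a minimal sketch of the chunk-count formula; the function name and constants are mine and reflect only the 32MB / 4MB behaviour described above, not an official API guarantee:

import math

FIRST_DOWNLOAD = 32 * 1024 * 1024   # 33554432 bytes, returned as one piece
CHUNK_SIZE = 4 * 1024 * 1024        # the remainder is split into 4MB chunks

# hypothetical helper: chunk count under the 32MB + 4MB scheme above
def expected_number_of_chunks(file_size):
    if file_size <= FIRST_DOWNLOAD:
        return 1
    return 1 + math.ceil((file_size - FIRST_DOWNLOAD) / CHUNK_SIZE)

print(expected_number_of_chunks(39 * 1024 * 1024))  # 3 -> 32MB + 4MB + 3MB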

Below is a full example that works well on my side:

from tqdm import tqdm
from azure.storage.filedatalake import DataLakeFileClient
import math

conn_str = "xxxxxxxx"
file_system_name = "xxxx"
file_name = "ccc.txt"

test_file = DataLakeFileClient.from_connection_string(conn_str, file_system_name, file_name)

download = test_file.download_file()

blocks = download.chunks()

number_of_blocks = 0

# len(blocks) gives the total file size in bytes (which is why the question's
# print showed the "number of blocks" equal to the file size), so compare it
# against the 32MB threshold
if len(blocks) > 33554432:
    # one 32MB piece plus the remainder split into 4MB chunks
    number_of_blocks = math.ceil((len(blocks) - 33554432) / 1024 / 1024 / 4) + 1
else:
    number_of_blocks = 1

print(f"File Size = {download.size}, Number of blocks = {number_of_blocks}")

#initialize a tqdm instance
progress_bar = tqdm(total=download.size, unit='iB', unit_scale=True)

with open("D:\\a11\\ccc.txt","wb") as my_file:
    for block in blocks:
        #update the progress bar
        progress_bar.update(len(block))

        my_file.write(block)

progress_bar.close()

print("**completed**")
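
For reference, the same byte-level progress can also be shown with tqdm.wrapattr, which wraps the file's write() method and advances the bar by the number of bytes written. This is only a sketch that assumes the same test_file object and output path as the example above:

from tqdm import tqdm

# request the download again so the chunk iterator starts at the beginning
download = test_file.download_file()

with open("D:\\a11\\ccc.txt", "wb") as raw_file:
    # wrapattr intercepts each raw_file.write(block) call and advances the
    # bar by len(block) bytes; download.size supplies the total
    with tqdm.wrapattr(raw_file, "write", total=download.size, desc="downloading") as my_file:
        for block in download.chunks():
            my_file.write(block)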

Thank you. This is exactly what I was looking for.