Download a file using DataLakeFileClient with a progress bar


I need to download a large file from Azure using DataLakeFileClient and display a tqdm-style progress bar while it downloads. Below is the code I tried with a smaller test file:

# Download a file (my_conn_str and fs_name are defined earlier)
from tqdm import tqdm
from azure.storage.filedatalake import DataLakeFileClient

test_file = DataLakeFileClient.from_connection_string(my_conn_str, file_system_name=fs_name, file_path="161263.tmp")

download = test_file.download_file()
blocks = download.chunks()
print(f"File Size = {download.size}, Number of blocks = {len(blocks)}")

with open("./newfile.tmp", "wb") as my_file:
    for block in tqdm(blocks):
        my_file.write(block)
The result is shown below; in the Jupyter notebook the number of blocks comes out equal to the file size.


How can I get the number of blocks right and make the progress bar work?

When using chunks(), note that the file is only split into chunks when its size is larger than 32MB (33554432 bytes); in that case the remainder (that is, the total file size minus 32MB) is split into chunks of 4MB each.

For example, if the file size is 39MB, it will be split into 3 chunks: the first chunk is 32MB, the second is 4MB, and the third is 3MB (39MB - 32MB - 4MB).
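
To make that arithmetic concrete, here is a minimal sketch of the chunk-count formula; the function name and constants are mine and reflect only the 32MB / 4MB behaviour described above, not an official API guarantee:

import math

FIRST_DOWNLOAD = 32 * 1024 * 1024   # 33554432 bytes, returned as one piece
CHUNK_SIZE = 4 * 1024 * 1024        # the remainder is split into 4MB chunks

# hypothetical helper: chunk count under the 32MB + 4MB scheme above
def expected_number_of_chunks(file_size):
    if file_size <= FIRST_DOWNLOAD:
        return 1
    return 1 + math.ceil((file_size - FIRST_DOWNLOAD) / CHUNK_SIZE)

print(expected_number_of_chunks(39 * 1024 * 1024))  # 3 -> 32MB + 4MB + 3MB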

Below is a full example that works well on my side:

from tqdm import tqdm
from azure.storage.filedatalake import DataLakeFileClient
import math

conn_str = "xxxxxxxx"
file_system_name = "xxxx"
file_name = "ccc.txt"

test_file = DataLakeFileClient.from_connection_string(conn_str, file_system_name, file_name)

download = test_file.download_file()

blocks = download.chunks()

number_of_blocks = 0

# len(blocks) gives the total file size in bytes (which is why the question's
# print showed the "number of blocks" equal to the file size), so compare it
# against the 32MB threshold
if len(blocks) > 33554432:
    # one 32MB piece plus the remainder split into 4MB chunks
    number_of_blocks = math.ceil((len(blocks) - 33554432) / 1024 / 1024 / 4) + 1
else:
    number_of_blocks = 1

print(f"File Size = {download.size}, Number of blocks = {number_of_blocks}")

#initialize a tqdm instance
progress_bar = tqdm(total=download.size, unit='iB', unit_scale=True)

with open("D:\\a11\\ccc.txt","wb") as my_file:
    for block in blocks:
        #update the progress bar
        progress_bar.update(len(block))

        my_file.write(block)

progress_bar.close()

print("**completed**")
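
For reference, the same byte-level progress can also be shown with tqdm.wrapattr, which wraps the file's write() method and advances the bar by the number of bytes written. This is only a sketch that assumes the same test_file object and output path as the example above:

from tqdm import tqdm

# request the download again so the chunk iterator starts at the beginning
download = test_file.download_file()

with open("D:\\a11\\ccc.txt", "wb") as raw_file:
    # wrapattr intercepts each raw_file.write(block) call and advances the
    # bar by len(block) bytes; download.size supplies the total
    with tqdm.wrapattr(raw_file, "write", total=download.size, desc="downloading") as my_file:
        for block in download.chunks():
            my_file.write(block)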

Thank you. This is exactly what I was looking for.