Download a file using DataLakeFileClient with a progress bar
I need to download a large file from Azure using DataLakeFileClient and show a tqdm-style progress bar while it downloads. Below is the code I tried with a smaller test file:
# Download a File
test_file = DataLakeFileClient.from_connection_string(my_conn_str, file_system_name=fs_name, file_path="161263.tmp")
download = test_file.download_file()
blocks = download.chunks()
print(f"File Size = {download.size}, Number of blocks = {len(blocks)}")
with open("./newfile.tmp", "wb") as my_file:
    for block in tqdm(blocks):
        my_file.write(block)
When I run this in a Jupyter notebook, the reported number of blocks is the same as the file size.
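How can I get the correct number of blocks and make the progress bar work properly?

When using chunks(), note that a file is split only if it is larger than 32MB (33554432 bytes). In that case the remainder (total file size - 32MB) is split into blocks of 4MB each.
For example, a 39MB file is split into 3 blocks: the first is 32MB, the second is 4MB, and the third is 3MB (39MB - 32MB - 4MB).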
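The arithmetic above can be sketched as a small standalone function. This is only an illustration of the rule as described in this answer: the helper name `expected_chunk_sizes` and the two constants are hypothetical, and the 32MB and 4MB figures are taken from the text above rather than read from the SDK, so they may differ depending on client configuration or version.

```python
# Sizes as stated in the answer above (assumed, not queried from the SDK)
FIRST_CHUNK_BYTES = 32 * 1024 * 1024  # 33554432 bytes
NEXT_CHUNK_BYTES = 4 * 1024 * 1024    # 4194304 bytes


def expected_chunk_sizes(file_size):
    """Return the list of block sizes (in bytes) expected for a file.

    Hypothetical helper: the first block holds up to 32MB, and whatever
    remains is cut into blocks of at most 4MB.
    """
    sizes = [min(file_size, FIRST_CHUNK_BYTES)]
    remaining = file_size - sizes[0]
    while remaining > 0:
        step = min(remaining, NEXT_CHUNK_BYTES)
        sizes.append(step)
        remaining -= step
    return sizes


MB = 1024 * 1024
sizes = expected_chunk_sizes(39 * MB)
print(len(sizes), [s // MB for s in sizes])  # 3 [32, 4, 3]
```

Because the blocks are not all the same size, a progress bar that counts bytes written tends to be more informative than one that counts blocks.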
Here is an example that works well on my side:
from tqdm import tqdm
from azure.storage.filedatalake import DataLakeFileClient
import math

conn_str = "xxxxxxxx"
file_system_name = "xxxx"
file_name = "ccc.txt"

test_file = DataLakeFileClient.from_connection_string(conn_str, file_system_name, file_name)
download = test_file.download_file()
blocks = download.chunks()

# Use download.size for the byte count; as seen in the question,
# len(blocks) reports the file size, not the number of blocks.
if download.size > 33554432:
    # first block is 32MB, the remainder is split into 4MB blocks
    number_of_blocks = math.ceil((download.size - 33554432) / 1024 / 1024 / 4) + 1
else:
    # files of 32MB or less arrive as a single block
    number_of_blocks = 1
print(f"File Size = {download.size}, Number of blocks = {number_of_blocks}")

# initialize a tqdm instance that counts bytes rather than blocks
progress_bar = tqdm(total=download.size, unit='B', unit_scale=True, unit_divisor=1024)
with open("D:\\a11\\ccc.txt", "wb") as my_file:
    for block in blocks:
        my_file.write(block)
        # advance the progress bar by the number of bytes just written
        progress_bar.update(len(block))
progress_bar.close()
print("**completed**")
Thank you. This is exactly what I was looking for.