Getting an "EOFError: End of stream already reached" when trying to untar big files on the fly with Python and smart_open
I'm trying to download and untar a set of files from a remote Apache server. I supply a list of .tbz (tar.bz2) files to be downloaded and untarred on the fly. The goal is to stream them from the remote Apache server, through the tar decompressor, into my Amazon AWS S3 bucket. I do it this way because the files can be as large as 30 GB.

I use the smart_open Python library to abstract away the HTTPS and S3 handling.

The code I provide here works fine for small files. When I try it with bigger files (more than 8 MB), I get the following error:
"EOFError: End of stream already reached"
Here is the traceback:
Traceback (most recent call last):
  File "./script.py", line 28, in <module>
    download_file(fileName)
  File "./script.py", line 21, in download_file
    for line in tfext:
  File "/.../lib/python3.7/tarfile.py", line 706, in readinto
    buf = self.read(len(b))
  File "/.../lib/python3.7/tarfile.py", line 695, in read
    b = self.fileobj.read(length)
  File "/.../lib/python3.7/tarfile.py", line 537, in read
    buf = self._read(size)
  File "/.../lib/python3.7/tarfile.py", line 554, in _read
    buf = self.cmp.decompress(buf)
EOFError: End of stream already reached
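For context on why this fails: tarfile's random-access mode (`r:bz2`) needs to seek in the underlying file object, which a streamed HTTP body cannot do, whereas its streaming mode (`r|bz2`) reads strictly forward. A minimal, self-contained illustration of the difference (no network; the archive is built in memory, and the non-seekable wrapper class here is made up to mimic a streamed response):

```python
import io
import tarfile

# Build a small .tar.bz2 archive entirely in memory.
payload = b"hello stream" * 100
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:bz2") as tf:
    info = tarfile.TarInfo(name="member.txt")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

class ForwardOnly(io.RawIOBase):
    """Read-only wrapper that refuses to seek, like a streamed HTTP body."""
    def __init__(self, data):
        self._inner = io.BytesIO(data)
    def readable(self):
        return True
    def readinto(self, b):
        chunk = self._inner.read(len(b))
        b[:len(chunk)] = chunk
        return len(chunk)
    def seekable(self):
        return False

# Streaming mode "r|bz2" only ever reads forward, so it succeeds here;
# random-access mode "r:bz2" would try to seek() and fail on this object.
with tarfile.open(fileobj=ForwardOnly(buf.getvalue()), mode="r|bz2") as tf:
    for member in tf:
        data = tf.extractfile(member).read()
        print(member.name, len(data))
```

In streaming mode each member must be consumed in order while the tar is open, which fits the download-on-the-fly setup described in the question.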
I would like to be able to process big files exactly the same way I process small ones.

Compressed tar extraction requires seeking within the file, which may not be possible with the virtual file descriptor created by smart_open. An alternative is to download the data to block storage before processing it:
from smart_open import open
import tarfile
import boto3

filenames = ['test.tar.bz2', ]

def download_file(fileName):
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('bucketname')
    obj = bucket.Object(fileName)
    # Download the whole archive to local block storage first,
    # so tarfile gets a real, seekable file.
    local_filename = '/tmp/{}'.format(fileName)
    obj.download_file(local_filename)
    tf = tarfile.open(local_filename, 'r:bz2')
    for member in tf.getmembers():
        tf.extract(member)
        fd = open(member.name, 'rb')
        print(member, len(fd.read()))

if __name__ == '__main__':
    for f in filenames:
        download_file(f)
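Once the archive is on local storage, the extract-and-reopen step in the loop above can also be skipped: `tarfile.extractfile()` hands back a readable file object for each member directly. A self-contained sketch of that variant (the archive, the member name `data.bin`, and its contents are generated locally here to stand in for the S3 download):

```python
import io
import os
import tarfile
import tempfile

# Stand-in for the downloaded archive: build a small .tar.bz2 on disk.
payload = b"x" * 4096
local_filename = os.path.join(tempfile.mkdtemp(), "test.tar.bz2")
with tarfile.open(local_filename, "w:bz2") as tf:
    info = tarfile.TarInfo(name="data.bin")
    info.size = len(payload)
    tf.addfile(info, io.BytesIO(payload))

# With the archive on local disk, read each member straight from the tar
# via extractfile(), avoiding the extract-then-reopen round trip.
sizes = {}
with tarfile.open(local_filename, "r:bz2") as tf:
    for member in tf.getmembers():
        with tf.extractfile(member) as fd:
            sizes[member.name] = len(fd.read())
print(sizes)
```

This keeps nothing on disk except the downloaded archive itself, which matters when the extracted contents approach the 30 GB mentioned in the question.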