Getting an "EOFError: End of stream already reached" error when trying to download and decompress a large file on the fly with Python and smart_open

Tags: python, python-3.x, amazon-s3, tar, tarfile

I am trying to download and decompress a set of files from a remote Apache server. I provide a list of .tbz (tar.bz2) files to be downloaded and decompressed on the fly. The goal is to stream them from the remote Apache server, through a tar decompressor, into my Amazon AWS S3 bucket. I do it this way because the files can be as large as 30 GB.

I use the "smart_open" Python library to abstract away the HTTPS and S3 handling.

The code I provide here works for small files. When I try the same thing with larger files (more than 8 MB), I get the following error:

"EOFError: End of stream already reached"
Here is the traceback:

Traceback (most recent call last):
  File "./script.py", line 28, in <module>
    download_file(fileName)
  File "./script.py", line 21, in download_file
    for line in tfext:
  File "/.../lib/python3.7/tarfile.py", line 706, in readinto
    buf = self.read(len(b))
  File "/.../lib/python3.7/tarfile.py", line 695, in read
    b = self.fileobj.read(length)
  File "/.../lib/python3.7/tarfile.py", line 537, in read
    buf = self._read(size)
  File "/.../lib/python3.7/tarfile.py", line 554, in _read
    buf = self.cmp.decompress(buf)
EOFError: End of stream already reached

I would expect to be able to process large files in exactly the same way as small ones.
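The question's own code is not reproduced above, so purely as an illustration, here is a rough sketch of the kind of on-the-fly pipeline it describes, assuming smart_open handles both the HTTPS source and the S3 destination and tarfile is used in its streaming ('r|bz2') mode; the URL and bucket name are placeholders, not taken from the original question:

from smart_open import open
import tarfile

# Placeholder source URL and destination prefix -- not from the original question.
SOURCE_URL = 'https://example.com/files/test.tar.bz2'
DEST_PREFIX = 's3://bucketname/extracted/'

def stream_extract(url):
    # Read the remote .tar.bz2 over HTTPS as a non-seekable stream.
    with open(url, 'rb') as fileobj:
        # 'r|bz2' is tarfile's streaming mode: members are read strictly
        # in order and no seeking on fileobj is attempted.
        with tarfile.open(fileobj=fileobj, mode='r|bz2') as tf:
            for member in tf:
                if not member.isfile():
                    continue
                extracted = tf.extractfile(member)
                # Copy the member's bytes straight into the S3 bucket.
                with open(DEST_PREFIX + member.name, 'wb') as out:
                    while True:
                        chunk = extracted.read(1024 * 1024)
                        if not chunk:
                            break
                        out.write(chunk)

if __name__ == '__main__':
    stream_extract(SOURCE_URL)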

Extracting a compressed tar requires seeking within the file, which may not be possible with the virtual file descriptor that smart_open creates. An alternative is to download the data to block storage before processing it:

from smart_open import open
import tarfile
import boto3

filenames = ['test.tar.bz2']

def download_file(fileName):
    # Download the archive from S3 to local block storage first,
    # so that tarfile can seek within it.
    s3 = boto3.resource('s3')
    bucket = s3.Bucket('bucketname')
    obj = bucket.Object(fileName)
    local_filename = '/tmp/{}'.format(fileName)
    obj.download_file(local_filename)

    # Extract each member and verify its size by reading it back.
    with tarfile.open(local_filename, 'r:bz2') as tf:
        for member in tf.getmembers():
            tf.extract(member)
            with open(member.name, 'rb') as fd:
                print(member, len(fd.read()))

if __name__ == '__main__':
    for f in filenames:
        download_file(f)
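The design point here is that a file on local block storage supports seek(), so tarfile can be opened in its random-access 'r:bz2' mode and jump between members as needed, whereas the HTTPS/S3 stream wrapped by smart_open can only be read sequentially.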
