Unzip nested zip files in Python


I'm looking for a way to unzip nested zip files in Python. For example, consider the following structure (hypothetical names, for convenience):

Folder
- ZipfileA.zip
  - ZipfileA1.zip
  - ZipfileA2.zip
- ZipfileB.zip
  - ZipfileB1.zip
  - ZipfileB2.zip

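...and so on. I'm trying to access a text file inside the second-level zips. I certainly don't want to extract everything, because the sheer numbers would crash the computer (there are several hundred zips in the first layer and almost 10,000 in the second).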

I've been playing around with the "zipfile" module, and I'm able to open the first level of zipfiles. For example:

import zipfile
zipfile_obj = zipfile.ZipFile("/Folder/ZipfileA.zip")
next_layer_zip = zipfile_obj.open("ZipfileA1.zip")

However, this returns a "ZipExtFile" instance (not a file or a ZipFile instance), and I can't go on to open this particular data type. I can't do this:

data = next_layer_zip.open('data.txt')

I can, however, "read" this zip file with:

next_layer_zip.read()

But this is completely useless! (i.e., all I get is compressed data / gobbledygook)

Does anyone have any ideas on how I could do this without using ZipFile.extract()?
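I came across this, which looks like exactly what I want, but it doesn't seem to work for me. (Using that module, I keep getting "[Errno 2] No such file or directory:" for the files I'm trying to process.)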

Any ideas would be much appreciated! Thanks in advance.
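
A note on the "gobbledygook" described in the question: calling read() on the ZipExtFile has already undone the outer archive's compression. What comes back is the inner zip file itself, byte for byte. The Python 3 sketch below builds a small nested archive in memory to show this. The member names and the helper make_nested_zip are made up for illustration. It checks that the bytes returned by read() start with the zip local-file-header signature PK\x03\x04 and that zipfile.is_zipfile() accepts them.

```python
import io
import zipfile


def make_nested_zip():
    """Build (in memory) an outer zip holding ZipfileA1.zip, which holds data.txt."""
    inner_buf = io.BytesIO()
    with zipfile.ZipFile(inner_buf, "w") as inner:
        inner.writestr("data.txt", "hello from the inner zip\n")
    outer_buf = io.BytesIO()
    with zipfile.ZipFile(outer_buf, "w", zipfile.ZIP_DEFLATED) as outer:
        outer.writestr("ZipfileA1.zip", inner_buf.getvalue())
    return outer_buf.getvalue()


with zipfile.ZipFile(io.BytesIO(make_nested_zip())) as outer:
    with outer.open("ZipfileA1.zip") as member:
        raw = member.read()

# The outer compression is already undone; these bytes are a complete zip archive.
print(raw[:4])                              # b'PK\x03\x04'
print(zipfile.is_zipfile(io.BytesIO(raw)))  # True
```

Those bytes only need a seekable file-like wrapper, such as io.BytesIO, before they can be opened as a ZipFile.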
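
Unfortunately, decompressing zip files requires random access to the archive, and the ZipFile methods (not to mention the DEFLATE algorithm itself) only provide streams. It is therefore impossible to decompress nested zip files without extracting them.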
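
The random-access requirement comes from the zip layout. A reader first locates the End of Central Directory (EOCD) record at the end of the archive, then jumps back to the central directory that the EOCD points to. The minimal Python 3 sketch below follows those two steps by hand on an archive built in memory. The byte offsets follow the PKWARE APPNOTE layout and assume no archive comment and no Zip64 records.

```python
import io
import zipfile

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("data.txt", "some text\n")
data = buf.getvalue()

# Step 1: with no archive comment, the EOCD record is the last 22 bytes.
eocd = data[-22:]
print(eocd[:4])  # b'PK\x05\x06' (EOCD signature)

# Step 2: bytes 16-19 of the EOCD hold the central directory's offset
# (little-endian), so the reader must seek backwards to that position.
cd_offset = int.from_bytes(eocd[16:20], "little")
print(data[cd_offset:cd_offset + 4])  # b'PK\x01\x02' (central directory header)
```

A forward-only stream cannot perform step 2, which is why ZipFile needs a seekable file object.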
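
ZipFile needs a file-like object, so you can use StringIO to turn the data read from the nested zip into such an object. The caveat is that you'll be loading the full (still compressed) inner zip into memory: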

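import zipfile
import cStringIO  # Python 2

with zipfile.ZipFile('foo.zip') as z:
    with z.open('nested.zip') as z2:
        # Copy the whole (still compressed) inner zip into memory...
        z2_filedata = cStringIO.StringIO(z2.read())
        # ...so that ZipFile gets the seekable file-like object it needs.
        with zipfile.ZipFile(z2_filedata) as nested_zip:
            print nested_zip.open('data.txt').read()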
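
For the question's actual workload (hundreds of outer zips, roughly 10,000 inner ones), the same idea can be applied member by member, so that only one inner zip is held in memory at a time. Below is a Python 3 sketch. The generator name iter_inner_members is hypothetical. io.BytesIO stands in for cStringIO because ZipFile.read() returns bytes on Python 3.

```python
import io
import zipfile


def iter_inner_members(outer_file, member_name):
    """Yield (inner_zip_name, contents) for each zip stored directly inside
    outer_file (a path or binary file object) that contains member_name."""
    with zipfile.ZipFile(outer_file) as outer:
        for info in outer.infolist():
            if not info.filename.lower().endswith(".zip"):
                continue  # skip non-zip members of the outer archive
            # Only this one inner zip (still compressed) is in memory now.
            inner_bytes = outer.read(info)
            with zipfile.ZipFile(io.BytesIO(inner_bytes)) as inner:
                if member_name in inner.namelist():
                    yield info.filename, inner.read(member_name)
```

Usage would look like `for name, text in iter_inner_members("/Folder/ZipfileA.zip", "data.txt"): ...`, repeated for each first-level zip.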

Here's a function I came up with:

import os
import zipfile
from StringIO import StringIO  # Python 2 only


def extract_nested_zipfile(path, parent_zip=None):
    """Returns a ZipFile specified by path, even if the path contains
    intermediary ZipFiles. For example, /root/gparent.zip/parent.zip/child.zip
    will return a ZipFile that represents child.zip
    """

    def extract_inner_zipfile(parent_zip, child_zip_path):
        """Returns a ZipFile specified by child_zip_path that exists inside
        parent_zip.
        """
        memory_zip = StringIO()
        memory_zip.write(parent_zip.open(child_zip_path).read())
        return zipfile.ZipFile(memory_zip)

    if ('.zip' + os.sep) in path:
        (parent_zip_path, child_zip_path) = os.path.relpath(path).split(
            '.zip' + os.sep, 1)
        parent_zip_path += '.zip'

        if not parent_zip:
            # This is the top-level, so read from disk
            parent_zip = zipfile.ZipFile(parent_zip_path)
        else:
            # We're already in a zip, so pull it out and recurse
            parent_zip = extract_inner_zipfile(parent_zip, parent_zip_path)

        return extract_nested_zipfile(child_zip_path, parent_zip)
    else:
        if parent_zip:
            return extract_inner_zipfile(parent_zip, path)
        else:
            # If there is no nesting, it's easy!
            return zipfile.ZipFile(path)

Here's how I tested it:

echo hello world > hi.txt
zip wrap1.zip hi.txt
zip wrap2.zip wrap1.zip
zip wrap3.zip wrap2.zip

print extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap1.zip').open('hi.txt').read()
print extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap2.zip/wrap1.zip').open('hi.txt').read()
print extract_nested_zipfile('/Users/mattfaus/dev/dev-git/wrap3.zip/wrap2.zip/wrap1.zip').open('hi.txt').read()
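
Where the zip command-line tool isn't available, the same three-layer fixture can be built with zipfile itself. Below is a Python 3 sketch. build_fixture and read_through are hypothetical helpers. read_through descends by copying each layer into io.BytesIO rather than by calling extract_nested_zipfile.

```python
import io
import os
import tempfile
import zipfile


def build_fixture(directory):
    """Recreate hi.txt, wrap1.zip, wrap2.zip and wrap3.zip; return wrap3's path."""
    previous = os.path.join(directory, "hi.txt")
    with open(previous, "wb") as f:
        f.write(b"hello world\n")  # what `echo hello world` writes
    for n in (1, 2, 3):
        archive = os.path.join(directory, "wrap%d.zip" % n)
        with zipfile.ZipFile(archive, "w") as zf:
            # Store the member under its bare name, like `zip wrapN.zip file`.
            zf.write(previous, arcname=os.path.basename(previous))
        previous = archive
    return previous


def read_through(path, chain, leaf):
    """Open the zip at path, descend through the member names in chain,
    and return the bytes of leaf from the innermost zip."""
    with open(path, "rb") as f:
        data = f.read()
    for name in chain:
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            data = zf.read(name)
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read(leaf)


with tempfile.TemporaryDirectory() as tmp:
    wrap3 = build_fixture(tmp)
    result = read_through(wrap3, ["wrap2.zip", "wrap1.zip"], "hi.txt")
print(result)  # b'hello world\n'
```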

For those looking for a function that extracts nested zip files (any level of nesting) and cleans up the original zip files:

import zipfile, re, os

def extract_nested_zip(zippedFile, toFolder):
    """ Unzip a zip file and its contents, including nested zip files
        Delete the zip file(s) after extraction
    """
    with zipfile.ZipFile(zippedFile, 'r') as zfile:
        zfile.extractall(path=toFolder)
    os.remove(zippedFile)
    for root, dirs, files in os.walk(toFolder):
        for filename in files:
            if re.search(r'\.zip$', filename):
                fileSpec = os.path.join(root, filename)
                # A recursive call re-walks the same folder and may already
                # have extracted and deleted this zip, so skip it if it's gone.
                if os.path.exists(fileSpec):
                    extract_nested_zip(fileSpec, root)
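
A possible alternative, sketched in Python 3: rather than re-walking the whole output folder after every extraction, keep a queue of the archives that were just extracted. Unlike the function above, this sketch extracts each inner zip into a folder named after it and leaves the top-level zip on disk. Both are design choices, not requirements. The name extract_all_nested is hypothetical.

```python
import os
import zipfile
from collections import deque


def extract_all_nested(zip_path, dest):
    """Extract zip_path into dest, then extract every zip found inside it
    (at any depth) into a folder named after that zip, deleting each inner
    zip after extraction. The top-level zip_path itself is kept."""
    # Each entry: (archive to extract, folder to extract into, delete afterwards?)
    queue = deque([(zip_path, dest, False)])
    while queue:
        archive, target, delete_after = queue.popleft()
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)  # creates target and subfolders as needed
            names = zf.namelist()
        if delete_after:
            os.remove(archive)
        # Queue only the zips this archive just produced; no re-walking.
        for name in names:
            if name.lower().endswith(".zip"):
                inner = os.path.join(target, name)
                queue.append((inner, os.path.splitext(inner)[0], True))
```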

I'm using Python 3.7.3:

import zipfile
import io

with zipfile.ZipFile('all.zip') as z:
    with z.open('nested.zip') as z2:
        z2_filedata = io.BytesIO(z2.read())
        with zipfile.ZipFile(z2_filedata) as nested_zip:
            print(nested_zip.open('readme.md').read())
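
On Python 3.7 and later, the object returned by ZipFile.open() supports seek() when the outer archive is itself seekable. It can therefore be handed to ZipFile directly instead of being copied into io.BytesIO first. The sketch below relies on that assumption. read_nested_member is a hypothetical name. This avoids holding the whole inner zip in memory. The cost is that a backward seek inside a compressed member restarts decompression from the beginning of that member, so it trades memory for CPU time.

```python
import zipfile


def read_nested_member(outer_file, nested_name, member_name):
    """Return member_name from the zip stored as nested_name inside
    outer_file, opening the nested zip as a stream (Python 3.7+)."""
    with zipfile.ZipFile(outer_file) as z:
        with z.open(nested_name) as z2:  # ZipExtFile, seekable on 3.7+
            with zipfile.ZipFile(z2) as nested_zip:
                return nested_zip.read(member_name)
```

With the names from the answer above, the call would be read_nested_member('all.zip', 'nested.zip', 'readme.md').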

This works for me. Just put this script in the same directory as the nested zip. It also counts the total number of files in the nested zip.

import os
from zipfile import ZipFile

def unzip(path, total_count):
    for root, dirs, files in os.walk(path):
        for file in files:
            file_name = os.path.join(root, file)
            if (not file_name.endswith('.zip')):
                total_count += 1
            else:
                currentdir = file_name[:-4]
                if not os.path.exists(currentdir):
                    os.makedirs(currentdir)
                with ZipFile(file_name) as zipObj:
                    zipObj.extractall(currentdir)
                os.remove(file_name)
                total_count = unzip(currentdir, total_count)
    return total_count

total_count = unzip('.', 0)
print(total_count)
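
If only the count is needed, the nested archives can be walked in memory without extracting or deleting anything. Below is a Python 3 sketch. count_files is a hypothetical name. It counts archive members, whereas the script above counts every non-zip file under '.' on disk.

```python
import io
import zipfile


def count_files(zf):
    """Count the non-zip files in an open ZipFile, recursing into nested
    zips by reading each one into memory."""
    total = 0
    for info in zf.infolist():
        if info.is_dir():
            continue  # directory entries are not files
        if info.filename.lower().endswith(".zip"):
            with zipfile.ZipFile(io.BytesIO(zf.read(info))) as inner:
                total += count_files(inner)
        else:
            total += 1
    return total
```

Usage: `with zipfile.ZipFile("nested.zip") as zf: print(count_files(zf))`.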

For those using Python 3.3, to save you some time: the function above raises TypeError: string argument expected, got 'bytes' on the line memory_zip.write(parent_zip.open(child_zip_path).read()). Not sure of the workaround.
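
This sketch assumes the TypeError comes from the StringIO buffer. On Python 3, reading a zip member returns bytes, while io.StringIO only accepts str, so the in-memory buffer has to be io.BytesIO (which also exists on Python 2.6+). The port below makes that change. It also uses parent_zip.read(name), which is equivalent to parent_zip.open(name).read(). Otherwise it keeps the original function's path-splitting logic.

```python
import io
import os
import zipfile


def extract_nested_zipfile(path, parent_zip=None):
    """Python 3 port: return a ZipFile for path, where path may run through
    intermediate zips, e.g. /root/gparent.zip/parent.zip/child.zip."""

    def extract_inner_zipfile(parent_zip, child_zip_path):
        # ZipFile.read() returns bytes, so the buffer must be io.BytesIO;
        # io.StringIO accepts only str, which is what raised the TypeError.
        return zipfile.ZipFile(io.BytesIO(parent_zip.read(child_zip_path)))

    if ('.zip' + os.sep) in path:
        parent_zip_path, child_zip_path = os.path.relpath(path).split(
            '.zip' + os.sep, 1)
        parent_zip_path += '.zip'
        if parent_zip is None:
            # Top level: read the outermost zip from disk.
            parent_zip = zipfile.ZipFile(parent_zip_path)
        else:
            # Already inside a zip: pull the next layer out and recurse.
            parent_zip = extract_inner_zipfile(parent_zip, parent_zip_path)
        return extract_nested_zipfile(child_zip_path, parent_zip)
    if parent_zip is not None:
        return extract_inner_zipfile(parent_zip, path)
    # No nesting: open the file directly.
    return zipfile.ZipFile(path)
```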