How to read the filenames contained in a gz file in Python

python compression


I am trying to read a gz file:

import gzip
import os

with open(os.path.join(storage_path, file), "rb") as gzipfile:
    with gzip.GzipFile(fileobj=gzipfile) as datafile:
        data = datafile.read()

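It works, but I need the name and size of each file contained in my gz file. This code only reads the contents of the file included in the archive.

How can I read the filenames contained in this gz file?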

The Python gzip module does not provide access to that information.

The source code skips over it without storing it:

if flag & FNAME:
    # Read and discard a null-terminated string containing the filename
    while True:
        s = self.fileobj.read(1)
        if not s or s=='\000':
            break

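The filename component is optional and not guaranteed to be present (in that case, I think the command-line gzip -c decompression option would use the original filename sans .gz). The uncompressed file size is not stored in the header; you can find it in the last four bytes instead.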
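As a minimal sketch of the trailer lookup described above: the helper name gzip_trailer_size is hypothetical, and it assumes a single-member file whose uncompressed size is below 4 GiB, since the field stores the size modulo 2^32.

```python
import gzip
import io
import struct


def gzip_trailer_size(fileobj):
    """Return the trailing size field: the uncompressed size modulo 2**32.

    Hypothetical helper; expects a seekable binary file object.
    """
    pos = fileobj.tell()
    fileobj.seek(-4, io.SEEK_END)  # the size field is the last four bytes
    (size,) = struct.unpack("<I", fileobj.read(4))
    fileobj.seek(pos)  # restore the caller's position
    return size


# Build a small gzip stream in memory to try it out.
payload = b"x" * 1000
buf = io.BytesIO(gzip.compress(payload))
print(gzip_trailer_size(buf))
```

Restoring the original position means the same file object can still be handed to gzip.GzipFile afterwards.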

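To read the filename from the file header yourself, you need to recreate the header-reading code and retain the filename bytes. The following function returns that, plus the decompressed size: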

import struct
from gzip import FEXTRA, FNAME

def read_gzip_info(gzipfile):
    gf = gzipfile.fileobj
    pos = gf.tell()

    # Read archive size
    gf.seek(-4, 2)
    size = struct.unpack('<I', gf.read())[0]

    gf.seek(0)
    magic = gf.read(2)
    if magic != b'\037\213':
        raise IOError('Not a gzipped file')

    method, flag, mtime = struct.unpack("<BBIxx", gf.read(8))

    if not flag & FNAME:
        # Not stored in the header, use the filename sans .gz
        gf.seek(pos)
        fname = gzipfile.name
        if fname.endswith('.gz'):
            fname = fname[:-3]
        return fname, size

    if flag & FEXTRA:
        # Read & discard the extra field, if present;
        # struct.unpack returns a tuple, so take its first element
        gf.read(struct.unpack("<H", gf.read(2))[0])

    # Read a null-terminated string containing the filename
    fname = []
    while True:
        s = gf.read(1)
        if not s or s == b'\000':
            break
        fname.append(s)

    gf.seek(pos)
    # The header stores the name as ISO 8859-1 (Latin-1) bytes
    return b''.join(fname).decode('latin-1'), size
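To check header parsing of this kind without a real file on disk, here is a self-contained sketch. It writes a gzip stream into a BytesIO using gzip.GzipFile(filename=...), which records that name in the header, and then reads it back. The function name header_filename is hypothetical, and the sketch assumes the layout from the gzip specification (10-byte fixed header, optional extra field, then the filename).

```python
import gzip
import io
import struct

FEXTRA, FNAME = 4, 8  # flag bits from the gzip file format specification


def header_filename(raw):
    """Return the stored filename of a gzip stream as str, or None if absent."""
    fp = io.BytesIO(raw)
    if fp.read(2) != b"\x1f\x8b":
        raise OSError("Not a gzipped file")
    method, flag, mtime = struct.unpack("<BBIxx", fp.read(8))
    if not flag & FNAME:
        return None
    if flag & FEXTRA:
        # Skip the extra field: a 2-byte length followed by that many bytes
        (xlen,) = struct.unpack("<H", fp.read(2))
        fp.read(xlen)
    name = bytearray()
    while True:
        ch = fp.read(1)
        if not ch or ch == b"\x00":
            break
        name += ch
    return name.decode("latin-1")  # names are stored as ISO 8859-1


# Write a stream whose header carries the name 'foo.txt'.
buf = io.BytesIO()
with gzip.GzipFile(filename="foo.txt", mode="wb", fileobj=buf) as gz:
    gz.write(b"hello")
print(header_filename(buf.getvalue()))
```

Returning None when the flag is unset leaves the fallback (for example, the archive name without .gz) to the caller.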

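GzipFile itself does not have this information, but:

1. The filename is (usually) the name of the archive minus .gz

2. If the uncompressed file is smaller than 4G, the last four bytes of the archive contain the uncompressed size: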

In [14]: f = open('fuse-ext2-0.0.7.tar.gz', 'rb')
In [15]: f.seek(-4, 2)
In [16]: import struct
In [17]: r = f.read()
In [18]: struct.unpack('<I', r)[0]

I solved it this way:
import gzip
import os

fl = search_files(storage_path)  # the asker's own helper (not shown)
for f in fl:
    with open(os.path.join(storage_path, f), "rb") as gzipfile:
        with gzip.GzipFile(fileobj=gzipfile) as datafile:
            data = datafile.read()
        # Report the decompressed (pcap) file size
        print(str(storage_path) + "/" + str(f[:-3]) + " : " + str(len(data)) + " bytes")

I don't know whether this is correct.
Any suggestions?
New code:
import os
import struct

fl = search_files(storage_path)  # the asker's own helper (not shown)
for f in fl:
    with open(os.path.join(storage_path, f), "rb") as gzipfile:
        # The last four bytes hold the uncompressed size modulo 2^32
        gzipfile.seek(-4, 2)
        r = gzipfile.read()
        # Report the pcap file size
        print(str(storage_path) + "/" + str(f[:-3]) + " : " + str(struct.unpack('<I', r)[0]) + " bytes")
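The four-byte trailer is only reliable for a single compressed stream under 4 GiB. Where that cannot be assumed, one fallback is to decompress in chunks and count the bytes, which avoids holding the whole payload in memory, although it still pays the full decompression cost. The names streamed_size and chunk_size below are hypothetical.

```python
import gzip
import io


def streamed_size(fileobj, chunk_size=1 << 20):
    """Count decompressed bytes without keeping them all in memory."""
    total = 0
    with gzip.GzipFile(fileobj=fileobj) as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Try it on an in-memory gzip stream.
sample = io.BytesIO(gzip.compress(b"x" * 5000))
print(streamed_size(sample))
```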

Martijn's solution is really nice; I've packaged it for Python 3.6+:
Just pip install gzinfo

In your code:
import gzinfo

info = gzinfo.read_gz_info('bar.txt.gz')

# info.fname is 'foo.txt'
print(info.fname)

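Comments:

gzip can only compress a single file. Do you have a gzipped tar archive?

I have a gz file, but I need to know which files the archive contains and their sizes. Inside this gz file I have a pcap file.

Not true, a gzip file can contain multiple files, called "members". See the "File format" section of the specification. The uncompressed file size modulo 2^32 is the last four bytes of a "member".

@PavelAnossov: Yes, I just saw your answer. :-)

@moose Yes; I've now updated it to Python 3 compatible syntax. Sorry about that!

This works, but it obviously requires decompressing. With many large files, this can become slow.

OK, good observation! Now I'll try the code posted earlier! Thanks.

Not true. A gzip file can contain the original filename (see the FNAME flag in the specification).

The gzip file can, but the GzipFile class doesn't expose it. See Martijn's answer; he had to parse the header himself.

I know, I didn't read your answer carefully; I read it as saying that the gzip file specification has no such information. I apologize for the downvote.

What if the user changed the name of the gzip file as well as the file extension?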
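The remark about "members" has a practical consequence: when several gzip streams are concatenated, the last four bytes describe only the final member. A small sketch with made-up payloads illustrates this.

```python
import gzip
import struct

# Two independently compressed members, concatenated into one gzip stream.
first = gzip.compress(b"a" * 100)
second = gzip.compress(b"b" * 25)
combined = first + second

# The trailer of the whole stream is the size field of the last member only.
(trailer_size,) = struct.unpack("<I", combined[-4:])

# Decompressing walks through every member and yields the full payload.
total_size = len(gzip.decompress(combined))

print(trailer_size, total_size)
```

So for multi-member files, the trailer shortcut undercounts, and the sizes have to be gathered per member or by decompressing.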