Python zlib.error: Error -3 while decompressing: incorrect header check


I have a gzip file that I am trying to read with Python, as follows:

import zlib

do = zlib.decompressobj(16+zlib.MAX_WBITS)
fh = open('abc.gz', 'rb')
cdata = fh.read()
fh.close()
data = do.decompress(cdata)

It throws the following error:

zlib.error: Error -3 while decompressing: incorrect header check


How can I overcome it?

Update: the answer that explains the problem should be the accepted one.

Try the gzip module, which reads .gz files directly.
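A minimal sketch of that suggestion, assuming Python 3. The temporary directory and the sample payload are made up so the example can run on its own; only the file name abc.gz comes from the question.

```python
import gzip
import os
import tempfile

# Create a small gzip file so the example is self-contained.
# 'abc.gz' mirrors the file name used in the question; the payload is invented.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'abc.gz')
with gzip.open(path, 'wb') as fh:
    fh.write(b'hello gzip')

# gzip.open parses the gzip header and trailer, so no wbits value is needed.
with gzip.open(path, 'rb') as fh:
    data = fh.read()
```

If gzip.open also fails on the real file, the file is probably not in gzip format at all, whatever its extension says.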

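Interestingly, I ran into this error while trying to work with the Stack Overflow API from Python.
I managed to get it working with the GzipFile object from the gzip module, roughly like this: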

import gzip

gzip_file = gzip.GzipFile(fileobj=open('abc.gz', 'rb'))
file_contents = gzip_file.read()

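I just solved the "incorrect header check" problem when decompressing gzipped data.
You need to set -WindowBits => WANT_GZIP in your call to inflateInit2 (use the "2" version).
Yes, this can be very frustrating. A shallow reading of the documentation presents zlib as an API for gzip compression, but by default (without the gz* methods) it neither creates nor decompresses the gzip format. You have to pass this not very prominently documented flag.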
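In Python's zlib module the same switch is exposed through the wbits argument. The following is a small sketch, assuming Python 3, of how the default setting rejects gzip data while a window size with 16 added accepts it. The payload is invented for illustration.

```python
import gzip
import zlib

gzip_bytes = gzip.compress(b'payload')  # a complete gzip stream (RFC 1952)

# The default wbits expects a zlib header, so gzip data is rejected.
try:
    zlib.decompress(gzip_bytes)
    default_failed = False
except zlib.error:
    default_failed = True

# Adding 16 to the window size tells zlib to expect a gzip wrapper instead.
result = zlib.decompress(gzip_bytes, zlib.MAX_WBITS | 16)
```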
You have this error:

zlib.error: Error -3 while decompressing: incorrect header check

This is most likely because you are trying to check a header that is not there, e.g. your data follows RFC 1951 (deflate compressed format) rather than RFC 1950 (zlib compressed format) or RFC 1952 (gzip compressed format).
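To make the "header check" concrete, here is a rough sketch that guesses the container from the first two bytes. It relies on the gzip magic number from RFC 1952 and on the RFC 1950 rule that the two zlib header bytes, read as a 16-bit number, are a multiple of 31. The function name guess_container is hypothetical, and since raw deflate has no header the last branch is only a guess.

```python
import gzip
import zlib


def guess_container(data: bytes) -> str:
    """Guess the wrapper around deflate data from its first two bytes."""
    if len(data) < 2:
        return 'unknown'
    if data[0] == 0x1F and data[1] == 0x8B:
        return 'gzip'  # RFC 1952 magic number
    cmf, flg = data[0], data[1]
    # RFC 1950: the low nibble of CMF is 8 (deflate) and
    # CMF * 256 + FLG must be a multiple of 31.
    if (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0:
        return 'zlib'
    # Raw deflate (RFC 1951) carries no header, so nothing can be verified here.
    return 'raw-or-other'
```

A raw deflate stream can, by coincidence, begin with bytes that pass the zlib test, so a result from this helper is a hint rather than proof.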
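Choosing windowBits

But zlib can decompress all of those formats:

to (de-)compress deflate format, use wbits = -zlib.MAX_WBITS
to (de-)compress zlib format, use wbits = zlib.MAX_WBITS
to (de-)compress gzip format, use wbits = zlib.MAX_WBITS | 16

See the zlib documentation (section inflateInit2).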
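When the container format is not known in advance, the three wbits values listed above can simply be tried in turn. This is a sketch under that assumption, not a recommended production routine. The names WBITS_CANDIDATES and decompress_any are hypothetical.

```python
import zlib

# Candidate wbits values, taken from the list above.
WBITS_CANDIDATES = (
    ('gzip', zlib.MAX_WBITS | 16),
    ('zlib', zlib.MAX_WBITS),
    ('deflate', -zlib.MAX_WBITS),
)


def decompress_any(data):
    """Try each container format in turn; return (format_name, payload)."""
    for name, wbits in WBITS_CANDIDATES:
        try:
            return name, zlib.decompress(data, wbits)
        except zlib.error:
            continue  # wrong container, try the next one
    raise ValueError('data is not valid gzip, zlib, or raw deflate')
```

Raw deflate is tried last because it has no header to validate, which makes a false match more likely there than with the other two formats.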
Examples

Test data:

>>> import zlib
>>> deflate_compress = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
>>> zlib_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS)
>>> gzip_compress = zlib.compressobj(9, zlib.DEFLATED, zlib.MAX_WBITS | 16)
>>>
>>> text = b'test'  # bytes, so the session also runs on Python 3
>>> deflate_data = deflate_compress.compress(text) + deflate_compress.flush()
>>> zlib_data = zlib_compress.compress(text) + zlib_compress.flush()
>>> gzip_data = gzip_compress.compress(text) + gzip_compress.flush()
>>>

The obvious test for zlib:

>>> zlib.decompress(zlib_data)
b'test'

Test for deflate:

>>> zlib.decompress(deflate_data)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zlib.error: Error -3 while decompressing data: incorrect header check
>>> zlib.decompress(deflate_data, -zlib.MAX_WBITS)
b'test'

The data is also compatible with the gzip module:

>>> import gzip
>>> import io
>>> fio = io.BytesIO(gzip_data)  # StringIO.StringIO on Python 2
>>> f = gzip.GzipFile(fileobj=fio)
>>> f.read()
b'test'
>>> f.close()

Automatic header detection (zlib or gzip)

Adding 32 to windowBits triggers header detection:

>>> zlib.decompress(gzip_data, zlib.MAX_WBITS|32)
b'test'
>>> zlib.decompress(zlib_data, zlib.MAX_WBITS|32)
b'test'

Using gzip instead

Or you can ignore zlib and use the gzip module directly; but remember that gzip uses zlib under the hood.

fh = gzip.open('abc.gz', 'rb')
cdata = fh.read()
fh.close()

My case was decompressing email messages stored in a Bullhorn database. The snippet is as follows:

import pyodbc
import zlib

cn = pyodbc.connect('connection string')
cursor = cn.cursor()
cursor.execute('SELECT TOP(1) userMessageID, commentsCompressed FROM BULLHORN1.BH_UserMessage WHERE DATALENGTH(commentsCompressed) > 0 ')

for msg in cursor.fetchall():
    # magic in the second parameter, use negative value for deflate format
    decompressedMessageBody = zlib.decompress(bytes(msg.commentsCompressed), -zlib.MAX_WBITS)

Just add the header 'Accept-Encoding': 'identity':

import requests

requests.get('http://gett.bike/', headers={'Accept-Encoding': 'identity'})

To decompress incomplete gzipped bytes that are held in memory, the answer explaining wbits is useful, but it omits the zlib.decompressobj call, which I found to be necessary:

incomplete_decompressed_content = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16).decompress(incomplete_gzipped_content)

Note that zlib.MAX_WBITS | 16 is 15 | 16, which is 31. For some background on wbits, see the zlib module documentation. Credit goes to the answer that points out the zlib.decompressobj call.

This does not answer the original question, but it may help someone else. The zlib.error: Error -3 while decompressing: incorrect header check also occurs in the example below:

import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))
encoded_bytes_representation = str(b64_encoded_bytes)  # this is the cause
zlib.decompress(base64.b64decode(encoded_bytes_representation))

The example is a minimal reproduction of a problem I ran into in some legacy Django code, where base64-encoded bytes (from an HTTP POST) were stored in a Django CharField (rather than a binary field). When the CharField value is read back from the database, str() is called on the value without an explicit encoding.
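The failure mode described above can be reproduced and contrasted with an explicit decode in a few lines. This is a sketch assuming Python 3. The variable names wrong_text, right_text, wrong_failed and restored are illustrative only.

```python
import base64
import zlib

b64_encoded_bytes = base64.b64encode(zlib.compress(b'abcde'))

# str() on a bytes object without an encoding yields its repr,
# including the leading b and the surrounding quotes.
wrong_text = str(b64_encoded_bytes)
# Decoding explicitly yields only the base64 characters.
right_text = b64_encoded_bytes.decode('ascii')

try:
    zlib.decompress(base64.b64decode(wrong_text))
    wrong_failed = False
except zlib.error:
    # The stray "b" shifts the base64 payload, so the zlib header is garbage.
    wrong_failed = True

restored = zlib.decompress(base64.b64decode(right_text))
```

By default base64.b64decode silently discards characters outside the base64 alphabet, such as the quotes, which is why the mistake surfaces later as a zlib error rather than as a base64 error.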
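As the str() documentation explains:

If neither encoding nor errors is given, str(object) returns object.__str__(), which is the "informal" or nicely printable string representation of object. For string objects, this is the string itself. If object does not have a __str__() method, then str() falls back to returning repr(object).

So, in this example, we inadvertently base64-decode

"b'eJxLTEpOSQUABcgB8A=='"

instead of

b'eJxLTEpOSQUABcgB8A=='

The zlib decompression in the example would succeed if an explicit encoding were used, e.g. str(b64_encoded_bytes, 'utf-8').

Note specific to Django:

Especially tricky: the issue only arises when the value is retrieved from the database. This was confirmed with a test (which passes in Django 3.0.3) against the following model:

class MyModel(models.Model):
    data = models.CharField(max_length=100)

The same error comes up:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/gzip.py", line 212, in read
    self._read(readsize)
  File "/usr/lib/python2.6/gzip.py", line 271, in _read
    uncompress = self.decompress.decompress(buf)
zlib.error: Error -3 while decompressing: invalid code lengths set

@VarunVyas Sorry, I cannot reproduce your error. It must be related to your input data. Was your input file generated with gzip? Does gunzip on the command line decompress it correctly?

This is the one: zlib.decompress(gzip_data, zlib.MAX_WBITS | 32)

@dnozay, I tried the zlib.decompress(zlib_data, zlib.MAX_WBITS | 32) tweak above, but it did not work. I still get the incorrect header check error. The other options mentioned above give various errors as well. Is there anything else that can trigger this error?

@Minu, sure: any data that is not valid deflate, zlib, or gzip content will fail the header check.

zlib.MAX_WBITS | 16 worked for me, thanks. Inferring this from the documentation is far from trivial. Also, it is annoying that aiohttp does not transparently decode Content-Encoding: gzip.

So your answer to a decompression problem is: don't compress it in the first place??

Servers do not always honor the declared header, so this will not work reliably.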
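Since a server may send gzip even when asked for identity, one defensive option is to look at the body itself before deciding whether to decompress. The sketch below uses a hypothetical helper, maybe_gunzip, and assumes the whole body is already in memory as bytes.

```python
import gzip
import zlib

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream (RFC 1952)


def maybe_gunzip(body: bytes) -> bytes:
    """Return the decompressed body if it looks like gzip, else the body unchanged."""
    if body[:2] != GZIP_MAGIC:
        return body
    # 16 + window size selects the gzip container, as discussed above.
    return zlib.decompress(body, zlib.MAX_WBITS | 16)
```

An uncompressed body that happens to start with the same two bytes would still raise zlib.error, so callers may want to catch that and fall back to the raw bytes.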