How to tell zlib compression in Python not to use certain bytes/characters


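For my research I am developing a tool that sends arbitrary data through a wireless communication device, which is attached over a serial connection. PySerial is used for the communication.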

Every command sent to the device is wrapped in a start byte and a stop byte. If our payload is the data, it looks like this:

cmd = b'\x02' + DATA.encode() + b'\x03'

The data can be large and the link is slow, so I tried compressing it with zlib:

from zlib import compress, decompress
DATA_comp = compress(DATA.encode())
cmd = b'\x02' + DATA_comp + b'\x03'

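But compression may introduce the bytes b'\x02' and b'\x03' somewhere in the payload. This causes errors, because the device firmware treats them as control bytes.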
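To see the problem on a concrete payload, a small check like the following can help. It is only a sketch: the helper name `forbidden_positions` and the sample data are made up for illustration, and whether a given compressed stream actually contains a control byte depends on the input.

```python
import zlib

FORBIDDEN = {0x02, 0x03}  # start/stop bytes from the question


def forbidden_positions(payload):
    """Return the indexes at which a start/stop byte occurs in payload."""
    return [i for i, b in enumerate(payload) if b in FORBIDDEN]


# Hypothetical sample data, standing in for DATA in the question.
data = ("temperature=21.5;humidity=40;" * 200).encode()
compressed = zlib.compress(data)
print(len(data), len(compressed), forbidden_positions(compressed))
```

An empty list means this particular frame would pass through unharmed, but zlib gives no guarantee of that for other inputs.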

Is there a way to tell zlib, or any other compression method, not to use certain bytes in its compressed output?

tl;dr: compression introduces control bytes into the payload that the device cannot handle.

We can split the problem into two parts:

1. Compress the data.
2. Transform the compressed data so that it does not contain byte 3.

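For the second part, many encodings can be used. Base64, for example, never emits byte 3. Going a step further, you could use a base255 encoding whose valid symbols are 0-2 and 4-255.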
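A rough sketch of that idea follows. Because the question forbids both 0x02 and 0x03, this version uses 254 symbols rather than 255. The function names `encode_base254` and `decode_base254` are hypothetical, and the big-integer conversion is quadratic in the payload length, so it is meant for illustration rather than for large transfers.

```python
FORBIDDEN = (0x02, 0x03)
ALPHABET = bytes(b for b in range(256) if b not in FORBIDDEN)  # 254 symbols
INDEX = {sym: i for i, sym in enumerate(ALPHABET)}
BASE = len(ALPHABET)


def encode_base254(raw):
    """Re-express raw bytes as base-254 digits drawn from ALPHABET."""
    # Prefix a 0x01 byte so leading zero bytes survive the integer round trip.
    n = int.from_bytes(b"\x01" + raw, "big")
    digits = bytearray()
    while n:
        n, d = divmod(n, BASE)
        digits.append(ALPHABET[d])
    digits.reverse()
    return bytes(digits)


def decode_base254(encoded):
    """Inverse of encode_base254."""
    n = 0
    for sym in encoded:
        n = n * BASE + INDEX[sym]
    raw = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return raw[1:]  # drop the 0x01 prefix
```

The overhead is about log(256)/log(254), well under one percent, at the cost of a much slower conversion than Base64.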

With help from @JohnZwinck I arrived at the following, which is a simple working example:

from zlib import compress, decompress
from base64 import b64encode
DATA_comp = compress(DATA.encode())
DATA_enc = b64encode(DATA_comp)
cmd = b'\x02' + DATA_enc + b'\x03'

On the receiving side, the steps are applied in reverse.
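The reverse direction could look roughly like this. The helpers `build_frame` and `parse_frame` are illustrative names, not part of PySerial or zlib, and the sketch assumes a complete frame has already been read from the serial port.

```python
from base64 import b64decode, b64encode
from zlib import compress, decompress

START, STOP = b"\x02", b"\x03"


def build_frame(text):
    """Sender: compress, Base64-encode, then wrap in start/stop bytes."""
    return START + b64encode(compress(text.encode())) + STOP


def parse_frame(frame):
    """Receiver: strip the framing, Base64-decode, then decompress."""
    if not (frame.startswith(START) and frame.endswith(STOP)):
        raise ValueError("frame is missing its start or stop byte")
    return decompress(b64decode(frame[1:-1])).decode()
```

Since Base64 output consists only of printable ASCII characters, the payload between the framing bytes cannot contain 0x02 or 0x03.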

As @Błotosmętek pointed out, the encoding increases the payload size again by a constant factor. Using Ascii85 may be a better choice.
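To compare the two overheads, a quick measurement along these lines may be useful. `a85encode` and `a85decode` come from Python's standard `base64` module; the helper `encoded_sizes` and the sample payload are assumptions made for this sketch.

```python
from base64 import a85decode, a85encode, b64encode


def encoded_sizes(raw):
    """Return payload length before and after each binary-to-text encoding."""
    return {
        "raw": len(raw),
        "base64": len(b64encode(raw)),
        "ascii85": len(a85encode(raw)),
    }


# Hypothetical 1000-byte payload standing in for zlib output.
sample = bytes(range(1, 201)) * 5
print(encoded_sizes(sample))
assert a85decode(a85encode(sample)) == sample
```

Base64 maps every 3 bytes to 4 characters (a factor of 4/3), while Ascii85 maps every 4 bytes to 5 characters (a factor of 5/4). Both produce only printable ASCII, so neither emits 0x02 or 0x03.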

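As already noted, let zlib do its job, then encode the resulting stream of bits to avoid the forbidden bytes. This can be done efficiently and quickly by the equivalent of Huffman decoding the bit stream, with equal probabilities, into the desired number of symbols, which is less than 256. On the other end, Huffman encode the stream of symbols to convert it back to the original bit stream.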

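For a small number of bytes to avoid, you pull seven bits from the stream. Depending on the value of those seven bits, you either pull one more bit or not. Map the seven or eight bits to the desired subset of bytes. Repeat. Append zero bits to the end of the input so that all of the input bits can be used. To restore, reverse the process and discard the final run of fewer than eight zero bits that it generates.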

Here is example code:

/*
  avoid.c version 1.0, 2 July 2017

  Copyright (C) 2017 Mark Adler

  This software is provided 'as-is', without any express or implied
  warranty.  In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product documentation would be
     appreciated but is not required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.

  Mark Adler
  madler@alumni.caltech.edu
*/

// Take arbitrary binary input and encode it to avoid specified byte values.
// The number of such values to avoid is the parameter "cut". The input is
// taken as a stream of bits. At each step either 7 or 8 bits of input is coded
// to an output byte. As a result, the input bits are expanded by a factor
// between 1 and about 1.143 (rounded up to the next multiple of 8 bits),
// depending on the value of cut and depending on the input data. cut must be
// in the range 1..128. For random input, the average expansion ratio is
// 1/(1-cut/1024).
//
// avoid() does the encoding, and restore() does the decoding. avoid() uses the
// table map[], which maps the values 0..255-cut to the allowed byte values,
// i.e. the byte values that are not cut. invert_map() is provided to invert
// that transformation to make the table unmap[], which is used by restore().

#include <stddef.h>

// Encode input[0..len-1] into a subset of permitted byte values, which number
// cut less than 256. Therefore cut values are cut from the set of possible
// output byte values. map[0..255-cut] is the set of allowed byte values. cut
// must be in the range 1..128. If cut is out of range, zero is returned and no
// encoding is performed. Otherwise the return value is the size of the encoded
// result. size is the size of the output space in bytes, which should be at
// least the maximum possible encoded size, equal to ceiling(len * 8 / 7). The
// return value may be larger than size, in which case only size bytes are
// written to *output, with the remaining encoded data lost. Otherwise the
// number of bytes written to *output is the returned value.
size_t avoid(unsigned char *output, size_t size,
             unsigned char const *input, size_t len,
             unsigned char const *map, unsigned cut) {
    if (len == 0 || cut < 1 || cut > 128)
        return 0;
    unsigned buf = *input, code = buf;
    int bits = 8;
    size_t in = 1, out = 0;
    for (;;) {
        unsigned less = code >> 1;
        if (less < cut) {
            code = less;
            bits -= 7;
        }
        else {
            code -= cut;
            bits -= 8;
        }
        if (out < size)
            output[out] = map[code];
        out++;
        if (in == len && bits <= 0)
            return out;
        if (in < len) {
            if (bits < 8) {
                buf = (buf << 8) + input[in++];
                bits += 8;
            }
            code = buf >> (bits - 8);
        }
        else
            code = buf << (8 - bits);   // pad with zeros
        code &= 0xff;
    }
}

// Invert the map used by avoid() for use by restore().
void invert_map(unsigned char *unmap, unsigned char const *map, unsigned cut) {
    if (cut < 1 || cut > 128)
        return;
    unsigned k = 0;
    do {
        unmap[k++] = 255;
    } while (k < 256);
    k -= cut;
    do {
        k--;
        unmap[map[k]] = k;
    } while (k);
}

// Restore the data input[0..len-1] that was encoded with avoid(), writing the
// restored bytes to *output. The number of restored bytes is returned. size is
// the size of the output space in bytes, which should be at least the maximum
// possible restored size, equal to len. If the returned value is greater than
// size, then only size bytes are written to *output, with the remainder of the
// restored data lost. unmap[k] gives the corresponding code for character k in
// the range 0..255-cut if k is in the allowed set, or 255 if k is not in the
// allowed set. Characters in the input that are not in the allowed set are
// ignored. cut must be in the range 1..128. If cut is out of range, zero is
// returned and no restoration is conducted.
size_t restore(unsigned char *output, size_t size,
               unsigned char const *input, size_t len,
               unsigned char const *unmap, unsigned cut) {
    if (cut < 1 || cut > 128)
        return 0;
    unsigned buf = 0;
    int bits = 0;
    size_t in = 0, out = 0;
    while (in < len) {
        unsigned code = unmap[input[in++]];
        if (code == 255)
            continue;
        if (code < cut) {
            buf <<= 7;
            bits += 7;
        }
        else {
            buf <<= 8;
            bits += 8;
            buf += cut;
        }
        buf += code;
        if (bits >= 8) {
            if (out < size)
                output[out] = buf >> (bits - 8);
            out++;
            bits -= 8;
        }
    }
    return out;
}
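Since the question is about Python, here is a rough Python transcription of the same 7-or-8-bit scheme. It is a sketch written for this page, not part of avoid.c: `make_map` is a hypothetical helper, the functions return new `bytes` objects instead of filling a caller-supplied buffer, and invalid `cut` values raise an exception instead of returning zero.

```python
def make_map(forbidden):
    """Allowed byte values in ascending order (plays the role of map[])."""
    return bytes(b for b in range(256) if b not in forbidden)


def _cut(allowed):
    cut = 256 - len(allowed)
    if not 1 <= cut <= 128:
        raise ValueError("between 1 and 128 byte values must be excluded")
    return cut


def avoid(data, allowed):
    """Encode data so that only byte values listed in allowed are emitted."""
    cut = _cut(allowed)
    out = bytearray()
    buf = bits = pos = 0  # bit accumulator, unread bit count, input index
    while pos < len(data) or bits > 0:
        if bits < 8 and pos < len(data):
            buf = ((buf << 8) | data[pos]) & 0xFFFF
            pos += 1
            bits += 8
        if bits >= 8:
            code = (buf >> (bits - 8)) & 0xFF
        else:
            code = (buf << (8 - bits)) & 0xFF  # pad the tail with zero bits
        if code >> 1 < cut:
            code >>= 1   # a 7-bit symbol
            bits -= 7
        else:
            code -= cut  # an 8-bit symbol
            bits -= 8
        out.append(allowed[code])
    return bytes(out)


def restore(encoded, allowed):
    """Invert avoid(); bytes outside the allowed set are ignored."""
    cut = _cut(allowed)
    unmap = {sym: code for code, sym in enumerate(allowed)}
    out = bytearray()
    buf = bits = 0
    for sym in encoded:
        code = unmap.get(sym)
        if code is None:
            continue
        if code < cut:
            buf = (buf << 7) | code
            bits += 7
        else:
            buf = (buf << 8) | (code + cut)
            bits += 8
        if bits >= 8:
            bits -= 8
            out.append((buf >> bits) & 0xFF)
        buf &= 0xFFFF  # keep the accumulator small
    return bytes(out)
```

With the two control bytes from the question, a frame could be built as `b'\x02' + avoid(zlib.compress(payload), make_map({2, 3})) + b'\x03'`. For `cut = 2`, the formula in the C comment, 1/(1-cut/1024), gives an average expansion of about 0.2% on random input, compared with 33% for Base64.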

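Is there really no way to tell the device to transmit N bytes, rather than transmitting until it sees a byte with value X? Usually you would just put the message length at the front.

Unfortunately the device does not support that; it is a fixed proprietary protocol…

If your device uses more control characters than 0x02 and 0x03, remember to choose an encoding that contains none of them. Base64 is certainly a safe choice, but it increases the payload size by a constant factor of 4/3, so you might consider Ascii85.

@Błotosmętek That is a good point, and I have taken note of it. As a next step I will run some performance tests. Ascii85 sounds promising too.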