Java Android: GZIP decompression breaks Unicode characters at the buffer boundary


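I decompress received gzip data into a string. The problem is that, with a buffer size of 512, Unicode characters get broken at the buffer boundary, so I end up with text containing question marks. It happens with non-Latin letters: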

…а��БГМц…

    public static String decompress(byte[] compressed) throws IOException {
        final int BUFFER_SIZE = 512;
        ByteArrayInputStream is = new ByteArrayInputStream(compressed);
        GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
        StringBuilder string = new StringBuilder();
        byte[] data = new byte[BUFFER_SIZE];
        int bytesRead;
        while ((bytesRead = gis.read(data)) != -1) {
            string.append(new String(data, 0, bytesRead));
        }
        gis.close();
        is.close();
        return string.toString();
    }

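You can wrap the GZIPInputStream in an InputStreamReader and read characters instead of bytes. The reader carries an incomplete multi-byte sequence over to the next read, so a character that straddles a buffer boundary is never decoded in pieces.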
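A minimal sketch of that approach follows. It assumes the payload is UTF-8 (the question does not state the encoding), and the class name `GzipReaderSketch` and the `compress` helper are hypothetical names added only so the example can be exercised end to end.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipReaderSketch {
    // Decompresses gzip data and decodes it as UTF-8 (assumed encoding).
    static String decompress(byte[] compressed) throws IOException {
        final int BUFFER_SIZE = 512;
        StringBuilder out = new StringBuilder();
        // The reader decodes bytes to chars and buffers any incomplete
        // multi-byte sequence until the remaining bytes arrive.
        try (Reader reader = new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed), BUFFER_SIZE),
                StandardCharsets.UTF_8)) {
            char[] chars = new char[BUFFER_SIZE];
            int charsRead;
            while ((charsRead = reader.read(chars)) != -1) {
                out.append(chars, 0, charsRead);
            }
        }
        return out.toString();
    }

    // Hypothetical helper, only here to produce gzip input for a round trip.
    static byte[] compress(String text) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(baos)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return baos.toByteArray();
    }
}
```

Calling `GzipReaderSketch.decompress(GzipReaderSketch.compress(text))` should return `text` unchanged, whatever the buffer size. On older Android API levels where `StandardCharsets` is unavailable, the charset name `"UTF-8"` can be passed to `InputStreamReader` instead.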

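The bug is in the algorithm: it assumes that every chunk read ends (and begins) on a UTF-8 byte-sequence boundary. A multi-byte character can be split across two reads, and each half is then decoded separately as invalid input.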
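The effect can be reproduced without gzip at all. The sketch below is illustrative only: the class and method names are hypothetical, and UTF-8 is passed explicitly, whereas the code in the question relies on the platform default charset.

```java
import java.nio.charset.StandardCharsets;

class SplitDecodeDemo {
    // Decodes each chunk on its own, the way the loop in the question does.
    static String decodePerChunk(byte[] bytes, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < bytes.length; off += chunkSize) {
            int len = Math.min(chunkSize, bytes.length - off);
            // A chunk may start or end in the middle of a multi-byte sequence.
            sb.append(new String(bytes, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decodes the whole array in one pass, so no sequence is ever split.
    static String decodeWhole(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

For the bytes of `"a\u0411"` (one ASCII byte followed by a two-byte Cyrillic letter) and a chunk size of 2, `decodePerChunk` cuts the Cyrillic letter in half and yields U+FFFD replacement characters, while `decodeWhole` returns the original string.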

So collect all the decompressed bytes first and decode them once at the end:

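    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.util.zip.GZIPInputStream;

    public static String decompress(byte[] compressed) throws IOException {
        final int BUFFER_SIZE = 512;
        ByteArrayInputStream is = new ByteArrayInputStream(compressed);
        GZIPInputStream gis = new GZIPInputStream(is, BUFFER_SIZE);
        byte[] data = new byte[BUFFER_SIZE];
        int bytesRead;
        // Accumulate raw bytes; decoding happens once, after the last read,
        // so a multi-byte sequence split across reads is reassembled first.
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        while ((bytesRead = gis.read(data)) != -1) {
            baos.write(data, 0, bytesRead);
        }
        gis.close();
        is.close();
        return baos.toString("UTF-8");
    }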