Java GZIPInputStream padding zeros at the end?

I'm running into a strange problem when decompressing a file, taking the UTF-8 charset into account. I'm using the Guava library:

public static byte[] gzip(final CharSequence cs, final Charset charset) throws IOException {
    final ByteArrayOutputStream os = new ByteArrayOutputStream(cs.length());
    final GZIPOutputStream gzipOs = new GZIPOutputStream(os);
    gzipOs.write(charset.encode(CharBuffer.wrap(cs)).array());
    Closeables.closeQuietly(gzipOs);
    return os.toByteArray();
}

public static boolean gzipToFile(final CharSequence from, final File to, final Charset charset) {
    try {
        Files.write(StreamUtils.gzip(from, charset), to);
        return true;
    } catch (final IOException e) {
        // ignore
    }
    return false;
}

public static String gunzipFromFile(final File from, final Charset charset) {
    String str = null;
    try {
        str = charset.decode(ByteBuffer.wrap(gunzip(Files.toByteArray(from)))).toString();
    } catch (final IOException e) {
        // ignore
    }
    return str;
}

public static byte[] gunzip(final byte[] b) throws IOException {
    GZIPInputStream gzipIs = null;
    final byte[] bytes;
    try {
        gzipIs = new GZIPInputStream(new ByteArrayInputStream(b));
        bytes = ByteStreams.toByteArray(gzipIs);
    } finally {
        Closeables.closeQuietly(gzipIs);
    }
    return bytes;
}

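Here's a small JUnit test. For testing, I used lorem ipsum in different languages, such as English, German, and Russian. I first compress the original text into a file, then decompress the file and compare the result with the original text: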

@Test
public void gzip() throws IOException {
    final String originalText = Files.toString(ORIGINAL_IPSUM_LOREM, Charsets.UTF_8);

    // create temporary file
    final File tmpFile = this.tmpFolder.newFile("loremIpsum.txt.gz");

    // check if gzip write is OK
    final boolean status = StreamUtils.gzipToFile(originalText, tmpFile, Charsets.UTF_8);
    Assertions.assertThat(status).isTrue();
    Assertions.assertThat(Files.toByteArray(tmpFile)).isEqualTo(Files.toByteArray(GZIPPED_IPSUM_LOREM));

    // unzip it again
    final String uncompressedString = StreamUtils.gunzipFromFile(tmpFile, Charsets.UTF_8);
    Assertions.assertThat(uncompressedString).isEqualTo(originalText);
}

The JUnit test fails. The debugger shows the difference between the uncompressed text and the original text:

[-17, -69, -65, 76, 111, ... (omitted) ... , -117, 32, -48, -66, -48, -76, -47, -128, 32, -48, -78, -48, -75, -47, -127, 46, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ... (omitted) ... , 0, 0, 0, 0]

...while the original text has no trailing zeros:

[-17, -69, -65, 76, 111, ... (omitted) ... , -117, 32, -48, -66, -48, -76, -47, -128, 32, -48, -78, -48, -75, -47, -127, 46]

Any idea what's wrong? Thanks :-)

I think the problem is here:

    charset.encode(CharBuffer.wrap(cs)).array()

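because the javadoc for array() says that it returns the ByteBuffer's backing array. But the backing array may be larger than the buffer's valid content... and I suspect that is what's happening here.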
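To illustrate the difference between the backing array and the buffer's valid content, here is a minimal sketch. The class name BackingArrayDemo and the helper remainingBytes are hypothetical, not from the original posts. The buffer is allocated by hand with spare capacity, so the output does not depend on how a particular JDK sizes the buffer returned by Charset.encode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BackingArrayDemo {

    // Hypothetical helper: copies only the bytes between position() and limit().
    // duplicate() shares the content but has its own position, so bb is left untouched.
    public static byte[] remainingBytes(ByteBuffer bb) {
        byte[] out = new byte[bb.remaining()];
        bb.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) {
        byte[] text = "Hello".getBytes(StandardCharsets.UTF_8); // 5 bytes

        // Backing array deliberately larger than the content (16 > 5).
        ByteBuffer bb = ByteBuffer.allocate(16);
        bb.put(text);
        bb.flip(); // position = 0, limit = 5

        // array() ignores position/limit and returns all 16 bytes,
        // so the 11 never-written slots show up as zeros.
        System.out.println(Arrays.toString(bb.array()));
        // [72, 101, 108, 108, 111, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

        // Only the valid region.
        System.out.println(Arrays.toString(remainingBytes(bb)));
        // [72, 101, 108, 108, 111]
    }
}
```

The same thing can happen with the buffer that Charset.encode returns: its capacity is an estimate, so the bytes after limit() are padding, not text.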

FWIW... I doubt that explicitly using buffer objects and ByteArray stream objects helps performance much.

I think you would be better off doing this:

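public static boolean gzipToFile(CharSequence from, File to, Charset charset) {
    // Each wrapper is its own resource, so all of them get closed even if a constructor throws.
    try (FileOutputStream fos = new FileOutputStream(to);
         BufferedOutputStream bos = new BufferedOutputStream(fos);
         GZIPOutputStream gzos = new GZIPOutputStream(bos);
         OutputStreamWriter w = new OutputStreamWriter(gzos, charset)) {
        // The writer encodes the characters straight into the gzip stream,
        // so no intermediate byte[] or ByteBuffer is involved.
        w.append(from);
        // Closing flushes the encoder and writes the gzip trailer;
        // the automatic close afterwards is a harmless no-op.
        w.close();
        return true;
    } catch (final IOException e) {
        // ignore
    }
    return false;
}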

(And the equivalent for reading.)
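The answer leaves the read side as an exercise. A minimal sketch of what that equivalent might look like, assuming the same stream-based approach; the class name GzipTextFiles is hypothetical, and unlike the answer, exceptions are propagated instead of ignored so that failures stay visible:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipTextFiles {

    // Write side, as in the answer: chars -> encoder -> gzip -> buffer -> file.
    public static void gzipToFile(CharSequence from, File to, Charset charset) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(to);
             BufferedOutputStream bos = new BufferedOutputStream(fos);
             GZIPOutputStream gzos = new GZIPOutputStream(bos);
             Writer w = new OutputStreamWriter(gzos, charset)) {
            w.append(from);
        }
    }

    // Read side: file -> buffer -> gunzip -> decoder -> chars.
    // The decoder only ever sees bytes that were actually written, so no padding appears.
    public static String gunzipFromFile(File from, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (FileInputStream fis = new FileInputStream(from);
             BufferedInputStream bis = new BufferedInputStream(fis);
             GZIPInputStream gzis = new GZIPInputStream(bis);
             Reader r = new InputStreamReader(gzis, charset)) {
            char[] buf = new char[8192];
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("loremIpsum", ".txt.gz");
        tmp.deleteOnExit();
        String original = "Lorem ipsum dolor sit amet. Grüße. Съешь же ещё этих мягких булок.";
        gzipToFile(original, tmp, StandardCharsets.UTF_8);
        System.out.println(original.equals(gunzipFromFile(tmp, StandardCharsets.UTF_8))); // true
    }
}
```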

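Why? I suspect that the extra copy into the intermediate ByteArray stream would most likely cancel out any potential speedup you get from using buffers.

Also, my gut feeling is that the compression/decompression steps will dominate everything else.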

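Thanks for the hint. In case anyone is interested, this is how I solved it:

    final ByteBuffer bb = charset.encode(CharBuffer.wrap(cs));
    gzipOs.write(Arrays.copyOfRange(bb.array(), bb.arrayOffset() + bb.position(), bb.arrayOffset() + bb.limit()));

@pitschr Using the three-argument write method

    gzipOs.write(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining());

would be more efficient, because it doesn't need to copy the underlying byte array.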
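Applied to the question's in-memory helpers, the three-argument write might look like the sketch below. It is a JDK-only variant under a few assumptions: try-with-resources stands in for Guava's Closeables.closeQuietly, a plain read loop stands in for ByteStreams.toByteArray, and the class name GzipBytes is hypothetical. It also checks hasArray(), because only buffers with an accessible backing array support array().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipBytes {

    public static byte[] gzip(CharSequence cs, Charset charset) throws IOException {
        ByteBuffer bb = charset.encode(CharBuffer.wrap(cs));
        ByteArrayOutputStream os = new ByteArrayOutputStream();
        try (GZIPOutputStream gzipOs = new GZIPOutputStream(os)) {
            if (bb.hasArray()) {
                // Write only the valid region [position, limit) of the backing array, no copy.
                gzipOs.write(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining());
            } else {
                // Fallback for buffers without an accessible array: copy just the valid bytes.
                byte[] tmp = new byte[bb.remaining()];
                bb.get(tmp);
                gzipOs.write(tmp);
            }
        } // closing the GZIPOutputStream finishes the stream and writes the gzip trailer
        return os.toByteArray();
    }

    public static byte[] gunzip(byte[] b) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzipIs = new GZIPInputStream(new ByteArrayInputStream(b))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = gzipIs.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String original = "Lorem ipsum dolor sit amet. Grüße. Съешь же ещё этих мягких булок.";
        byte[] restored = gunzip(gzip(original, StandardCharsets.UTF_8));
        // Decoding exactly the written bytes gives back the original, with no trailing '\0' chars.
        System.out.println(new String(restored, StandardCharsets.UTF_8).equals(original)); // true
    }
}
```

The copyOfRange version from the comment above produces the same bytes; the three-argument write just skips the intermediate array.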