Java: GZIPInputStream throws an exception when reading a GZIP file


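I am trying to read files from a public anonymous FTP server and have run into a problem. I can read plain-text files just fine, but when I try to read a gzip file I get the following exception: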

    Exception in thread "main" java.util.zip.ZipException: invalid distance too far back
        at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:164)
        at java.util.zip.GZIPInputStream.read(GZIPInputStream.java:116)
        at java.io.FilterInputStream.read(FilterInputStream.java:107)
        at java_io_FilterInputStream$read.call(Unknown Source)
        at GenBankFilePoc.main(GenBankFilePoc.groovy:36)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)

I tried downloading the file and wrapping a FileInputStream in a GZIPInputStream, but hit exactly the same problem, so I don't think the FTP client (Apache's) is at fault.

Here is some test code that reproduces the problem. It just tries to print the file to standard output:

    FTPClient ftp = new FTPClient();
    ftp.connect("ftp.ncbi.nih.gov");
    ftp.login("anonymous", "");
    InputStream is = new GZIPInputStream(ftp.retrieveFileStream("/genbank/gbbct1.seq.gz"));

    try {
        byte[] buffer = new byte[65536];
        int noRead;

        while ((noRead = is.read(buffer)) != -1) {
            System.out.write(buffer, 0, noRead);
        }
    } finally {
        is.close();
        ftp.disconnect();
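    }

I can't find any documentation on why this would happen, and stepping through the code in a debugger has not helped. I feel like I am missing something obvious.

Edit: I manually downloaded the file and read it in with GZIPInputStream, and it printed out just fine. I have tried this with two different Java FTP clients.

Because ftp.retrieveFileStream() does not support seeking within the file, you need to download the file completely first.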

Your code should be:

    FTPClient ftp = new FTPClient();
    ftp.connect("ftp.ncbi.nih.gov");
    ftp.login("anonymous", "");
    File downloaded = new File("");  // file name left blank in the original; supply one
    FileOutputStream fos = new FileOutputStream(downloaded);
    ftp.retrieveFile("/genbank/gbbct1.seq.gz", fos);
    fos.close();  // finish writing before the file is read back
    InputStream is = new GZIPInputStream(new FileInputStream(downloaded));

    try {
        byte[] buffer = new byte[65536];
        int noRead;

        // read() returns -1 at end of stream
        while ((noRead = is.read(buffer)) != -1) {
            System.out.write(buffer, 0, noRead);
        }
    } finally {
        is.close();
        ftp.disconnect();
    }

Ah, I found the problem. You have to set the file type to FTP.BINARY_FILE_TYPE so that the SocketInputStream returned by retrieveFileStream is not buffered.

The following code works:

    FTPClient ftp = new FTPClient();
    ftp.connect("ftp.ncbi.nih.gov");
    ftp.login("anonymous", "");
    // switch from the default ASCII transfer type to binary
    ftp.setFileType(FTP.BINARY_FILE_TYPE);
    InputStream is = new GZIPInputStream(ftp.retrieveFileStream("/genbank/gbbct1.seq.gz"));

    try {
        byte[] buffer = new byte[65536];
        int noRead;

        // read() returns -1 at end of stream
        while ((noRead = is.read(buffer)) != -1) {
            System.out.write(buffer, 0, noRead);
        }
    } finally {
        is.close();
        ftp.disconnect();
    }
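
To illustrate why the transfer type matters, here is a small sketch that uses only java.util.zip and no FTP at all. It assumes that the damage done by an ASCII-mode transfer can be approximated by dropping the CR from every CR LF pair, which is a simplification of whatever line-ending translation a real client or server applies. The class AsciiModeDemo, its methods, and the random payload are all made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Random;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class AsciiModeDemo {

    // Hypothetical payload: 1 MiB of pseudo-random bytes (fixed seed), so the
    // compressed stream is large enough to contain CR LF pairs by chance.
    static byte[] samplePayload() {
        byte[] data = new byte[1 << 20];
        new Random(42).nextBytes(data);
        return data;
    }

    // Compress a byte array into gzip format in memory.
    static byte[] gzip(byte[] plain) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(plain);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Simplified stand-in (an assumption) for ASCII-mode translation:
    // drop the CR of every CR LF pair and keep all other bytes.
    static byte[] stripCarriageReturns(byte[] in) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(in.length);
        for (int i = 0; i < in.length; i++) {
            boolean crBeforeLf = in[i] == '\r' && i + 1 < in.length && in[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(in[i]);
            }
        }
        return out.toByteArray();
    }

    // Returns true if the bytes decompress cleanly, false if GZIPInputStream rejects them.
    static boolean canGunzip(byte[] compressed) {
        byte[] buffer = new byte[65536];
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            while (in.read(buffer) != -1) {
                // discard the output; only success or failure matters here
            }
            return true;
        } catch (IOException e) { // ZipException and EOFException are both IOExceptions
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] original = gzip(samplePayload());
        byte[] mangled = stripCarriageReturns(original);
        System.out.println("bytes lost: " + (original.length - mangled.length));
        System.out.println("original decompresses: " + canGunzip(original));
        System.out.println("mangled decompresses: " + canGunzip(mangled));
    }
}
```

Losing even a handful of bytes is enough for the inflater to reject the stream. This would also be consistent with the observation that downloading the file through the FTP client first did not help, since the bytes would already be altered before they reach the disk.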

Comment on the download-first answer: That code (once you give the file a name) produces exactly the same exception as before.