Java: read, modify and write a new zip from an n-nested zip, preserving the original structure
Tags: java, zip, zipoutputstream, zipinputstream

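Problem solved in Edit 3.

I have been struggling with this problem for a while. All the questions on SO or elsewhere on the internet seem to cover only a "shallow" structure, with one zip inside another. However, I have a zip archive whose structure looks roughly like this: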

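input.zip/
--1.zip/
--folder/
----2.zip/
------3.zip/
--------test/
----------other-folder/
----------archive.gz/
------------file-to-parse
----------file-to-parse3.txt
------file-to-parse.txt
--4.zip/
------folder/

And so on. My code needs to handle N levels of zips while preserving the original zip, gzip, folder and file structure. Temporary files are forbidden because of missing privileges (a constraint I am not going to change).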

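This is the code I have written so far, but `ZipOutputStream` seems to operate on only one (top) level: if a directory contains a file or directory with exactly the same name, it throws `Exception in thread "main" java.util.zip.ZipException: duplicate entry: folder/`. It also skips empty directories, which is not what I expect. What I want to achieve is to move my `ZipOutputStream` to a "lower" level and operate on each zip. Maybe there is a better way to handle all of this; any help would be appreciated. I will need to do some text extraction/modification later, but I won't start on that until reading and writing the whole structure works. Thanks in advance for any help.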

    //constructor
private final File zipFile;

ArchiveResolver(String fileToHandle) {
    this.zipFile = new File(Objects.requireNonNull(getClass().getClassLoader().getResource(fileToHandle)).getFile());
}

void resolveInputFile() throws Exception {
    FileInputStream fileInputStream = new FileInputStream(this.zipFile);
    FileOutputStream fileOutputStream = new FileOutputStream("out.zip");
    ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream);
    ZipInputStream zipInputStream = new ZipInputStream(fileInputStream);

    zip(zipInputStream, zipOutputStream);

    zipInputStream.close();
    zipOutputStream.close();
}

//    this one doesn't preserve internal structure(empty folders), but can work on each file
private void zip(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
    ZipEntry entry;
    while ((entry = zipInputStream.getNextEntry()) != null) {
        System.out.println(entry.getName());
        byte[] buffer = new byte[1024];
        int length;
        if (entry.getName().endsWith(".zip")) {
//              wrapping outer zip streams to inner streams making actual entries a new source
            ZipInputStream innerZipInputStream = new ZipInputStream(zipInputStream);
            ZipOutputStream innerZipOutputStream = new ZipOutputStream(zipOutputStream);

            ZipEntry zipEntry = new ZipEntry(entry.getName());
//              add new zip entry here to outer zipOutputStream: i.e. data.zip
            zipOutputStream.putNextEntry(zipEntry);

//              now treat this data.zip as parent and call recursively zipFolder on it
            zip(innerZipInputStream, innerZipOutputStream);

//              Finish internal stream work when innerZipOutput is done
            innerZipOutputStream.finish();

//              Close entry
            zipOutputStream.closeEntry();
        } else if (entry.isDirectory()) {
//              putting new zip entry into output stream and adding extra '/' to make
//              sure zipOutputStream will treat it as folder
            ZipEntry zipEntry = new ZipEntry(entry.getName() + "/");

//              this only should preserve internal structure
            zipOutputStream.putNextEntry(zipEntry);

//              reading everything from zipInputStream
            while ((length = zipInputStream.read(buffer)) > 0) {
//                  sending it straight to zipOutputStream
                zipOutputStream.write(buffer, 0, length);
            }

            zipOutputStream.closeEntry();

//              This else will include checking if file is respectively:
//              .gz file <- then open it, read from file inside, modify and save it
//              .txt file <- also read, modify and preserve
        } else {
//              create new entry on top of this
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            while ((length = zipInputStream.read(buffer)) > 0) {
                zipOutputStream.write(buffer, 0, length);
            }
            zipOutputStream.closeEntry();
        }
    }
}

//    This one preserves internal structure (empty folders and so)
//    BUT! no work on each file is possible it just preserves everything as it is
private void zipWhole(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
    ZipEntry entry;
    while ((entry = zipInputStream.getNextEntry()) != null) {
        System.out.println(entry.getName());
        byte[] buffer = new byte[1024];
        int length;
        zipOutputStream.putNextEntry(new ZipEntry(entry.getName()));
        while ((length = zipInputStream.read(buffer)) > 0) {
            zipOutputStream.write(buffer, 0, length);
        }
        zipOutputStream.closeEntry();
    }
}

Thanks again for all the help and suggestions.

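There seems to be a lot of debugging and refactoring to do there.

One obvious problem is that you are either not closing your streams/entries or closing them in the wrong order. Buffered data will be lost and the central directory will not be written. (There is a complication: Java streams unhelpfully close the stream they wrap, which is why there is `finish` as well as `close`, but it still needs to be done in the right order.)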
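
To make the `finish` vs `close` point concrete, here is a minimal in-memory sketch. The class and method names are invented for this example and are not from the thread. It writes an `inner.zip` entry through a wrapping `ZipOutputStream` and then tries to add one more entry to the outer stream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class FinishVsClose {

    /** Writes inner.zip inside an outer zip, then tries to add another outer entry. */
    static boolean outerStillUsable(boolean closeInner) throws IOException {
        ZipOutputStream outer = new ZipOutputStream(new ByteArrayOutputStream());
        outer.putNextEntry(new ZipEntry("inner.zip"));

        // The inner stream writes its bytes into the current outer entry.
        ZipOutputStream inner = new ZipOutputStream(outer);
        inner.putNextEntry(new ZipEntry("a.txt"));
        inner.write("hello".getBytes(StandardCharsets.UTF_8));
        inner.closeEntry();

        if (closeInner) {
            inner.close();       // finishes the inner zip AND closes 'outer'
        } else {
            inner.finish();      // writes the inner central directory only
            outer.closeEntry();  // then the outer entry can be closed
        }

        try {
            outer.putNextEntry(new ZipEntry("after.txt"));
            outer.closeEntry();
            outer.close();
            return true;
        } catch (IOException e) {
            return false;        // the outer stream was already closed
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("after finish(): outer usable = " + outerStillUsable(false));
        System.out.println("after close():  outer usable = " + outerStillUsable(true));
    }
}
```

With `finish()` the outer stream accepts further entries; with `close()` the next `putNextEntry` on the outer stream fails because the stream is already closed.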

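Zip files have no real representation of directories because their structure is flat: every entry, in both the local header and the central directory, carries its entire file path.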
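
A small sketch of what the flat structure means in practice, using only `java.util.zip` and in-memory streams. The class name and the entry names are made up for illustration. A "directory" is just an entry whose name ends with `/`, parent folders need no entries of their own, and one `ZipOutputStream` rejects the same name twice, which is the `duplicate entry` error from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

final class FlatZipNames {

    /** Builds a zip with an explicit empty directory and a file nested two folders deep. */
    static byte[] build() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("empty-folder/")); // trailing '/' marks a directory
            zip.closeEntry();
            // No entries exist for folder/ or folder/sub/; the path lives in the name.
            zip.putNextEntry(new ZipEntry("folder/sub/file.txt"));
            zip.write("data".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Lists entry names in stored order, tagging each as directory or file. */
    static List<String> names(byte[] zipBytes) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                names.add(entry.getName() + (entry.isDirectory() ? " [dir]" : " [file]"));
            }
        }
        return names;
    }

    /** Puts the same name twice into ONE ZipOutputStream and reports what happens. */
    static String duplicateMessage() throws IOException {
        try (ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream())) {
            zip.putNextEntry(new ZipEntry("folder/"));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("folder/"));
            return "no exception";
        } catch (ZipException ex) {
            return ex.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        names(build()).forEach(System.out::println);
        System.out.println(duplicateMessage());
    }
}
```

The same name in a nested zip is not a duplicate as long as that nested zip is written through its own `ZipOutputStream`, since each stream tracks its own set of names.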


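The part of the Java zip library that provides a random-access interface uses memory-mapped files, so you will need to use streams for everything except the top level.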

Yes, I realise I have done a lot of things wrong... I would bet it is invalid stream closing, but when I focused on closing each entry I was hit with a lot of `invalid entry size (expected 107 but got 105 bytes)` errors and the like. I will do some reading on finish/close for streams; maybe I will find something. So far I know that `finish` closes some inner entry compression stream, while `close`... well, closes the stream. Thanks for the tip, however.

@MacRyze Those errors are probably important. I don't know where they come from. You appear to be creating a new `ZipEntry` each time; if you were reusing the same one, I would guess the sizes could be wrong. (I also noticed that something about handling uncompressed data in zips was introduced in JDK 13 and then backported; I don't know the details.)

Thanks for the comment. Throughout the implementation I have seen errors with even bigger entry size differences, such as `expected 2512 but got 0 bytes`, depending on the case, so I guess it is my fault after all. By the way, I am using OpenJDK 11.0.4 with the JVM argument -Xmx8M (I tried changing the available memory, but that is not the cause).

You need to call `innerZipOutputStream.finish()` before `zipOutputStream.closeEntry()`, and wrap `zipOutputStream` in a `GZIPOutputStream`, unless you intend to decompress the content without re…

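
As an aside on the `invalid entry size` messages discussed in the comments: one way `ZipOutputStream` can produce an error of that form is when an entry declares its size, compressed size and CRC up front and the bytes actually written do not match. The sketch below reproduces that in memory. The class name and the stale metadata are assumptions made for illustration, and whether this was the actual cause in the thread is not established.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipOutputStream;

final class EntrySizeMismatch {

    /**
     * Writes 'content' under an entry that optionally declares sizes up front.
     * Pass a negative declaredSize to leave the entry metadata unset.
     */
    static String write(byte[] content, long declaredSize) throws IOException {
        ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream());
        ZipEntry entry = new ZipEntry("file-to-parse.txt");
        if (declaredSize >= 0) {
            // Hypothetical stale metadata, e.g. describing the content before it was modified.
            entry.setSize(declaredSize);
            entry.setCompressedSize(declaredSize);
            entry.setCrc(0);
        }
        zip.putNextEntry(entry);
        zip.write(content);
        try {
            zip.closeEntry(); // declared values are checked here
            zip.close();
            return "ok";
        } catch (ZipException ex) {
            return ex.getMessage(); // the stream is abandoned after the failure
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(write(new byte[105], 107)); // declared size differs from written size
        System.out.println(write(new byte[105], -1));  // sizes left for the stream to compute
    }
}
```

Creating a fresh `new ZipEntry(name)` for every output entry, as the code in this thread does, leaves those fields unset, so the stream computes them itself.
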
void resolveInputFile() throws IOException {
    // try-with-resources closes both streams even if zip() throws;
    // closing zipOutputStream also writes the top-level central directory
    try (ZipInputStream zipInputStream = new ZipInputStream(new FileInputStream(this.zipFile));
         ZipOutputStream zipOutputStream = new ZipOutputStream(new FileOutputStream("in.zip"))) {
        zip(zipInputStream, zipOutputStream);
    }
}

private void zip(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
    ZipEntry entry;
    while ((entry = zipInputStream.getNextEntry()) != null) {
        logger.info(entry.getName());

        if (entry.getName().endsWith(".zip")) {
            // If entry is zip, I create inner zip streams that wrap outer ones
            ZipInputStream innerZipInputStream = new ZipInputStream(zipInputStream);
            ZipOutputStream innerZipOutputStream = new ZipOutputStream(zipOutputStream);

            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);

            zip(innerZipInputStream, innerZipOutputStream);
            // As mentioned in the comments, streams need to be properly closed/finished. I'm done writing to the inner stream, so I call finish() rather than close(), which would also close the outer stream
            innerZipOutputStream.finish();
            zipOutputStream.closeEntry();

        } else if (entry.getName().endsWith(".gz")) {

            GZIPInputStream gzipInputStream = new GZIPInputStream(zipInputStream);
            // Small trap with GZIP: the new ZipEntry must be put on the outer zipOutputStream BEFORE the GZIPOutputStream wrapper is created, because its constructor writes the gzip header immediately
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            GZIPOutputStream gzipOutputStream = new GZIPOutputStream(zipOutputStream);
            // BufferedReader for efficient line-by-line reading
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream));

            long start = System.nanoTime();
            logger.info("Started to process {}", zipEntry.getName());

            String line;
            while ((line = bufferedReader.readLine()) != null) {

                //PROCESSING LINE BY LINE...

                // write through the gzip wrapper so the content is actually gzip-compressed
                gzipOutputStream.write((line + "\n").getBytes());
            }

            logger.info("Processing of {} took {} milliseconds", entry.getName(), (System.nanoTime() - start) / 1_000_000);
            // finish() writes the gzip trailer without closing the outer zipOutputStream
            gzipOutputStream.finish();
            zipOutputStream.closeEntry();

        } else if (entry.getName().endsWith(".txt")) {

            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(zipInputStream));

            long start = System.nanoTime();
            logger.info("Started to process {}", zipEntry.getName());

            String line;
            while ((line = bufferedReader.readLine()) != null) {

                //PROCESSING LINE BY LINE...

                zipOutputStream.write((line + "\n").getBytes());
            }

            logger.info("Processing of {} took {} milliseconds", entry.getName(), (System.nanoTime() - start) / 1_000_000);
            zipOutputStream.closeEntry();

        } else if (entry.isDirectory()) {
            // Standard directory preserving.
            // isDirectory() is true only when the name already ends with "/",
            // so the name is reused as-is; appending another "/" would produce "folder//"
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            // a directory entry carries no data, so there is nothing to copy
            zipOutputStream.closeEntry();
        } else {
            //In my case it probably will never be called but if there's some different file in here it will be preserved unchanged in the output file
            byte[] buffer = new byte[8192];
            int length;
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            while ((length = zipInputStream.read(buffer)) > 0) {
                zipOutputStream.write(buffer, 0, length);
            }
            zipOutputStream.closeEntry();
        }
    }
}
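
For anyone who wants to try the approach without touching the file system, the following is a compact in-memory round trip built on the same idea. It is a sketch under stated assumptions, not the code above: the class name, the `modify` transformation and the sample entry names are invented, and it uses `readAllBytes()` (Java 9+) for brevity, which loads each text entry into memory and may not suit a small heap.

```java
import static java.nio.charset.StandardCharsets.UTF_8;

import java.io.*;
import java.util.*;
import java.util.zip.*;

final class NestedZipRoundTrip {

    /** Hypothetical edit standing in for the real text processing. */
    static byte[] modify(byte[] text) {
        return new String(text, UTF_8).toUpperCase(Locale.ROOT).getBytes(UTF_8);
    }

    /** Copies entry by entry, recursing into nested zips and editing .txt/.gz content. */
    static void copy(ZipInputStream in, ZipOutputStream out) throws IOException {
        ZipEntry entry;
        while ((entry = in.getNextEntry()) != null) {
            String name = entry.getName();
            out.putNextEntry(new ZipEntry(name)); // fresh entry, so sizes are recomputed
            if (name.endsWith(".zip")) {
                ZipOutputStream inner = new ZipOutputStream(out);
                copy(new ZipInputStream(in), inner);
                inner.finish(); // writes the inner central directory, leaves 'out' open
            } else if (name.endsWith(".gz")) {
                GZIPOutputStream gzip = new GZIPOutputStream(out); // created after putNextEntry
                gzip.write(modify(new GZIPInputStream(in).readAllBytes()));
                gzip.finish(); // writes the gzip trailer, leaves 'out' open
            } else if (name.endsWith(".txt")) {
                out.write(modify(in.readAllBytes()));
            } else if (!entry.isDirectory()) {
                in.transferTo(out); // any other file is copied unchanged
            }
            out.closeEntry();
        }
    }

    /** Test helper: builds a zip in memory from name -> content pairs. */
    static byte[] zipOf(Map<String, byte[]> entries) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, byte[]> e : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        }
        return bytes.toByteArray();
    }

    /** Test helper: flattens a nested zip into readable "path = content" lines. */
    static void describe(ZipInputStream in, String prefix, List<String> found) throws IOException {
        ZipEntry entry;
        while ((entry = in.getNextEntry()) != null) {
            String path = prefix + entry.getName();
            if (path.endsWith(".zip")) {
                found.add(path);
                describe(new ZipInputStream(in), path + "!/", found);
            } else if (path.endsWith(".gz")) {
                found.add(path + " = " + new String(new GZIPInputStream(in).readAllBytes(), UTF_8));
            } else if (path.endsWith(".txt")) {
                found.add(path + " = " + new String(in.readAllBytes(), UTF_8));
            } else {
                found.add(path);
            }
        }
    }

    /** Builds a two-level sample, copies it, and describes the result. */
    static List<String> roundTrip() throws IOException {
        ByteArrayOutputStream gz = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(gz)) {
            gzip.write("zipped".getBytes(UTF_8));
        }
        Map<String, byte[]> inner = new LinkedHashMap<>();
        inner.put("folder/", new byte[0]);
        inner.put("folder/file-to-parse.txt", "hello".getBytes(UTF_8));
        inner.put("folder/archive.gz", gz.toByteArray());
        Map<String, byte[]> outer = new LinkedHashMap<>();
        outer.put("folder/", new byte[0]);
        outer.put("folder/2.zip", zipOf(inner));

        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipOf(outer)));
             ZipOutputStream out = new ZipOutputStream(result)) {
            copy(in, out);
        }
        List<String> found = new ArrayList<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(result.toByteArray()))) {
            describe(in, "", found);
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        roundTrip().forEach(System.out::println);
    }
}
```

The sample deliberately uses `folder/` at both levels: because each nested zip is written through its own `ZipOutputStream`, the repeated name does not trigger a `duplicate entry` error, and the empty directory entries survive the copy.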