Java: read, modify, and write a new zip from an n-level nested zip, preserving the original structure
Solved in Edit 3.

I have been struggling with this problem for a while. Every question on SO or elsewhere on the internet seems to cover only a "shallow" structure, with one zip inside another. However, I have a zip archive with roughly the following structure:
input.zip/
--1.zip/
--folder/
----2.zip/
------3.zip/
--------test/
----------other-folder/
----------archive.gz/
------------file-to-parse
----------file-to-parse3.txt
------file-to-parse.txt
--4.zip/
------folder/
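And so on. My code needs to handle N levels of zip while preserving the original zip, gzip, folder, and file structure. Temporary files are forbidden because of a lack of privileges (and that is not something I am willing to change).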
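This is the code I have written so far, but ZipOutputStream seems to operate on only one (top) level: if a file or directory has exactly the same name as one already written, it throws Exception in thread "main" java.util.zip.ZipException: duplicate entry: folder/. It also skips empty directories, which is not expected. What I want to achieve is to move my ZipOutputStream to a "lower" level and perform the operations on each zip. Maybe there is a better way to handle all of this; any help would be appreciated. I need to do some text extraction and modification later, but I won't start on that until reading and writing the whole structure works properly. Thanks in advance for your help.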
// field and constructor
private final File zipFile;

ArchiveResolver(String fileToHandle) {
    this.zipFile = new File(Objects.requireNonNull(getClass().getClassLoader().getResource(fileToHandle)).getFile());
}

void resolveInputFile() throws Exception {
    FileInputStream fileInputStream = new FileInputStream(this.zipFile);
    FileOutputStream fileOutputStream = new FileOutputStream("out.zip");
    ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream);
    ZipInputStream zipInputStream = new ZipInputStream(fileInputStream);
    zip(zipInputStream, zipOutputStream);
    zipInputStream.close();
    zipOutputStream.close();
}

// This one doesn't preserve the internal structure (empty folders), but can work on each file
private void zip(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
    ZipEntry entry;
    while ((entry = zipInputStream.getNextEntry()) != null) {
        System.out.println(entry.getName());
        byte[] buffer = new byte[1024];
        int length;
        if (entry.getName().endsWith(".zip")) {
            // wrap the outer zip streams in inner streams, making the current entry the new source
            ZipInputStream innerZipInputStream = new ZipInputStream(zipInputStream);
            ZipOutputStream innerZipOutputStream = new ZipOutputStream(zipOutputStream);
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            // add the new zip entry to the outer zipOutputStream, e.g. data.zip
            zipOutputStream.putNextEntry(zipEntry);
            // now treat this data.zip as the parent and recurse into it
            zip(innerZipInputStream, innerZipOutputStream);
            // finish the inner stream once the inner zip has been written
            innerZipOutputStream.finish();
            // close the outer entry
            zipOutputStream.closeEntry();
        } else if (entry.isDirectory()) {
            // put a new zip entry into the output stream, adding an extra '/' to make
            // sure zipOutputStream treats it as a folder
            ZipEntry zipEntry = new ZipEntry(entry.getName() + "/");
            // this should only preserve the internal structure
            zipOutputStream.putNextEntry(zipEntry);
            // read everything from zipInputStream
            while ((length = zipInputStream.read(buffer)) > 0) {
                // and send it straight to zipOutputStream
                zipOutputStream.write(buffer, 0, length);
            }
            zipOutputStream.closeEntry();
            // The else branch below will eventually check whether the file is:
            // .gz file  <- then open it, read the file inside, modify and save it
            // .txt file <- also read, modify and preserve
        } else {
            // create a new entry for this file
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            while ((length = zipInputStream.read(buffer)) > 0) {
                zipOutputStream.write(buffer, 0, length);
            }
            zipOutputStream.closeEntry();
        }
    }
}

// This one preserves the internal structure (empty folders and so on),
// BUT no work on individual files is possible; it just copies everything as it is
private void zipWhole(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
    ZipEntry entry;
    while ((entry = zipInputStream.getNextEntry()) != null) {
        System.out.println(entry.getName());
        byte[] buffer = new byte[1024];
        int length;
        zipOutputStream.putNextEntry(new ZipEntry(entry.getName()));
        while ((length = zipInputStream.read(buffer)) > 0) {
            zipOutputStream.write(buffer, 0, length);
        }
        zipOutputStream.closeEntry();
    }
}
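
Thanks again for all the help and suggestions.

There seems to be a lot of debugging and refactoring to do there.

One obvious problem is that you either aren't closing streams and entries, or you are closing them in the wrong order. Buffered data will be lost and the central directory will not be written. (There is a complication: Java streams unhelpfully close the streams they wrap, which is why there is finish vs close, but it still has to be done in the correct order.)

Zip files have no real representation of directories, because they have a flat structure: every entry in the local headers and in the central directory contains the whole file path. (An empty directory survives only as an entry whose name ends in "/".)

The part of the Java zip library that offers a random access interface uses memory-mapped files, so everything other than the top level needs to use streams.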
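
To make the ordering point concrete, here is a minimal in-memory sketch, not the asker's code. The class name NestedZipOrderSketch and the entry names are made up for illustration. It opens the outer entry first, writes the inner zip through a second ZipOutputStream, calls finish() on the inner stream (which writes the inner central directory without closing the outer stream), and only then closes the outer entry. It also suggests that folder/ can exist on both levels without a duplicate entry error, since each ZipOutputStream tracks only its own names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; all names here are illustrative assumptions.
class NestedZipOrderSketch {

    // Builds, in memory, an outer zip holding folder/ and 1.zip (which holds folder/ and a.txt).
    static byte[] buildNested() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream outer = new ZipOutputStream(bytes)) {
            outer.putNextEntry(new ZipEntry("folder/"));  // empty directory: name ends in '/'
            outer.closeEntry();

            outer.putNextEntry(new ZipEntry("1.zip"));    // 1. open the outer entry
            ZipOutputStream inner = new ZipOutputStream(outer);
            inner.putNextEntry(new ZipEntry("folder/"));  // same name, other level: no clash
            inner.closeEntry();
            inner.putNextEntry(new ZipEntry("a.txt"));
            inner.write("hello".getBytes(StandardCharsets.UTF_8));
            inner.closeEntry();
            inner.finish();                               // 2. finish inner, outer stays open
            outer.closeEntry();                           // 3. only now close the outer entry
        }
        return bytes.toByteArray();
    }

    // Lists entry names recursively; nested zips are read through the outer stream.
    static List<String> list(ZipInputStream in, String prefix) throws IOException {
        List<String> names = new ArrayList<>();
        ZipEntry entry;
        while ((entry = in.getNextEntry()) != null) {
            names.add(prefix + entry.getName());
            if (entry.getName().endsWith(".zip")) {
                // Do not close the inner stream: that would close the outer one too.
                names.addAll(list(new ZipInputStream(in), prefix + entry.getName() + "!/"));
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        byte[] zip = buildNested();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zip))) {
            list(in, "").forEach(System.out::println);
        }
    }
}
```

Swapping steps 2 and 3 is the kind of ordering mistake described above: the inner central directory would then be written after the outer entry has already been closed.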
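
Yes, I know I'm doing a lot of things wrong... I bet it might be invalid stream closing, but when I focused mainly on closing every entry, I was flooded with errors like invalid entry size (expected 107 but got 105 bytes) and so on. I'll do some reading on finish/close for streams; maybe I'll find something. So far I know that finish closes some internal entry compression stream, and close... well, closes the stream. Thanks for the tip, however.

@MacRyze Those errors could be significant. I don't know where they are coming from. You seem to be creating a new ZipEntry each time; if you were reusing the same one, I'd guess the size could be wrong. (I also noticed that a change affecting uncompressed data in zips was introduced in JDK 13 and backported; I don't know the details.)

Thanks for the comment. Throughout the implementation I saw a number of errors, some with huge entry size differences such as expected 2512 but got 0 bytes, depending on the case, so I guess it is my fault after all. By the way, I'm using OpenJDK 11.0.4 with the JVM argument -Xmx8M (I tried changing the available memory, but that is not the cause).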

You need to call innerZipOutputStream.finish() before zipOutputStream.closeEntry().

You also need to wrap the zipOutputStream in a GZIPOutputStream, unless you intend to decompress the content and not deflate it again.
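
The gzip advice can be tried in isolation with a small round trip. This is only a sketch under assumptions: the class GzipInZipSketch and the entry name archive.gz are invented for the example. The point it illustrates is that the zip entry has to be opened before the GZIPOutputStream is constructed (the constructor writes the gzip header immediately), that the data must be written through the gzip stream, and that finish() comes before closeEntry().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; names are illustrative assumptions.
class GzipInZipSketch {

    // Writes a zip with a single gzip-compressed entry.
    static byte[] write(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("archive.gz"));        // entry first
            GZIPOutputStream gzip = new GZIPOutputStream(zip);   // header is written here
            gzip.write(text.getBytes(StandardCharsets.UTF_8));   // write via gzip, not zip
            gzip.finish();                                       // trailer written, zip stays open
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Reads the first entry back and returns "name:decompressed text".
    static String read(byte[] zipBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = zip.getNextEntry();
            GZIPInputStream gzip = new GZIPInputStream(zip);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int length;
            while ((length = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, length);
            }
            return entry.getName() + ":" + new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read(write("line1\nline2\n")));
    }
}
```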

Edit 3 (working version):

void resolveInputFile() throws IOException {
    FileInputStream fileInputStream = new FileInputStream(this.zipFile);
    FileOutputStream fileOutputStream = new FileOutputStream("in.zip");
    ZipOutputStream zipOutputStream = new ZipOutputStream(fileOutputStream);
    ZipInputStream zipInputStream = new ZipInputStream(fileInputStream);
    zip(zipInputStream, zipOutputStream);
    zipInputStream.close();
    zipOutputStream.close();
}

private void zip(ZipInputStream zipInputStream, ZipOutputStream zipOutputStream) throws IOException {
    ZipEntry entry;
    while ((entry = zipInputStream.getNextEntry()) != null) {
        logger.info(entry.getName());
        if (entry.getName().endsWith(".zip")) {
            // If the entry is a zip, I create inner zip streams that wrap the outer ones
            ZipInputStream innerZipInputStream = new ZipInputStream(zipInputStream);
            ZipOutputStream innerZipOutputStream = new ZipOutputStream(zipOutputStream);
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            zip(innerZipInputStream, innerZipOutputStream);
            // As mentioned in the comments, streams need to be closed/finished properly. I'm done
            // writing to the inner stream, so I call finish() rather than close(), which would
            // also close the outer stream
            innerZipOutputStream.finish();
            zipOutputStream.closeEntry();
        } else if (entry.getName().endsWith(".gz")) {
            GZIPInputStream gzipInputStream = new GZIPInputStream(zipInputStream);
            // Small trap with GZIP: to save it properly, the new ZipEntry must be put into the
            // outer zipOutputStream BEFORE the GZIPOutputStream wrapper is created
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            GZIPOutputStream gzipOutputStream = new GZIPOutputStream(zipOutputStream);
            // BufferedReader keeps line-by-line processing reasonably efficient
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(gzipInputStream));
            long start = System.nanoTime();
            logger.info("Started to process {}", zipEntry.getName());
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                // PROCESSING LINE BY LINE...
                // Write through the gzip stream so the entry is really gzip-compressed
                // (note: line endings are normalized to "\n")
                gzipOutputStream.write((line + "\n").getBytes());
            }
            logger.info("Processing of {} took {} milliseconds", entry.getName(), (System.nanoTime() - start) / 1_000_000);
            gzipOutputStream.finish();
            zipOutputStream.closeEntry();
        } else if (entry.getName().endsWith(".txt")) {
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(zipInputStream));
            long start = System.nanoTime();
            logger.info("Started to process {}", zipEntry.getName());
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                // PROCESSING LINE BY LINE...
                zipOutputStream.write((line + "\n").getBytes());
            }
            logger.info("Processing of {} took {} milliseconds", entry.getName(), (System.nanoTime() - start) / 1_000_000);
            zipOutputStream.closeEntry();
        } else if (entry.isDirectory()) {
            // Standard directory preserving. isDirectory() is true only when the name already
            // ends with "/", so the name is reused as-is; a directory entry carries no data
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            zipOutputStream.closeEntry();
        } else {
            // In my case this will probably never be called, but any other kind of file
            // is preserved unchanged in the output file
            byte[] buffer = new byte[8192];
            int length;
            ZipEntry zipEntry = new ZipEntry(entry.getName());
            zipOutputStream.putNextEntry(zipEntry);
            while ((length = zipInputStream.read(buffer)) > 0) {
                zipOutputStream.write(buffer, 0, length);
            }
            zipOutputStream.closeEntry();
        }
    }
}
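
A possible alternative to remembering finish() versus close() everywhere is a small wrapper whose close() only flushes. The sketch below assumes such a wrapper, here called KeepOpenOutputStream; it is not part of the JDK and not part of the code above. With it, a Writer can be used in try-with-resources inside an entry without closing the enclosing ZipOutputStream. It uses readAllBytes(), so it needs Java 9 or later.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; KeepOpenOutputStream is an assumed helper, not a JDK class.
class KeepOpenSketch {

    // Forwards writes, but close() only flushes and leaves the wrapped stream open.
    static final class KeepOpenOutputStream extends FilterOutputStream {
        KeepOpenOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // avoid FilterOutputStream's byte-by-byte default
        }

        @Override
        public void close() throws IOException {
            flush();
        }
    }

    static byte[] writeTwoEntries() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("file-to-parse.txt"));
            try (Writer writer = new BufferedWriter(new OutputStreamWriter(
                    new KeepOpenOutputStream(zip), StandardCharsets.UTF_8))) {
                writer.write("first line\n");
            } // closing the writer flushes into the entry; zip itself stays open
            zip.closeEntry();

            zip.putNextEntry(new ZipEntry("other.txt")); // still writable afterwards
            zip.write("still writable".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Returns "name=content" for every entry, for checking the result.
    static List<String> read(byte[] zipBytes) throws IOException {
        List<String> result = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                result.add(entry.getName() + "="
                        + new String(zip.readAllBytes(), StandardCharsets.UTF_8));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        read(writeTwoEntries()).forEach(System.out::println);
    }
}
```

The trade-off is one extra class in exchange for being able to use ordinary try-with-resources on the per-entry writers; the inner ZipOutputStream and GZIPOutputStream still need finish() so that their trailers are written.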