Java: How to access a zipEntry from a streaming zip file in memory (Java, Android, Zip, Compression, Skyepub)

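I'm currently implementing an e-reader library that requires me to implement a method that checks whether a zipEntry exists. In the demo version, the solution is simple: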

public boolean isExists(String baseDirectory,String contentPath) {
    setupZipFile(baseDirectory,contentPath);
    if (this.isCustomFont(contentPath)) {
        String path = baseDirectory +"/"+ contentPath;
        File file = new File(path);
        return file.exists();
    }

    // The entry exists exactly when the zip file returns a non-null ZipEntry
    ZipEntry entry = this.getZipEntry(contentPath);
    return entry != null;
}

// Entry name should start without / like META-INF/container.xml 

private ZipEntry getZipEntry(String contentPath) {

    if (zipFile==null) return null;

    String[] subDirs = contentPath.split(Pattern.quote(File.separator));

    String corePath = contentPath.replace(subDirs[1], "");

    corePath=corePath.replace("//", "");

    ZipEntry entry = zipFile.getEntry(corePath.replace(File.separatorChar, '/'));

    return entry;

}

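As you can see, you can use getZipEntry(contentPath) to access the relevant ZipEntry in O(1) time.
However, in my case I can't read the zip file directly from the file system (for security reasons it must be read from memory). So my isExists implementation actually walks through the zip file one entry at a time until it finds the zipEntry in question. Here is the relevant part: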
try {
    final InputStream stream = dbUtil.getBookStream(bookEditionID);
    if (stream == null) return null;

    final ZipInputStream zip = new ZipInputStream(stream);

    // Walk the entries one at a time until the requested name turns up (O(n))
    ZipEntry entry;
    do {
        entry = zip.getNextEntry();
        if (entry == null) {
            // Reached the end of the archive without a match
            zip.close();
            return null;
        }
    } while (!entry.getName().equals(zipEntryName));

    // ... reading the matched entry's bytes into data is elided in this excerpt

} catch (IOException e) {
    Log.e("demo", "Can't get content data for " + contentPath);
    return null;
}

return data;

So isExists returns true if data is present, and false if it is null.
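For reference, the linear scan above can be written as a self-contained boolean check. This is a minimal sketch assuming the whole archive is already available as a byte[]; the class and method names are hypothetical and not part of the e-reader library. ZipInputStream has no random access, so getNextEntry() reads through the data of every entry that precedes the match: the work is O(n) in entries and also grows with the bytes stored before the match.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: sequential existence check over an in-memory archive.
public final class StreamingZipLookup {

    private StreamingZipLookup() {
    }

    /**
     * Returns true if an entry named entryName exists in zipBytes.
     * Linear scan: each getNextEntry() call first reads through the
     * remaining data of the previous entry.
     */
    public static boolean entryExists(byte[] zipBytes, String entryName) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals(entryName)) {
                    return true;
                }
            }
        }
        return false; // end of archive reached without a match
    }
}
```

Unlike the excerpt above, try-with-resources closes the stream on every path, including the early return on a match.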
Question:
Is there a way to find the zip entry in question in the whole ZipInputStream in O(1) time instead of O(n) time?
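If the archive's contents are in memory, then the archive is seekable, and you can search the central directory and use it yourself. Currently, neither ZipFile nor its Apache Commons Compress equivalent works on anything but files, but other open-source libraries might (I'm not sure).
The code in Apache Commons Compress' ZipFile that searches for and parses the central directory should be easy to adapt to the case where the archive is available as a byte[]. In fact, there is a patch that hasn't been applied yet that might help.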
Entries in a zip archive can't really be loaded in O(1) time. If we look at the structure of a zip file, it looks like this:
  [local file header 1]
  [encryption header 1]
  [file data 1]
  [data descriptor 1]
  ... 
  [local file header n]
  [encryption header n]
  [file data n]
  [data descriptor n]
  [archive decryption header] 
  [archive extra data record] 
  [central directory header 1]
  .
  [central directory header n]
  [zip64 end of central directory record]
  [zip64 end of central directory locator] 
  [end of central directory record]

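Basically, there are the compressed files with some headers, plus a "central directory" that contains all the metadata about the files (the central directory headers). The only efficient way to find an entry is to scan the central directory:
…tools should not scan for entries from the top of the ZIP file, because only the central directory specifies where a file chunk starts.
Since the central directory headers are not indexed, you can only get an entry in O(n), where n is the number of files in the archive.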
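A minimal sketch of that central directory scan over an in-memory byte[], using only java.nio.ByteBuffer. It assumes a single-disk, non-ZIP64 archive and UTF-8 (or plain ASCII) entry names; the class and method names are hypothetical. It finds the end of central directory record by scanning backwards for its signature (the record may be followed by a comment of up to 65535 bytes), then walks the central directory headers, so the cost depends on the number of entries, not on the size of the file data.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads entry names from the central directory only.
public final class CentralDirectoryScanner {

    private static final int EOCD_SIGNATURE = 0x06054b50; // end of central directory record
    private static final int CEN_SIGNATURE = 0x02014b50;  // central directory file header
    private static final int EOCD_MIN_SIZE = 22;          // EOCD record without its comment
    private static final int CEN_HEADER_SIZE = 46;        // fixed part of a central directory header

    private CentralDirectoryScanner() {
    }

    public static List<String> entryNames(byte[] zip) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);

        // 1. Scan backwards for the EOCD signature (assumption: no ZIP64 records).
        int eocd = -1;
        int lowest = Math.max(0, zip.length - EOCD_MIN_SIZE - 0xFFFF);
        for (int i = zip.length - EOCD_MIN_SIZE; i >= lowest; i--) {
            if (buf.getInt(i) == EOCD_SIGNATURE) {
                eocd = i;
                break;
            }
        }
        if (eocd < 0) {
            throw new IllegalArgumentException("No end of central directory record found");
        }

        // 2. Total entry count (offset 10) and central directory offset (offset 16).
        int totalEntries = Short.toUnsignedInt(buf.getShort(eocd + 10));
        long cdOffset = Integer.toUnsignedLong(buf.getInt(eocd + 16));

        // 3. Walk the headers; file data is never touched.
        List<String> names = new ArrayList<>(totalEntries);
        int pos = (int) cdOffset;
        for (int n = 0; n < totalEntries; n++) {
            if (buf.getInt(pos) != CEN_SIGNATURE) {
                throw new IllegalArgumentException("Bad central directory header at " + pos);
            }
            int nameLen = Short.toUnsignedInt(buf.getShort(pos + 28));
            int extraLen = Short.toUnsignedInt(buf.getShort(pos + 30));
            int commentLen = Short.toUnsignedInt(buf.getShort(pos + 32));
            names.add(new String(zip, pos + CEN_HEADER_SIZE, nameLen, StandardCharsets.UTF_8));
            pos += CEN_HEADER_SIZE + nameLen + extraLen + commentLen;
        }
        return names;
    }

    public static boolean entryExists(byte[] zip, String entryName) {
        return entryNames(zip).contains(entryName);
    }
}
```

Each call is still a linear pass over the central directory headers, matching the O(n) bound above; it only avoids reading the compressed data.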
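Update: Unfortunately, all the zip libraries I know of that work with streams instead of files use the local file headers and scan the whole stream, including the content. They can't easily be bent to do otherwise either. The only way I found to avoid scanning the whole archive is to modify a library yourself.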
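Update 2: I've taken the liberty of modifying the zip4j library for your purposes. Assuming you have read the zip file into a byte array and added a dependency on zip4j version 1.3.2, you can use the modified classes like this: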
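String myZipFile = "...";   // name of the entry to look for inside the archive
byte[] bytes = readFile();  // the whole archive, already loaded into memory
// MemoryHeaderReader and RandomAccessStream are the modified classes mentioned above
MemoryHeaderReader headerReader = new MemoryHeaderReader(RandomAccessStream.fromBytes(bytes));
ZipModel zipModel = headerReader.readAllHeaders();  // parses the central directory
FileHeader myFile = Zip4jUtil.getFileHeader(zipModel, myZipFile);
boolean fileIsPresent = myFile != null;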

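It works in O(entryCount) without reading the whole archive, which should be reasonably fast. I haven't tested it thoroughly, but it should give you an idea of how to adapt zip4j to your purposes.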
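Technically, the search is always O(n), where n is the number of entries in the zip file, because you have to do a linear search through either the central directory or the local headers.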
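You seem to imply that the zip file is fully loaded into memory. In that case, the fastest approach is to search the central directory for the entry. If you find it, that directory entry will point to the local header.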
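If you are doing many lookups in the same zip file, you can build a hash table of the names in the central directory in O(n) time, and then use that hash table to look up a given name in approximately O(1) time.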
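A minimal sketch of the build-once, look-up-many idea. To stay within java.util.zip it swaps the central-directory scan for a single ZipInputStream pass, which also reads the entry data; it therefore only needs a sequential InputStream. Building the index is one O(n) pass, and each later exists() call is an expected O(1) HashSet lookup. The class name ZipEntryIndex is hypothetical, and build() closes the stream it is given.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: a hash set of entry names built once per archive.
public final class ZipEntryIndex {

    private final Set<String> names;

    private ZipEntryIndex(Set<String> names) {
        this.names = names;
    }

    /** One sequential O(n) pass over the archive; closes the given stream. */
    public static ZipEntryIndex build(InputStream archive) throws IOException {
        Set<String> names = new HashSet<>();
        try (ZipInputStream zip = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return new ZipEntryIndex(Collections.unmodifiableSet(names));
    }

    /** Expected O(1) membership test against the prebuilt set. */
    public boolean exists(String entryName) {
        return names.contains(entryName);
    }

    public int size() {
        return names.size();
    }
}
```

The trade-off is memory for one string per entry, which only pays off when the same archive is queried repeatedly.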
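What is log(1) time? 0? Do you mean O(1)? What makes you think it's O(1)? The file has to be scanned until the entry is found. That's O(N). Why haven't you posted getZipEntry()?
@EJP I just added getZipEntry.
Is it enough for you to check whether a zip entry exists, or do you also need to decompress the corresponding file?
@Mifeet I only need to know whether an entry exists.
Technically, it's only O(n) if you scan the central directory. Walking through the local headers also depends on the size of each file.
n is the number of entries, so whether you search the local headers or the central directory, the order is n.
What if the compressed size of the i-th file is 2^i? Then you need to read O(2^n) bytes rather than O(n), where n is the number of entries.
You're assuming you need to process all the bytes. You don't. The local header has the length of the compressed data, so you can skip to the next local header. (There are some streamed zip files that don't have it in the local header, but they are rare. And nobody artificially doubles the size of each entry.)
I didn't mean to flame, you have a point :). Assuming the length is present in the local header, the search can be done in O(n). Unfortunately, from a quick look at Java's ZipInputStream, it doesn't seem able to skip between entries without reading their contents :/
Woah man, this looks awesome! Could you upload the modified zip4j to a GitHub repo or something? Then I could use it (and if there are still some rough edges, I'll send a pull request).
Do you mean putting a whole zip4j fork with my modifications on GitHub, or just the modified classes? I ask because I don't know how you would use it. If it's a fork, you won't be able to use it as a Maven dependency unless you build it yourself. Wouldn't it be easier to depend on the original zip4j and add my two new classes to a different package in your code?
Hey @Mifeet, it turns out I left out a requirement that messes everything up: I'm reading an encrypted file (see this question for more details), so I can't randomly seek within the encrypted file in memory :(
I see, so you end up reading the whole file after all.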
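The local-header skipping discussed in the comments can be sketched as follows, assuming the archive is in a byte[]. Each local file header stores the compressed size at offset 18, so the walker jumps straight to the next header without reading the data. Entries with general purpose bit 3 set (data descriptor) have no usable sizes in the local header, which is common for DEFLATED entries written by java.util.zip.ZipOutputStream; this sketch throws in that case and also ignores ZIP64. The class name LocalHeaderWalker is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: walks local file headers, skipping over entry data.
public final class LocalHeaderWalker {

    private static final int LOC_SIGNATURE = 0x04034b50;  // local file header
    private static final int LOC_HEADER_SIZE = 30;        // fixed part of a local header
    private static final int FLAG_DATA_DESCRIPTOR = 0x08; // general purpose bit 3

    private LocalHeaderWalker() {
    }

    public static boolean entryExists(byte[] zip, String entryName) {
        ByteBuffer buf = ByteBuffer.wrap(zip).order(ByteOrder.LITTLE_ENDIAN);
        int pos = 0;
        while (pos + LOC_HEADER_SIZE <= zip.length && buf.getInt(pos) == LOC_SIGNATURE) {
            int flags = Short.toUnsignedInt(buf.getShort(pos + 6));
            long compressedSize = Integer.toUnsignedLong(buf.getInt(pos + 18));
            int nameLen = Short.toUnsignedInt(buf.getShort(pos + 26));
            int extraLen = Short.toUnsignedInt(buf.getShort(pos + 28));
            String name = new String(zip, pos + LOC_HEADER_SIZE, nameLen, StandardCharsets.UTF_8);
            if (name.equals(entryName)) {
                return true;
            }
            if ((flags & FLAG_DATA_DESCRIPTOR) != 0) {
                // Sizes follow the data, so the skip distance is unknown here.
                throw new IllegalStateException("Entry " + name + " uses a data descriptor");
            }
            // Jump over header, name, extra field and compressed data without reading them.
            pos = Math.toIntExact(pos + LOC_HEADER_SIZE + nameLen + extraLen + compressedSize);
        }
        return false; // reached the central directory (or the end) without a match
    }
}
```

With sizes present, the work is O(n) in entries, as argued in the comments, rather than proportional to the archive size.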