Lucene Jackrabbit Oak: Lucene index and SQL2 query for full-text search in txt and pdf files


I am trying to implement full-text search over file contents with Oak version 1.16.0.

I tried to create an index that indexes all properties, as described in the Oak documentation:

/oak:index/assetType
  - jcr:primaryType = "oak:QueryIndexDefinition"
  - type = "lucene"
  - compatVersion = 2
  - async = "async"
  + indexRules
    - jcr:primaryType = "nt:unstructured"
    + nt:base
      + properties
        - jcr:primaryType = "nt:unstructured"
        + allProps
          - name = ".*"
          - isRegexp = true
          - nodeScopeIndex = true
  • Create the index. I tried different combinations of node types; none of them worked.
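  • import javax.jcr.Node;
    import javax.jcr.Repository;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;

    public static void createIndex(Repository repository) {
        Session session = null;
        try {
            session = repository.login();
            Node root = session.getRootNode();
            Node index = root.getNode("oak:index");
            Node luceneIndex = index.addNode("assetType", "oak:QueryIndexDefinition");
            luceneIndex.setProperty("compatVersion", 2); // stored as a long, as in the definition above
            luceneIndex.setProperty("type", "lucene");
            luceneIndex.setProperty("async", "async");
            Node rules = luceneIndex.addNode("indexRules", "nt:unstructured");
            Node base = rules.addNode("nt:base");
            Node properties = base.addNode("properties", "nt:unstructured");
            Node allProps = properties.addNode("allProps");
            // Match every property name (name = ".*" in the definition above)
            allProps.setProperty("name", ".*");
            allProps.setProperty("isRegexp", true);
            allProps.setProperty("nodeScopeIndex", true);
            session.save();
        } catch (RepositoryException e) { // also covers LoginException
            e.printStackTrace();
        } finally {
            if (session != null) {
                session.logout();
            }
        }
    }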
    
  • Add some files
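  • import java.io.ByteArrayInputStream;
    import javax.jcr.Binary;
    import javax.jcr.Node;
    import javax.jcr.Repository;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;
    import javax.jcr.SimpleCredentials;

    public static void saveFileIfNotExist(byte[] rawFile, String fileName, String folderName, String mimeType, Repository repository) {
        Session session = null;
        try {
            session = repository.login(new SimpleCredentials("admin", "admin".toCharArray()));
            Node root = session.getRootNode();
            Binary binary = session.getValueFactory().createBinary(new ByteArrayInputStream(rawFile));
            if (!root.hasNode(folderName)) {
                System.out.println("No folder");
                Node folder = root.addNode(folderName, "nt:folder");
                Node file = folder.addNode(fileName, "nt:file");
                Node content = file.addNode("jcr:content", "nt:resource");
                content.setProperty("jcr:mimeType", mimeType);
                content.setProperty("jcr:data", binary);
            } else {
                System.out.println("Folder exists");
            }
            session.save();
        } catch (RepositoryException e) {
            e.printStackTrace();
        } finally {
            if (session != null) {
                session.logout();
            }
        }
    }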
    
    File contents:

    An implementation of the Value interface must override the inherited method
    Object.equals(Object) so that, given Value instances V1 and V2,
    V1.equals(V2) will return true if.
    
  • Try to search the file contents
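  • import javax.jcr.Repository;
    import javax.jcr.RepositoryException;
    import javax.jcr.Session;
    import javax.jcr.query.Query;
    import javax.jcr.query.QueryManager;
    import javax.jcr.query.QueryResult;
    import javax.jcr.query.Row;
    import javax.jcr.query.RowIterator;
    import org.apache.jackrabbit.oak.Oak;
    import org.apache.jackrabbit.oak.jcr.Jcr;
    import org.apache.jackrabbit.oak.jcr.repository.RepositoryImpl;
    import org.apache.jackrabbit.oak.plugins.document.DocumentNodeStore;
    import org.apache.jackrabbit.oak.plugins.document.rdb.RDBDocumentNodeStoreBuilder;
    import org.apache.jackrabbit.oak.spi.security.OpenSecurityProvider;

    DocumentNodeStore rdb = new DocumentNodeStore(new RDBDocumentNodeStoreBuilder().setRDBConnection(dataSource));
    Repository repo = new Jcr(new Oak(rdb)).with(new OpenSecurityProvider()).createRepository();
    createIndex(repo);
    byte[] rawFile = readBytes("D:\\file.txt");
    saveFileIfNotExist(rawFile, "text_file", "txt_folder", "text/plain", repo); // fileName, folderName
    Session session = null;
    try {
        session = repo.login();
        QueryManager queryManager = session.getWorkspace().getQueryManager();
        Query query = queryManager.createQuery(
            "SELECT * FROM [nt:resource] AS s WHERE CONTAINS(s.*, '*so*') option(traversal warn)",
            Query.JCR_SQL2);
        QueryResult result = query.execute();
        RowIterator ri = result.getRows();
        while (ri.hasNext()) {
            Row row = ri.nextRow();
            System.out.println("Row: " + row.getPath());
        }
    } catch (RepositoryException e) {
        e.printStackTrace();
    } finally {
        if (session != null) {
            session.logout();
        }
        ((RepositoryImpl) repo).shutdown();
        rdb.dispose();
    }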
    
    But it returns nothing and logs this warning:

    2019-10-02 18:27:35,821 [main] WARN  QueryImpl - Traversal query (query without index): SELECT * FROM [nt:resource] AS s WHERE CONTAINS(s.*, '*so*') option(traversal warn); consider creating an index
    
  • So how do I build a proper index on the file contents and write a correct search query?
  • How do I search inside PDF documents?
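The `readBytes` helper called in the search snippet is not shown in the question. A minimal sketch, assuming it simply loads the whole file into memory (the class name `FileBytes` is hypothetical):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class FileBytes {
    private FileBytes() {}

    // Hypothetical stand-in for the question's readBytes(String):
    // reads the entire file into a byte array, which is fine for small test files.
    static byte[] readBytes(String path) throws IOException {
        return Files.readAllBytes(Paths.get(path));
    }
}
```

Usage: `byte[] rawFile = FileBytes.readBytes("D:\\file.txt");`. For large files, passing an `InputStream` from `Files.newInputStream` directly to `createBinary` would avoid holding the whole file in memory.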

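  • I didn't go through all the snippets carefully, but one thing that seems to be missing is setting up an async indexer (your index definition has async = "async"). Just typing this off the top of my head, but do something like

    new Jcr(new Oak(rdb)).with(new OpenSecurityProvider()).withAsyncIndexing("async", 5) // 5 is the period, in seconds, at which the async indexer runs
    

    By the way, since it is an async index, you will need to wait a while before results show up in the query. But even if the results don't show up, the query should still pick up your index.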


    Thanks. I added the LuceneIndexProvider:
    LuceneIndexProvider provider = new LuceneIndexProvider();
    repository = new Jcr(new Oak(rdb))
        .with(new OpenSecurityProvider())
        .with(new LuceneIndexEditorProvider())
        .with((QueryIndexProvider) provider)
        .withAsyncIndexing("async", 5)
        .createRepository();
    and I can see in the log that it tries to build the index. But the query result is still empty, and the warning message is still in the log.
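Because the index is asynchronous, results only appear after the indexer has run, as the answer notes. In a test, one way to avoid racing the indexer is to poll until a check succeeds or a timeout expires. A small sketch (the class `AsyncIndexWait` and method `waitUntil` are hypothetical, not part of Oak):

```java
import java.time.Duration;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

final class AsyncIndexWait {
    private AsyncIndexWait() {}

    // Re-evaluates the condition every `interval` until it returns true
    // or `timeout` has elapsed. Returns true if the condition became true in time.
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration interval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            long remaining = deadline - System.nanoTime();
            if (remaining <= 0) {
                return false;
            }
            // Sleep for one interval, but never past the deadline.
            TimeUnit.NANOSECONDS.sleep(Math.min(interval.toNanos(), remaining));
        }
    }
}
```

In the question's setup, the condition would execute the SQL2 query and return `result.getRows().hasNext()`; since `Query.execute()` throws the checked `RepositoryException`, the lambda would have to catch it. If the traversal warning still appears after waiting, the query is not picking up the index at all, which points to the index definition or provider setup rather than timing.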