Java: can't delete documents with Lucene IndexWriter.deleteDocuments(term)

I've been struggling with this for two days and simply can't get indexWriter.deleteDocuments(term) to work.

Here is the code I'm testing with; I hope someone can point out what I'm doing wrong. Things I've tried:

  • Upgrading Lucene from 2.x to 5.x
  • Using indexWriter.deleteDocuments() instead of indexReader.deleteDocuments()
  • Setting the string field's IndexOptions to NONE or DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS
  • Here is the code:

    import org.apache.lucene.analysis.core.SimpleAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.FieldType;
    import org.apache.lucene.index.*;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    
    import java.io.IOException;
    import java.nio.file.Paths;
    
    public class TestSearch {
        static SimpleAnalyzer analyzer = new SimpleAnalyzer();
    
        public static void main(String[] argvs) throws IOException, ParseException {
            generateIndex("5836962b0293a47b09d345f1");
            query("5836962b0293a47b09d345f1");
            delete("5836962b0293a47b09d345f1");
            query("5836962b0293a47b09d345f1");
    
        }
    
        public static void generateIndex(String id) throws IOException {
            Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
            IndexWriterConfig config = new IndexWriterConfig(analyzer);
            IndexWriter iwriter = new IndexWriter(directory, config);
            FieldType fieldType = new FieldType();
            fieldType.setStored(true);
            fieldType.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
            Field idField = new Field("_id", id, fieldType);
            Document doc = new Document();
            doc.add(idField);
            iwriter.addDocument(doc);
            iwriter.close();
    
        }
    
        public static void query(String id) throws ParseException, IOException {
            Query query = new QueryParser("_id", analyzer).parse(id);
            Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
            IndexReader ireader  = DirectoryReader.open(directory);
            IndexSearcher isearcher = new IndexSearcher(ireader);
            ScoreDoc[] scoreDoc = isearcher.search(query, 100).scoreDocs;
            for(ScoreDoc scdoc: scoreDoc){
                Document doc = isearcher.doc(scdoc.doc);
                System.out.println(doc.get("_id"));
            }
        }
    
        public static void delete(String id){
            try {
                Directory directory = FSDirectory.open(Paths.get("/tmp/test/lucene"));
                IndexWriterConfig config = new IndexWriterConfig(analyzer);
                IndexWriter iwriter = new IndexWriter(directory, config);
                Term term = new Term("_id", id);
                iwriter.deleteDocuments(term);
                iwriter.commit();
                iwriter.close();
            }catch (IOException e){
                e.printStackTrace();
            }
        }
    }
    
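    First, generateIndex() builds the index in /tmp/test/lucene, and query() shows that the id is found. Then delete() is supposed to delete the document, but the second query() call shows that the deletion failed.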

    Here are the pom dependencies, in case anyone wants to test it:

        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>5.5.4</version>
            <type>jar</type>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>5.5.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>5.5.4</version>
        </dependency>
        <dependency>
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-smartcn</artifactId>
            <version>5.5.4</version>
        </dependency>
    
    

    Eagerly waiting for an answer.

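    Your problem is the analyzer. SimpleAnalyzer defines a token as a maximal string of letters (StandardAnalyzer, or even WhitespaceAnalyzer, would be a more typical choice), so the indexed value is split into the tokens "b", "a", "b", "d", "f". The delete method you defined, however, does not go through the analyzer; it creates a raw term. You can see this in action by replacing main with the following: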

    generateIndex("5836962b0293a47b09d345f1");
    query("5836962b0293a47b09d345f1");
    delete("b");
    query("5836962b0293a47b09d345f1");
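
The tokenization described above can be reproduced without Lucene. The following is a minimal plain-Java sketch that approximates SimpleAnalyzer's behavior (split on every non-letter character, lowercase what remains). The class name SimpleAnalyzerSketch and its tokenize method are hypothetical, and this is not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: approximates what SimpleAnalyzer does to a field value
// (keep maximal runs of letters, lowercase them). Not Lucene code.
public class SimpleAnalyzerSketch {

    // Splits text into maximal runs of letters; every non-letter ends a token.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // Digits are dropped entirely; they only act as separators.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints [b, a, b, d, f]: no indexed term equals the full id.
        System.out.println(tokenize("5836962b0293a47b09d345f1"));
    }
}
```

Because no indexed token equals 5836962b0293a47b09d345f1, a raw Term built from the full id matches nothing, while the single-letter term "b" matches the document.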
    
    In general, queries, terms, and so on are not analyzed; QueryParser is the exception, since it runs the analyzer on its input.

    For what looks like an identifier field, you probably don't want to analyze the field at all. In that case, add this to the FieldType:

    fieldType.setTokenized(false);
    
    Then you will need to change the query as well (again, QueryParser analyzes its input) and use a TermQuery instead:

    Query query = new TermQuery(new Term("_id", id));
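
To make the fix concrete, here is a toy in-memory inverted index in plain Java. It is hypothetical and does not reflect Lucene's data structures; it only models an exact-term operation such as deleteDocuments(new Term("_id", id)) or a TermQuery, where the term is looked up exactly as given, with no analysis. With tokenized=true the index roughly approximates SimpleAnalyzer; with tokenized=false the whole value is stored as one term, roughly modeling setTokenized(false).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy inverted index (hypothetical, not Lucene): term -> ids of docs containing it.
public class ExactTermDeleteSketch {
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final boolean tokenized;

    ExactTermDeleteSketch(boolean tokenized) {
        this.tokenized = tokenized;
    }

    // tokenized=true: index letter runs, lowercased (SimpleAnalyzer-like).
    // tokenized=false: index the whole value as a single term.
    void add(int docId, String value) {
        if (tokenized) {
            for (String token : value.toLowerCase().split("[^\\p{L}]+")) {
                if (!token.isEmpty()) {
                    postings.computeIfAbsent(token, k -> new HashSet<>()).add(docId);
                }
            }
        } else {
            postings.computeIfAbsent(value, k -> new HashSet<>()).add(docId);
        }
    }

    // Exact lookup with no analysis (like a raw Term); removes and returns matching ids.
    Set<Integer> deleteByTerm(String term) {
        Set<Integer> deleted = new HashSet<>(postings.getOrDefault(term, new HashSet<>()));
        for (Set<Integer> docs : postings.values()) {
            docs.removeAll(deleted);
        }
        return deleted;
    }

    public static void main(String[] args) {
        String id = "5836962b0293a47b09d345f1";

        ExactTermDeleteSketch analyzed = new ExactTermDeleteSketch(true);
        analyzed.add(0, id);
        System.out.println(analyzed.deleteByTerm(id));  // [] : no term equals the full id
        System.out.println(analyzed.deleteByTerm("b")); // [0] : matches one of the tokens

        ExactTermDeleteSketch untokenized = new ExactTermDeleteSketch(false);
        untokenized.add(0, id);
        System.out.println(untokenized.deleteByTerm(id)); // [0] : the full id is the term
    }
}
```

The design point is the same as in the answer: an exact-term delete or TermQuery only succeeds when the terms written at index time contain the value verbatim, which is what disabling tokenization guarantees for an id field.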
    

    Thanks for not only providing the correct solution but also showing how to use TermQuery.