Java Lucene: wildcard after a dot does not match digits


I recently upgraded from Lucene 3 to Lucene 6, and I found that in v6 the wildcard

?

no longer matches a digit that follows a dot. Here is an example:

String to match:

a.1a

Query:

a.?a

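In this example, the query matches the string in Lucene 3 but not in Lucene 6. On the other hand, the query

a*

matches in both Lucene 3 and Lucene 6. Further testing showed that the difference only appears when a dot is followed by a digit. By the way, I use

StandardAnalyzer

in both Lucene 3 and Lucene 6.

Does anyone know what is going on here? How can I restore the Lucene 3 behavior, or adjust my Lucene 6 query so that it is equivalent to the Lucene 3 one?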

Update

Lucene 6.6 code snippet (as requested):

public List<ResultDocument> search(String queryString)
        throws SearchException, CheckedOutOfMemoryError {
    stopped = false;

    QueryWrapper queryWrapper = createQuery(queryString);
    Query query = queryWrapper.query;
    boolean isPhraseQuery = queryWrapper.isPhraseQuery;

    readLock.lock();
    try {
        checkIndexesExist();

        DelegatingCollector collector = new DelegatingCollector() {
            @Override
            public void collect(int doc) throws IOException {
                leafDelegate.collect(doc);
                if (stopped) {
                    throw new StoppedSearcherException();
                }
            }
        };
        collector.setDelegate(TopScoreDocCollector.create(MAX_RESULTS, null));
        try {
            luceneSearcher.search(query, collector);
        }
        catch (StoppedSearcherException e) {
            // The search was cancelled via 'stopped'; keep the hits collected so far.
        }
        ScoreDoc[] scoreDocs = ((TopScoreDocCollector)collector.getDelegate()).topDocs().scoreDocs;

        ResultDocument[] results = new ResultDocument[scoreDocs.length];
        for (int i = 0; i < scoreDocs.length; i++) {
            Document doc = luceneSearcher.doc(scoreDocs[i].doc);
            float score = scoreDocs[i].score;
            // Map the hit's document ID (not the loop index i) to the sub-index it came from.
            LuceneIndex index = indexes.get(((DecoratedMultiReader) luceneSearcher.getIndexReader()).decoratedReaderIndex(scoreDocs[i].doc));
            IndexingConfig config = index.getConfig();
            results[i] = new ResultDocument(
                doc, score, query, isPhraseQuery, config, fileFactory,
                outlookMailFactory);
        }
        return Arrays.asList(results);
    }
    catch (IllegalArgumentException e) {
        throw wrapEmptyIndexException(e);
    }
    catch (IOException e) {
        throw new SearchException(e.getMessage());
    }
    catch (OutOfMemoryError e) {
        throw new CheckedOutOfMemoryError(e);
    }
    finally {
        readLock.unlock();
    }
}

More code:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public final class PhraseDetectingQueryParser extends QueryParser {

    /*
     * This class is used for determining whether the parsed query is supported
     * by the fast-vector highlighter. The latter only supports queries that are
     * a combination of TermQuery, PhraseQuery and/or BooleanQuery.
     */

    // Stays true until the parser creates a query type the highlighter cannot handle.
    private boolean isPhraseQuery = true;

    public PhraseDetectingQueryParser(String defaultField, Analyzer analyzer) {
        super(defaultField, analyzer);
    }

    public boolean isPhraseQuery() {
        return isPhraseQuery;
    }

    @Override
    protected Query newFuzzyQuery(Term term, float minimumSimilarity, int prefixLength) {
        isPhraseQuery = false;
        return super.newFuzzyQuery(term, minimumSimilarity, prefixLength);
    }

    @Override
    protected Query newMatchAllDocsQuery() {
        isPhraseQuery = false;
        return super.newMatchAllDocsQuery();
    }

    @Override
    protected Query newPrefixQuery(Term prefix) {
        isPhraseQuery = false;
        return super.newPrefixQuery(prefix);
    }

    @Override
    protected Query newWildcardQuery(Term t) {
        isPhraseQuery = false;
        return super.newWildcardQuery(t);
    }

}

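StandardAnalyzer splits the input into terms at the period, unless the period has letters on both sides or digits on both sides. So it splits a.1a into two terms: "a" and "1a".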
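The dot rule can be illustrated with a deliberately simplified, stdlib-only Java model. This is not Lucene's StandardTokenizer, which implements the full Unicode UAX #29 word-break rules; the class DotSplitModel is hypothetical and only handles dots and whitespace, which is enough to reproduce the split of a.1a.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model of how StandardTokenizer treats '.':
// a dot stays inside a token only when it sits between two letters or
// between two digits; otherwise it acts as a separator. Only dots and
// whitespace are modelled here; the real tokenizer follows UAX #29.
public class DotSplitModel {

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean separator = Character.isWhitespace(c);
            if (c == '.') {
                char prev = i > 0 ? text.charAt(i - 1) : ' ';
                char next = i + 1 < text.length() ? text.charAt(i + 1) : ' ';
                boolean lettersAround = Character.isLetter(prev) && Character.isLetter(next);
                boolean digitsAround = Character.isDigit(prev) && Character.isDigit(next);
                separator = !(lettersAround || digitsAround);
            }
            if (separator) {
                flush(current, tokens);
            } else {
                current.append(c);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Emits the pending token (lowercased, as StandardAnalyzer does) if any.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a.1a"));  // [a, 1a]
        System.out.println(tokenize("a.b"));   // [a.b]
        System.out.println(tokenize("1.2"));   // [1.2]
    }
}
```

A dot between a letter and a digit, as in a.1a, is a break; a dot between two letters or two digits is kept inside the token.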

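Since you are using a wildcard query, the query text is not analyzed, so it is not tokenized either: the whole pattern a.?a is compared against individual indexed terms, and neither "a" nor "1a" matches it. If you searched for "1a", without wildcards or anything else, you should find the document.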
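Why neither term matches can be sketched with a small stdlib-only model of wildcard matching, in which the pattern is compared against each indexed term as a whole (? is exactly one character, * is zero or more). The class WildcardModel is hypothetical and is not how Lucene executes a WildcardQuery internally (Lucene compiles the pattern into an automaton and intersects it with the term dictionary); backslash escaping is ignored. The untokenized case stands in for a field whose value is indexed as a single term, such as a StringField.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical model of wildcard semantics: the pattern must match an
// entire indexed term. '?' = exactly one character, '*' = zero or more.
public class WildcardModel {

    public static Pattern toRegex(String wildcard) {
        StringBuilder regex = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                // Quote everything else, including '.', so it matches literally.
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    public static List<String> matchingTerms(String wildcard, List<String> indexedTerms) {
        Pattern p = toRegex(wildcard);
        List<String> hits = new ArrayList<>();
        for (String term : indexedTerms) {
            if (p.matcher(term).matches()) {  // whole-term match, never a substring
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> tokenized = Arrays.asList("a", "1a");  // terms produced from a.1a
        List<String> untokenized = Arrays.asList("a.1a");   // value kept as one term
        System.out.println(matchingTerms("a.?a", tokenized));    // []
        System.out.println(matchingTerms("a*", tokenized));      // [a]
        System.out.println(matchingTerms("a.?a", untokenized));  // [a.1a]
    }
}
```

With the terms a and 1a, a.?a matches nothing and a* matches only a; a.?a matches only when a.1a itself is an indexed term.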

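Can you show the Lucene query you are running in Lucene 6?

@Mystion: I've added the relevant code to my post.

I meant that the most interesting part is createQuery().

@Mystion: OK, I've added more code. I guess there's too much odd stuff in the query...?

Unfortunately, you're wrong: there is a special parser that can analyze wildcard queries as well.

@Mysterion - You mean: my explanation of the behavior they are seeing is correct, and you have another workaround. Also, I'm 80% sure that wouldn't actually work for the case described. I'm not testing it, since that parser was removed in Lucene 7, but it is unlikely to split

a.?a

into a multi-term query.

I was specifically objecting to the statement that wildcard queries cannot be analyzed, because that is not true. In Lucene 7, you can … from QueryParserBase

@Mysterion - Fine, but you said you had a solution, when the solution you describe doesn't actually solve this problem. The reason that parser went away is that normalization was added to analyzers. Basically, it was only useful for normalization, not for full tokenization and analysis, which is what your proposed solution would need.

@pythondude - The most likely solution is to rethink your analysis. I don't have enough information to tell you what the right approach is. If you are searching for some kind of identifier that really shouldn't be tokenized at all, try using a StringField. If you really just want to reuse the old analyzer, you can use ClassicAnalyzer, but keep in mind that you are giving up better, more refined analysis for the sake of this query.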
