Lucene: problem searching a hyphenated field

I'm having a problem with Lucene that is driving me crazy. I have the following field:

doc.Add(new Field("cataloguenumber", i.CatalogueNumber.ToLower(), Field.Store.YES, Field.Index.ANALYZED));
It holds a catalogue number, which looks like this:

  • DF-GH5
  • DF-FJ4
  • AC-DP
  • AC-123
  • AC-DOCO
i.e. two characters followed by a hyphen, followed by 2-5 alphanumeric characters.

I'm trying to run a boolean query that lets users search the data:

// Specify the search fields; Lucene searches in multiple fields.
string[] searchfields = new string[] { "cataloguenumber", "title", "author", "categories", "year", "length", "keyword", "description" };

// Build a boolean query for searching and getting the hits.
BooleanQuery mainQuery = new BooleanQuery();
QueryParser parser;

// Add a clause for the main keyword.
parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, searchfields, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30));
parser.AllowLeadingWildcard = true;
mainQuery.Add(parser.Parse(GetMainSearchQueryString(SearchPhrase)), Occur.MUST);
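The system works fine for every field except cataloguenumber, which for some reason doesn't work at all.

Ideally we'd like to be able to search by a full or partial catalogue number, so that, for example, "DF-" returns every item whose catalogue number starts with DF.

Does anyone know how I can achieve this?

Many thanks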


Olly

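A common source of problems is using different analyzers at index time and at query time. With StandardAnalyzer you should be able to get good results: it treats text like DF-GH5 as a single token, so you can search for it with, for example, DF-GH5 or DF-*. Just make sure you use the same analyzer for both the IndexWriter and the QueryParser. One caveat: as far as I know, the tokenizer in this Lucene version keeps a hyphenated token whole only when it contains a digit, so a value without digits such as AC-DP may be split at the hyphen, in which case a prefix search like AC-* would not match it.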

Here is a simple example that builds an in-memory index containing a single document and then queries the index by cataloguenumber:

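// Namespaces this snippet relies on (Lucene.Net 3.0.x API assumed).
using System;
using System.Linq;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;
using Lucene.Net.Store;

public static void Test()
{
    // Use an in-memory index.
    RAMDirectory indexDirectory = new RAMDirectory();

    // Use the same analyzer for indexing and for query parsing.
    Analyzer analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);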

    // Add single document to the index.
    using (IndexWriter writer = new IndexWriter(indexDirectory, analyzer, IndexWriter.MaxFieldLength.UNLIMITED))
    {
        Document document = new Document();
        document.Add(new Field("content", "This is just some text", Field.Store.YES, Field.Index.ANALYZED));
        document.Add(new Field("cataloguenumber", "DF-GH5", Field.Store.YES, Field.Index.ANALYZED));

        writer.AddDocument(document);
    }

    var parser = new MultiFieldQueryParser(
        Lucene.Net.Util.Version.LUCENE_30,
        new[] { "cataloguenumber", "content" },
        analyzer);

    var searcher = new IndexSearcher(indexDirectory);

    DoSearch("df-gh5", parser, searcher);
    DoSearch("df-*", parser, searcher);
}

private static void DoSearch(string queryString, MultiFieldQueryParser parser, IndexSearcher searcher)
{
    var query = parser.Parse(queryString);

    TopDocs docs = searcher.Search(query, 10);

    foreach (ScoreDoc scoreDoc in docs.ScoreDocs)
    {
        Document searchHit = searcher.Doc(scoreDoc.Doc);
        string cataloguenumber = searchHit.GetValues("cataloguenumber").FirstOrDefault();
        string content = searchHit.GetValues("content").FirstOrDefault();
        Console.WriteLine($"Found object: {cataloguenumber} {content}");
    }
}

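It's worth adding that I know this field is being indexed (in some form): I opened the _mcd.cfs file and can see some of the catalogue numbers in it.

Works like a charm!! Thank you very much!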