C# Lucene.net: can't do multi-word search

c# lucene

I have the following documents stored in my Lucene index:

{
"id" : 1,
"name": "John Smith",
"description": "worker",
"additionalData": "faster data",
"attributes": "is_hired=not"
},
{
"id" : 2,
"name": "Alan Smith",
"description": "hired",
"additionalData": "faster drive",
"attributes": "is_hired=not"
},
{
"id" : 3,
"name": "Mike Std",
"description": "hired",
"additionalData": "faster check",
"attributes": "is_hired=not"
}

Now I want to search across the fields and check whether any of the given values is present:

search term: "John data check"

This should return the documents with IDs 1 and 3, but it doesn't. Why?

var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

BooleanQuery mainQuery = new BooleanQuery();
mainQuery.MinimumNumberShouldMatch = 1;

var cols = new string[] {
                         "name",
                         "additionalData"
                        };

 string[] words = searchData.text.Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries);

 var queryParser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, cols, analyzer);

 foreach (var word in words)
 {
    BooleanQuery innerQuery = new BooleanQuery();
    innerQuery.MinimumNumberShouldMatch = 1;

    innerQuery.Add(queryParser.Parse(word), Occur.SHOULD);

    mainQuery.Add(innerQuery, Occur.MUST);
 }

 TopDocs hits = searcher.Search(mainQuery, null, int.MaxValue, Sort.RELEVANCE);

 //hits.TotalHits is 0 !!

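The query you constructed effectively requires all three words to match.

You wrap each word in a BooleanQuery with a SHOULD clause. This is equivalent to using the inner query directly; you are only adding an indirection that doesn't change the query's behavior. That Boolean query has a single clause, and that clause has to match for the Boolean query to match.

Then you wrap each of those in another Boolean query, this time with a MUST clause for each one. This means every clause has to match for the query to match.

For a BooleanQuery to match, all MUST clauses have to be satisfied, and if there are none, at least MinimumNumberShouldMatch SHOULD clauses have to be satisfied. Leave that property at its default value, because the documented behavior is:

"By default no optional clauses are necessary for a match (unless there are no required clauses)."

Effectively (ignoring MultiFieldQueryParser for simplicity), your query looks like this in tree form: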
BooleanQuery
    MUST: BooleanQuery
        SHOULD: TermQuery: john
    MUST: BooleanQuery
        SHOULD: TermQuery: data
    MUST: BooleanQuery
        SHOULD: TermQuery: check

Which simplifies to:
BooleanQuery
    MUST: TermQuery: john
    MUST: TermQuery: data
    MUST: TermQuery: check

But the query you need is:
BooleanQuery
    SHOULD: TermQuery: john
    SHOULD: TermQuery: data
    SHOULD: TermQuery: check

So remove the mainQuery.MinimumNumberShouldMatch = 1 line and replace the body of your foreach with the following, and it should do the job:
mainQuery.Add(queryParser.Parse(word), Occur.SHOULD);
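To make the difference between the two trees concrete, here is a minimal sketch that does not use Lucene at all. It models the three documents as plain sets of lower-cased terms taken from the name and additionalData fields, and evaluates "every word is MUST" against "every word is SHOULD". The class name BooleanSemanticsSketch and the methods MatchAll and MatchAny are hypothetical names made up for this illustration, and the sketch ignores scoring and analysis entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BooleanSemanticsSketch
{
    // Hypothetical in-memory stand-in for the index:
    // document id -> lower-cased terms from "name" and "additionalData".
    public static readonly Dictionary<int, HashSet<string>> Docs =
        new Dictionary<int, HashSet<string>>
        {
            { 1, new HashSet<string> { "john", "smith", "faster", "data" } },
            { 2, new HashSet<string> { "alan", "smith", "faster", "drive" } },
            { 3, new HashSet<string> { "mike", "std", "faster", "check" } },
        };

    // Every word is a MUST clause: a document matches only if it contains all words.
    public static int[] MatchAll(params string[] words)
    {
        return Docs.Where(d => words.All(w => d.Value.Contains(w)))
                   .Select(d => d.Key)
                   .OrderBy(id => id)
                   .ToArray();
    }

    // Every word is a SHOULD clause (and there is no MUST clause):
    // a document matches if it contains at least one word.
    public static int[] MatchAny(params string[] words)
    {
        return Docs.Where(d => words.Any(w => d.Value.Contains(w)))
                   .Select(d => d.Key)
                   .OrderBy(id => id)
                   .ToArray();
    }

    public static void Main()
    {
        // No document contains john, data and check at the same time.
        Console.WriteLine("MUST:   [" + string.Join(",", MatchAll("john", "data", "check")) + "]");
        // Documents 1 (john, data) and 3 (check) each contain at least one word.
        Console.WriteLine("SHOULD: [" + string.Join(",", MatchAny("john", "data", "check")) + "]");
    }
}
```

With the sample data from the question, the MUST variant matches nothing, which mirrors TotalHits being 0, while the SHOULD variant matches documents 1 and 3.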


OK, here is a complete example that works for me:
var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);

var directory = new RAMDirectory();

using (var writer = new IndexWriter(directory, analyzer, true, IndexWriter.MaxFieldLength.UNLIMITED))
{
    var doc = new Document();
    doc.Add(new Field("id", "1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("name", "John Smith", Field.Store.NO, Field.Index.ANALYZED));
    doc.Add(new Field("additionalData", "faster data", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);

    doc = new Document();
    doc.Add(new Field("id", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("name", "Alan Smith", Field.Store.NO, Field.Index.ANALYZED));
    doc.Add(new Field("additionalData", "faster drive", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);

    doc = new Document();
    doc.Add(new Field("id", "3", Field.Store.YES, Field.Index.NOT_ANALYZED));
    doc.Add(new Field("name", "Mike Std", Field.Store.NO, Field.Index.ANALYZED));
    doc.Add(new Field("additionalData", "faster check", Field.Store.NO, Field.Index.ANALYZED));
    writer.AddDocument(doc);
}

var words = new[] {"John", "data", "check"};
var parser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, new[] {"name", "additionalData"}, analyzer);


var mainQuery = new BooleanQuery();
foreach (var word in words)
    mainQuery.Add(parser.Parse(word), Occur.SHOULD); // Should probably use parser.Parse(QueryParser.Escape(word)) instead

using (var searcher = new IndexSearcher(directory))
{
    var results = searcher.Search(mainQuery, null, int.MaxValue, Sort.RELEVANCE);
    var idFieldSelector = new MapFieldSelector("id");

    foreach (var scoreDoc in results.ScoreDocs)
    {
        var doc = searcher.Doc(scoreDoc.Doc, idFieldSelector);
        Console.WriteLine("Found: {0}", doc.Get("id"));
    }
}
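The comment in the example above suggests escaping each word before parsing it. A minimal sketch of that idea follows, assuming Lucene.Net 3.0.x is referenced. EscapedQuerySketch and BuildAnyWordQuery are hypothetical names for this illustration; QueryParser.Escape is the static helper the comment refers to. Without escaping, a word containing query syntax characters such as an unbalanced parenthesis can make Parse throw.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.QueryParsers;
using Lucene.Net.Search;

public static class EscapedQuerySketch
{
    // Hypothetical helper: one SHOULD clause per whitespace-separated word,
    // with query syntax characters escaped before parsing.
    public static BooleanQuery BuildAnyWordQuery(QueryParser parser, string text)
    {
        var query = new BooleanQuery();
        var words = text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var word in words)
        {
            query.Add(parser.Parse(QueryParser.Escape(word)), Occur.SHOULD);
        }
        return query;
    }

    public static void Main()
    {
        var analyzer = new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30);
        var parser = new MultiFieldQueryParser(
            Lucene.Net.Util.Version.LUCENE_30,
            new[] { "name", "additionalData" },
            analyzer);

        // "(data" has an unbalanced parenthesis; escaping keeps it a plain term.
        var query = BuildAnyWordQuery(parser, "John (data check");
        Console.WriteLine(query);
    }
}
```

Note that escaping also neutralizes * and ?, so this helper is not suitable as-is if the words are meant to carry wildcards.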

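In my case I stored an array of strings under the same field name, and I had to retrieve all of the field values from the resulting Document, because Document.Get("field_name") returns only the first value when there are several fields with the same name: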
var multi_fields = doc.GetFields("field_name");
var field_values = multi_fields.Select(x => x.StringValue).ToArray();
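As a possible shortcut, Document also has a GetValues method that returns the string values of every field with the given name. The sketch below assumes Lucene.Net 3.0.x and builds the Document in memory rather than loading it from a searcher. MultiValuedFieldSketch and ReadAll are hypothetical names for this illustration.

```csharp
using System;
using Lucene.Net.Documents;

public static class MultiValuedFieldSketch
{
    // Hypothetical helper: all values stored under one field name.
    public static string[] ReadAll(Document doc, string fieldName)
    {
        return doc.GetValues(fieldName);
    }

    public static void Main()
    {
        var doc = new Document();
        doc.Add(new Field("field_name", "first", Field.Store.YES, Field.Index.NOT_ANALYZED));
        doc.Add(new Field("field_name", "second", Field.Store.YES, Field.Index.NOT_ANALYZED));

        // Get returns only the value of the first field with this name.
        Console.WriteLine(doc.Get("field_name"));

        // GetValues returns the values of all fields with this name.
        Console.WriteLine(string.Join(", ", ReadAll(doc, "field_name")));
    }
}
```

For a document returned by a search, only fields indexed with Field.Store.YES are available this way.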

Also, I had to enable wildcard search so that partial words match, for example Jo instead of John:

 string[] words = "Jo data check".Split(new string[] { " " }, StringSplitOptions.RemoveEmptyEntries).Select(x => string.Format("*{0}*", x)).ToArray();

 var queryParser = new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_30, cols, analyzer);
 queryParser.AllowLeadingWildcard = true; // set on the parser that actually parses the wildcard terms

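But when I changed the code as you suggested, documents that don't meet the search criteria are returned as well.

Hmm, can you tell me what mainQuery.ToString() prints?

{((name:Joh additionalData:Joh) (name:dat additionalData:dat) (name:chec additionalData:chec))~1}

You didn't remove MinimumNumberShouldMatch = 1, so try removing it, although I don't think that is the problem. What's odd is that the last letter of each word is stripped; StandardAnalyzer shouldn't do that.

Well, even after I removed MinimumNumberShouldMatch = 1 the result is the same. It also isn't stripping the last letter; I did that myself for the sake of the example.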