Elasticsearch - documents that match more terms score lower than documents that match fewer terms


I have a query that should return profiles with similar interests. The problem is that documents with more matches get a lower score.

In the bool query, I have a should clause with interests = ["games", "music", "sport"].

A document with interests ['games'] gets a score of 0.14981213.

A document with interests ['games', 'music'] gets a score of 0.11516824.

Why? I am using AWS Elasticsearch, version 2.3.2.

The query looks like this:

{
    "explain": true,
    "from": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "bool": {
                        "must_not": [
                            {
                                "term": {
                                    "id": 3918
                                }
                            }
                        ]
                    }
                }
            ],
            "should": [
                {
                    "terms": {
                        "interests": [
                            "games",
                            "music",
                            "sport"
                        ]
                    }
                }
            ]
        }
    },
    "size": 10
}

This is the result I get:

{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_explanation": {
                    "description": "sum of:",
                    "details": [
                        {
                            "description": "match on required clause, product of:",
                            "details": [
                                {
                                    "description": "# clause",
                                    "details": [],
                                    "value": 0.0
                                },
                                {
                                    "description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:",
                                    "details": [
                                        {
                                            "description": "boost",
                                            "details": [],
                                            "value": 1.0
                                        },
                                        {
                                            "description": "queryNorm",
                                            "details": [],
                                            "value": 0.4494364
                                        }
                                    ],
                                    "value": 0.4494364
                                }
                            ],
                            "value": 0.0
                        },
                        {
                            "description": "product of:",
                            "details": [
                                {
                                    "description": "sum of:",
                                    "details": [
                                        {
                                            "description": "weight(interests:games in 1) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "description": "score(doc=1,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "idf(docFreq=2, maxDocs=3)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "queryNorm",
                                                                    "details": [],
                                                                    "value": 0.4494364
                                                                }
                                                            ],
                                                            "value": 0.4494364
                                                        },
                                                        {
                                                            "description": "fieldWeight in 1, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "description": "termFreq=1.0",
                                                                            "details": [],
                                                                            "value": 1.0
                                                                        }
                                                                    ],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "idf(docFreq=2, maxDocs=3)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "fieldNorm(doc=1)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                }
                                                            ],
                                                            "value": 1.0
                                                        }
                                                    ],
                                                    "value": 0.4494364
                                                }
                                            ],
                                            "value": 0.4494364
                                        }
                                    ],
                                    "value": 0.4494364
                                },
                                {
                                    "description": "coord(1/3)",
                                    "details": [],
                                    "value": 0.33333334
                                }
                            ],
                            "value": 0.14981213
                        }
                    ],
                    "value": 0.14981213
                },
                "_id": "3917",
                "_index": "test_44024988_profiles",
                "_node": "urWXg5KhREyffYielaa6Rw",
                "_score": 0.14981213,
                "_shard": 2,
                "_source": {
                    "full_name": "Bob Doe",
                    "id": 3916,
                    "interests": [
                        "games"
                    ],
                    "user_id": 3917
                },
                "_type": "profile_document"
            },
            {
                "_explanation": {
                    "description": "sum of:",
                    "details": [
                        {
                            "description": "match on required clause, product of:",
                            "details": [
                                {
                                    "description": "# clause",
                                    "details": [],
                                    "value": 0.0
                                },
                                {
                                    "description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:",
                                    "details": [
                                        {
                                            "description": "boost",
                                            "details": [],
                                            "value": 1.0
                                        },
                                        {
                                            "description": "queryNorm",
                                            "details": [],
                                            "value": 0.9173473
                                        }
                                    ],
                                    "value": 0.9173473
                                }
                            ],
                            "value": 0.0
                        },
                        {
                            "description": "product of:",
                            "details": [
                                {
                                    "description": "sum of:",
                                    "details": [
                                        {
                                            "description": "weight(interests:games in 0) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "description": "score(doc=0,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "queryNorm",
                                                                    "details": [],
                                                                    "value": 0.9173473
                                                                }
                                                            ],
                                                            "value": 0.2814906
                                                        },
                                                        {
                                                            "description": "fieldWeight in 0, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "description": "termFreq=1.0",
                                                                            "details": [],
                                                                            "value": 1.0
                                                                        }
                                                                    ],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "fieldNorm(doc=0)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                }
                                                            ],
                                                            "value": 0.30685282
                                                        }
                                                    ],
                                                    "value": 0.08637618
                                                }
                                            ],
                                            "value": 0.08637618
                                        },
                                        {
                                            "description": "weight(interests:music in 0) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "description": "score(doc=0,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "queryNorm",
                                                                    "details": [],
                                                                    "value": 0.9173473
                                                                }
                                                            ],
                                                            "value": 0.2814906
                                                        },
                                                        {
                                                            "description": "fieldWeight in 0, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "description": "termFreq=1.0",
                                                                            "details": [],
                                                                            "value": 1.0
                                                                        }
                                                                    ],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "fieldNorm(doc=0)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                }
                                                            ],
                                                            "value": 0.30685282
                                                        }
                                                    ],
                                                    "value": 0.08637618
                                                }
                                            ],
                                            "value": 0.08637618
                                        }
                                    ],
                                    "value": 0.17275237
                                },
                                {
                                    "description": "coord(2/3)",
                                    "details": [],
                                    "value": 0.6666667
                                }
                            ],
                            "value": 0.11516824
                        }
                    ],
                    "value": 0.11516824
                },
                "_id": "3918",
                "_index": "test_44024988_profiles",
                "_node": "urWXg5KhREyffYielaa6Rw",
                "_score": 0.11516824,
                "_shard": 4,
                "_source": {
                    "full_name": "Alex Test",
                    "id": 3917,
                    "interests": [
                        "games",
                        "music"
                    ],
                    "user_id": 3918
                },
                "_type": "profile_document"
            },
            ... # not interesting doc
        ],
        "max_score": 0.14981213,
        "total": 3
    },
    "timed_out": false,
    "took": 3
}
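The explain trees above are easier to audit with a small helper. The sketch below is a hypothetical Python function (check_explanation is an illustrative name, not part of any Elasticsearch client) that walks one _explanation tree and reports every node whose description ends in "sum of:" or "product of:" but whose value does not equal the sum or product of its children. Other node kinds (leaves, "result of:", "with freq of:") are skipped, and values are compared with a tolerance because explain output is in 32-bit floats.

```python
import math


def check_explanation(node, path="root", rel_tol=1e-5):
    """Return (path, reported, expected) for each 'sum of:' / 'product of:'
    node whose value differs from the sum / product of its children."""
    problems = []
    desc = node.get("description", "")
    children = node.get("details", [])
    values = [child["value"] for child in children]

    expected = None
    if children and desc.endswith("product of:"):
        expected = math.prod(values)
    elif children and desc.endswith("sum of:"):
        expected = math.fsum(values)

    # Explain values are 32-bit floats, so compare with a tolerance
    if expected is not None and not math.isclose(
        node["value"], expected, rel_tol=rel_tol, abs_tol=1e-9
    ):
        problems.append((path, node["value"], expected))

    # Recurse into every child, recording its position in the path
    for i, child in enumerate(children):
        problems.extend(check_explanation(child, f"{path}/{i}", rel_tol))
    return problems
```

Calling it on each hit["_explanation"] of a parsed response returns an empty list when every aggregate node is just the arithmetic of its factors, which is a quick way to confirm that a surprising score comes from the factors themselves (idf, queryNorm, coord) rather than from some hidden step.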

My input data:

[{
    "full_name": "Bob Doe",
    "id": 3916,
    "interests": [
        "games"
    ],
    "user_id": 3917
}, {
    "full_name": "Alex Test",
    "id": 3917,
    "interests": [
        "games",
        "music"
    ],
    "user_id": 3918
}, {
    "full_name": "Joe Test",
    "id": 3918,
    "user_id": 3919
}]

Let's look at the scoring formula in Elasticsearch:

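$$
\mathrm{score}(q,d) = \mathrm{queryNorm}(q) \cdot \mathrm{coord}(q,d) \cdot \sum_{t \in q} \Big( \mathrm{tf}(t \text{ in } d) \cdot \mathrm{idf}(t)^2 \cdot \texttt{t.getBoost()} \cdot \mathrm{norm}(t,d) \Big)
$$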
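To check the two scores by hand, here is a minimal Python sketch of Lucene's classic TF-IDF (the default similarity in Elasticsearch 2.x), assuming these factor definitions: idf = 1 + ln(maxDocs / (docFreq + 1)), tf = sqrt(freq), queryNorm = 1 / sqrt(sum of idf squared over all query terms), coord = matched terms / query terms, and fieldNorm = 1.0 as the explain output reports. The docFreq/maxDocs pairs for games (both shards) and music (shard 4) are read from the explain output. The statistics for query terms a document does not match (music and sport on shard 2, sport on shard 4) are not shown there; I inferred them by back-solving the reported queryNorm, so treat them as assumptions. The helper names idf and score are illustrative.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def score(query_terms, doc_terms, stats, field_norm=1.0):
    """Classic TF-IDF score of one document for an OR of query terms.

    stats maps each query term to (doc_freq, max_docs) as seen by the
    shard holding the document; every matched term has freq = 1.
    """
    idfs = {t: idf(*stats[t]) for t in query_terms}
    # queryNorm normalises over ALL query terms, matched or not
    query_norm = 1.0 / math.sqrt(sum(w * w for w in idfs.values()))
    matched = [t for t in query_terms if t in doc_terms]
    tf = math.sqrt(1.0)  # tf(freq=1.0) = sqrt(1) = 1
    # per term: queryWeight * fieldWeight = (idf * queryNorm) * (tf * idf * fieldNorm)
    total = sum(idfs[t] * query_norm * tf * idfs[t] * field_norm for t in matched)
    coord = len(matched) / len(query_terms)
    return total * coord


QUERY = ["games", "music", "sport"]

# Shard 2 (Bob Doe): games from the explain output; music/sport inferred.
shard2 = {"games": (2, 3), "music": (1, 3), "sport": (1, 3)}
# Shard 4 (Alex Test): games/music from the explain output; sport inferred.
shard4 = {"games": (1, 1), "music": (1, 1), "sport": (0, 1)}

bob = score(QUERY, {"games"}, shard2)
alex = score(QUERY, {"games", "music"}, shard4)
print(f"{bob:.6f} {alex:.6f}")  # ≈ 0.149812 0.115168
```

With these inputs the sketch lands on the explain values (0.14981213 and 0.11516824), which suggests the lower score for the two-interest document comes from the much smaller idf on its shard (0.307 versus 1.0) rather than from coord, which does favour it (2/3 versus 1/3).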

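For reference, if you are not familiar with it, you can find a description here. The explanation for your case is quite simple, though: it is just the formula at work, combining all of these factors (tf, idf, queryNorm, etc.). Also, if the index is a dummy one that contains only two documents, these values can look very strange.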

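I could explain it in depth, but it mostly comes down to the scoring formula. If you want to fix this, that is a separate question, and you can solve it with a different query!

Thanks for your reply. I understand the formula, but now the question is: is the formula wrong, or are my expectations? I think filter should not affect the score, and the should clause, being a scoring query, ought to rank documents that match more terms higher.

Yes, you are right: filter does not affect the score, and that is exactly your case; the score comes only from the terms query. The point is that we can compute tf-idf by hand and check whether the formula gives exactly the same result, and trust me, it will. tf-idf is tricky because it takes the rarity of a term into account, but I am not going to tell you the score differs from what the formula gives.

Considering the formula, we agree it is correct, but considering what users usually expect, I just wonder whether it is right. Maybe that is just me, though. Another thing is that it seems unstable. For more context: this is the result I get when running unit tests on the CI server, while on my local machine the scores are "correct" (they match my expectations), even with the same Elasticsearch and only a different index name.

Could you provide your full sample data?

I have added it at the bottom of the original question.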
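The "unstable" behaviour is consistent with per-shard term statistics: the explain output shows the two hits on different shards (2 and 4), each computing idf and queryNorm from its own docFreq/maxDocs. Shard 2 even reports docFreq=2 for games, although Alex Test (the only other sample document with that interest) sits on shard 4, which suggests (my inference) that leftover or deleted documents from earlier test runs are being counted. Under the classic TF-IDF formula (idf = 1 + ln(maxDocs / (docFreq + 1)), queryNorm over all query terms, coord = matched / total), scoring both documents against one shared set of statistics always ranks the two-interest document higher: it adds a positive term weight and gets a larger coord with the same queryNorm. The statistics below are hypothetical, chosen to match the three sample documents if they all lived in one shard.

```python
import math


def idf(doc_freq: int, max_docs: int) -> float:
    # Classic Lucene idf; stays > 0 because doc_freq <= max_docs
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def score(query_terms, doc_terms, stats):
    """Classic TF-IDF with tf = 1 and fieldNorm = 1 for every matched term."""
    idfs = {t: idf(*stats[t]) for t in query_terms}
    query_norm = 1.0 / math.sqrt(sum(w * w for w in idfs.values()))
    matched = [t for t in query_terms if t in doc_terms]
    coord = len(matched) / len(query_terms)
    # each matched term contributes idf^2 * queryNorm (tf and norm are 1)
    return coord * sum(idfs[t] ** 2 * query_norm for t in matched)


QUERY = ["games", "music", "sport"]
# Hypothetical index-wide statistics for the three sample documents:
# games in 2 docs, music in 1, sport in 0, maxDocs = 3.
shared = {"games": (2, 3), "music": (1, 3), "sport": (0, 3)}

bob = score(QUERY, {"games"}, shared)
alex = score(QUERY, {"games", "music"}, shared)
print(f"bob={bob:.4f} alex={alex:.4f}")
# More matching interests -> higher score once the statistics are shared
assert alex > bob
```

If per-shard statistics are indeed the cause, the usual remedies for a small test index are to create it with a single primary shard, or to search with search_type=dfs_query_then_fetch, which collects term statistics from all shards before scoring; neither has been verified against this particular CI setup.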