Elasticsearch: documents matching more terms score lower than documents matching fewer terms

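I have a query that should return profiles with similar interests. The problem is that documents matching more terms get a lower score.

In the bool query I have a should clause with interests = ["games", "music", "sport"].

The document with interests ['games'] scores 0.14981213.

The document with interests ['games', 'music'] scores 0.11516824.

Why? I am using AWS Elasticsearch, v. 2.3.2.

The query looks like this: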

{
    "explain": true,
    "from": 0,
    "query": {
        "bool": {
            "filter": [
                {
                    "bool": {
                        "must_not": [
                            {
                                "term": {
                                    "id": 3918
                                }
                            }
                        ]
                    }
                }
            ],
            "should": [
                {
                    "terms": {
                        "interests": [
                            "games",
                            "music",
                            "sport"
                        ]
                    }
                }
            ]
        }
    },
    "size": 10
}
The result I get is:

{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_explanation": {
                    "description": "sum of:",
                    "details": [
                        {
                            "description": "match on required clause, product of:",
                            "details": [
                                {
                                    "description": "# clause",
                                    "details": [],
                                    "value": 0.0
                                },
                                {
                                    "description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:",
                                    "details": [
                                        {
                                            "description": "boost",
                                            "details": [],
                                            "value": 1.0
                                        },
                                        {
                                            "description": "queryNorm",
                                            "details": [],
                                            "value": 0.4494364
                                        }
                                    ],
                                    "value": 0.4494364
                                }
                            ],
                            "value": 0.0
                        },
                        {
                            "description": "product of:",
                            "details": [
                                {
                                    "description": "sum of:",
                                    "details": [
                                        {
                                            "description": "weight(interests:games in 1) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "description": "score(doc=1,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "idf(docFreq=2, maxDocs=3)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "queryNorm",
                                                                    "details": [],
                                                                    "value": 0.4494364
                                                                }
                                                            ],
                                                            "value": 0.4494364
                                                        },
                                                        {
                                                            "description": "fieldWeight in 1, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "description": "termFreq=1.0",
                                                                            "details": [],
                                                                            "value": 1.0
                                                                        }
                                                                    ],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "idf(docFreq=2, maxDocs=3)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "fieldNorm(doc=1)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                }
                                                            ],
                                                            "value": 1.0
                                                        }
                                                    ],
                                                    "value": 0.4494364
                                                }
                                            ],
                                            "value": 0.4494364
                                        }
                                    ],
                                    "value": 0.4494364
                                },
                                {
                                    "description": "coord(1/3)",
                                    "details": [],
                                    "value": 0.33333334
                                }
                            ],
                            "value": 0.14981213
                        }
                    ],
                    "value": 0.14981213
                },
                "_id": "3917",
                "_index": "test_44024988_profiles",
                "_node": "urWXg5KhREyffYielaa6Rw",
                "_score": 0.14981213,
                "_shard": 2,
                "_source": {
                    "full_name": "Bob Doe",
                    "id": 3916,
                    "interests": [
                        "games"
                    ],
                    "user_id": 3917
                },
                "_type": "profile_document"
            },
            {
                "_explanation": {
                    "description": "sum of:",
                    "details": [
                        {
                            "description": "match on required clause, product of:",
                            "details": [
                                {
                                    "description": "# clause",
                                    "details": [],
                                    "value": 0.0
                                },
                                {
                                    "description": "-id:`\b\u0000\u0000\u001eN #*:*, product of:",
                                    "details": [
                                        {
                                            "description": "boost",
                                            "details": [],
                                            "value": 1.0
                                        },
                                        {
                                            "description": "queryNorm",
                                            "details": [],
                                            "value": 0.9173473
                                        }
                                    ],
                                    "value": 0.9173473
                                }
                            ],
                            "value": 0.0
                        },
                        {
                            "description": "product of:",
                            "details": [
                                {
                                    "description": "sum of:",
                                    "details": [
                                        {
                                            "description": "weight(interests:games in 0) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "description": "score(doc=0,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "queryNorm",
                                                                    "details": [],
                                                                    "value": 0.9173473
                                                                }
                                                            ],
                                                            "value": 0.2814906
                                                        },
                                                        {
                                                            "description": "fieldWeight in 0, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "description": "termFreq=1.0",
                                                                            "details": [],
                                                                            "value": 1.0
                                                                        }
                                                                    ],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "fieldNorm(doc=0)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                }
                                                            ],
                                                            "value": 0.30685282
                                                        }
                                                    ],
                                                    "value": 0.08637618
                                                }
                                            ],
                                            "value": 0.08637618
                                        },
                                        {
                                            "description": "weight(interests:music in 0) [PerFieldSimilarity], result of:",
                                            "details": [
                                                {
                                                    "description": "score(doc=0,freq=1.0), product of:",
                                                    "details": [
                                                        {
                                                            "description": "queryWeight, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "queryNorm",
                                                                    "details": [],
                                                                    "value": 0.9173473
                                                                }
                                                            ],
                                                            "value": 0.2814906
                                                        },
                                                        {
                                                            "description": "fieldWeight in 0, product of:",
                                                            "details": [
                                                                {
                                                                    "description": "tf(freq=1.0), with freq of:",
                                                                    "details": [
                                                                        {
                                                                            "description": "termFreq=1.0",
                                                                            "details": [],
                                                                            "value": 1.0
                                                                        }
                                                                    ],
                                                                    "value": 1.0
                                                                },
                                                                {
                                                                    "description": "idf(docFreq=1, maxDocs=1)",
                                                                    "details": [],
                                                                    "value": 0.30685282
                                                                },
                                                                {
                                                                    "description": "fieldNorm(doc=0)",
                                                                    "details": [],
                                                                    "value": 1.0
                                                                }
                                                            ],
                                                            "value": 0.30685282
                                                        }
                                                    ],
                                                    "value": 0.08637618
                                                }
                                            ],
                                            "value": 0.08637618
                                        }
                                    ],
                                    "value": 0.17275237
                                },
                                {
                                    "description": "coord(2/3)",
                                    "details": [],
                                    "value": 0.6666667
                                }
                            ],
                            "value": 0.11516824
                        }
                    ],
                    "value": 0.11516824
                },
                "_id": "3918",
                "_index": "test_44024988_profiles",
                "_node": "urWXg5KhREyffYielaa6Rw",
                "_score": 0.11516824,
                "_shard": 4,
                "_source": {
                    "full_name": "Alex Test",
                    "id": 3917,
                    "interests": [
                        "games",
                        "music"
                    ],
                    "user_id": 3918
                },
                "_type": "profile_document"
            },
            ... # not interesting doc
        ],
        "max_score": 0.14981213,
        "total": 3
    },
    "timed_out": false,
    "took": 3
}

My input data:

[{
    "full_name": "Bob Doe",
    "id": 3916,
    "interests": [
        "games"
    ],
    "user_id": 3917
}, {
    "full_name": "Alex Test",
    "id": 3917,
    "interests": [
        "games",
        "music"
    ],
    "user_id": 3918
}, {
    "full_name": "Joe Test",
    "id": 3918,
    "user_id": 3919
}]

Let's look at the scoring formula in Elasticsearch:

score(q,d)  =  
            queryNorm(q)  
          · coord(q,d)    
          · ∑ (           
                tf(t in d)   
              · idf(t)²      
              · t.getBoost() 
              · norm(t,d)    
            ) (t in q)    
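
If you are not familiar with it, the reference documentation describes each of these factors. The explanation for your case is quite simple, though: it is just the formula at work, combining all of these factors (tf, idf, queryNorm, and so on). Also, if the index is a dummy one that contains only two documents, the resulting values can look very odd.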
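
To make the formula concrete, here is a minimal sketch that recomputes both scores from the factors in the `explain` output above. The helper names `term_score` and `doc_score` are made up for illustration. The `queryNorm` and `idf` values are copied from the explain output rather than derived. Note that the two hits report different `queryNorm` and `idf` values and come from different shards (`_shard` 2 and 4), which seems to be what drives the surprising order.

```python
def term_score(query_norm, idf, tf=1.0, field_norm=1.0):
    """Score of one matching term: queryWeight * fieldWeight."""
    query_weight = idf * query_norm          # "queryWeight, product of:"
    field_weight = tf * idf * field_norm     # "fieldWeight in N, product of:"
    return query_weight * field_weight


def doc_score(term_scores, total_clauses):
    """Sum the matching term scores and apply coord(matched/total)."""
    coord = len(term_scores) / total_clauses
    return coord * sum(term_scores)


# Bob Doe: matches "games" only (shard 2: idf=1.0, queryNorm=0.4494364).
bob = doc_score([term_score(0.4494364, 1.0)], total_clauses=3)

# Alex Test: matches "games" and "music"
# (shard 4: idf=0.30685282, queryNorm=0.9173473 for both terms).
alex = doc_score([term_score(0.9173473, 0.30685282)] * 2, total_clauses=3)

# The explain output reports 0.14981213 for Bob and 0.11516824 for Alex.
print(f"Bob Doe:   {bob:.8f}")
print(f"Alex Test: {alex:.8f}")
```

The larger coord factor for the two-term document (2/3 against 1/3) is outweighed by the much smaller idf reported for its shard, since idf enters the per-term score squared.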


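I could explain this in more depth, but it mostly comes down to the scoring formula. If you want to fix it, that is a separate question, and you can solve it by writing a different query!

Thanks for the reply. I understand the formula, but the question now is: is the formula wrong, or are my expectations? I thought filter should not affect the score, and that should, being a query, should contribute to it.

Yes, you're right, filter does not affect the score, and that is exactly what happens in your case: the score comes only from the terms query. We could compute tf-idf by hand and check whether it matches the formula exactly, and trust me, it will. tf-idf is tricky because it takes the rarity of terms into account. I would not say the score differs from what the formula gives.

Given the formula, we agree the score is correct, but given what users commonly expect, I just wonder whether it is right. Maybe that's just me, though. Another thing is that it seems unstable. For more context: this is the result I get when running unit tests on the CI server, whereas on my local machine the scores are "correct" (they match my expectations), even with the same Elasticsearch and only a different index name.

Can you provide your complete sample data?

I've added it to the bottom of the original question.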
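
The instability described in the comments is consistent with each shard computing idf from its own local statistics. The sketch below is an illustration, not a definitive diagnosis. It assumes the classic Lucene formula `idf = 1 + ln(maxDocs / (docFreq + 1))`, which reproduces the two idf values in the explain output. The function names, and the placeholder value of 1.0 used for the shared idf and queryNorm, are invented for the example.

```python
import math


def classic_idf(doc_freq, max_docs):
    """Classic Lucene idf (assumed here): 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1.0))


def score(matched_idfs, query_norm, total_clauses=3):
    """coord * sum(queryNorm * idf^2), with tf and fieldNorm equal to 1."""
    coord = len(matched_idfs) / total_clauses
    return coord * sum(query_norm * idf * idf for idf in matched_idfs)


# Per-shard statistics as reported by the explain output.
idf_shard2 = classic_idf(doc_freq=2, max_docs=3)   # explain shows 1.0
idf_shard4 = classic_idf(doc_freq=1, max_docs=1)   # explain shows 0.30685282

per_shard_one = score([idf_shard2], query_norm=0.4494364)
per_shard_two = score([idf_shard4] * 2, query_norm=0.9173473)

# Hypothetical case: both documents are scored with the same statistics.
# The placeholder values only need to be equal for the comparison to hold.
shared_idf, shared_norm = 1.0, 1.0
shared_one = score([shared_idf], shared_norm)
shared_two = score([shared_idf] * 2, shared_norm)

print("per-shard stats:", per_shard_one, per_shard_two)
print("shared stats:   ", shared_one, shared_two)
```

With shared statistics, the document matching two terms scores four times higher than the one matching a single term, which is the ordering the question expected. For a tiny test index, two options that may be worth trying are creating the index with a single primary shard (`number_of_shards: 1`) or searching with `search_type=dfs_query_then_fetch`, which gathers term statistics from all shards before scoring. Whether either one explains the difference between the CI server and the local machine is an assumption that would need checking.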