Understanding scoring in Solr


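I'm new to Solr and am trying to index some documents, each of which is JSON. Some documents that should score high are scoring low. The field I'm querying is of type text_general. I need some help understanding values such as tfNorm and fieldLength.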

Below is the output of the debug query:

"718152d81b4db95f":"\n1.0891073 = sum of:\n  0.5578956 = weight(channel_genre:sports in 53) [SchemaSimilarity], result of:\n    0.5578956 = score(doc=53,freq=11.0 = termFreq=11.0\n), product of:\n      0.29769886 = idf(docFreq=223, docCount=300)\n      1.8740268 = tfNorm, computed from:\n        11.0 = termFreq=11.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        256.0 = fieldLength\n  0.53121173 = weight(channel_genre:kids in 53) [SchemaSimilarity], result of:\n    0.53121173 = score(doc=53,freq=12.0 = termFreq=12.0\n), product of:\n      0.27996004 = idf(docFreq=227, docCount=300)\n      1.8974556 = tfNorm, computed from:\n        12.0 = termFreq=12.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        256.0 = fieldLength\n",
  "7071fa048f60603":"\n1.0834496 = sum of:\n  0.5491592 = weight(channel_genre:sports in 75) [SchemaSimilarity], result of:\n    0.5491592 = score(doc=75,freq=23.0 = termFreq=23.0\n), product of:\n      0.29769886 = idf(docFreq=223, docCount=300)\n      1.8446804 = tfNorm, computed from:\n        23.0 = termFreq=23.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        655.36 = fieldLength\n  0.53429043 = weight(channel_genre:kids in 75) [SchemaSimilarity], result of:\n    0.53429043 = score(doc=75,freq=29.0 = termFreq=29.0\n), product of:\n      0.27996004 = idf(docFreq=227, docCount=300)\n      1.9084525 = tfNorm, computed from:\n        29.0 = termFreq=29.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        655.36 = fieldLength\n",
  "17e4a205707dc974":"\n1.0824875 = sum of:\n  0.62048614 = weight(channel_genre:sports in 64) [SchemaSimilarity], result of:\n    0.62048614 = score(doc=64,freq=24.0 = termFreq=24.0\n), product of:\n      0.29769886 = idf(docFreq=223, docCount=300)\n      2.0842745 = tfNorm, computed from:\n        24.0 = termFreq=24.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        163.84 = fieldLength\n  0.46200132 = weight(channel_genre:kids in 64) [SchemaSimilarity], result of:\n    0.46200132 = score(doc=64,freq=4.0 = termFreq=4.0\n), product of:\n      0.27996004 = idf(docFreq=227, docCount=300)\n      1.6502403 = tfNorm, computed from:\n        4.0 = termFreq=4.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        163.84 = fieldLength\n",
  "1a48c3a658cc07af":"\n1.0820175 = sum of:\n  0.58498204 = weight(channel_genre:sports in 59) [SchemaSimilarity], result of:\n    0.58498204 = score(doc=59,freq=16.0 = termFreq=16.0\n), product of:\n      0.29769886 = idf(docFreq=223, docCount=300)\n      1.9650128 = tfNorm, computed from:\n        16.0 = termFreq=16.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        256.0 = fieldLength\n  0.49703547 = weight(channel_genre:kids in 59) [SchemaSimilarity], result of:\n    0.49703547 = score(doc=59,freq=8.0 = termFreq=8.0\n), product of:\n      0.27996004 = idf(docFreq=227, docCount=300)\n      1.7753801 = tfNorm, computed from:\n        8.0 = termFreq=8.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        256.0 = fieldLength\n",
  "e073dacae12f494b":"\n1.0804946 = sum of:\n  0.5613358 = weight(channel_genre:sports in 17) [SchemaSimilarity], result of:\n    0.5613358 = score(doc=17,freq=19.0 = termFreq=19.0\n), product of:\n      0.29769886 = idf(docFreq=223, docCount=300)\n      1.8855827 = tfNorm, computed from:\n        19.0 = termFreq=19.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        455.1111 = fieldLength\n  0.51915884 = weight(channel_genre:kids in 17) [SchemaSimilarity], result of:\n    0.51915884 = score(doc=17,freq=17.0 = termFreq=17.0\n), product of:\n      0.27996004 = idf(docFreq=227, docCount=300)\n      1.8544034 = tfNorm, computed from:\n        17.0 = termFreq=17.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        455.1111 = fieldLength\n",
  "c69628bbb1d9f3ca":"\n1.0785265 = sum of:\n  0.55884564 = weight(channel_genre:sports in 96) [SchemaSimilarity], result of:\n    0.55884564 = score(doc=96,freq=14.0 = termFreq=14.0\n), product of:\n      0.29769886 = idf(docFreq=223, docCount=300)\n      1.877218 = tfNorm, computed from:\n        14.0 = termFreq=14.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        334.36734 = fieldLength\n  0.51968086 = weight(channel_genre:kids in 96) [SchemaSimilarity], result of:\n    0.51968086 = score(doc=96,freq=13.0 = termFreq=13.0\n), product of:\n      0.27996004 = idf(docFreq=227, docCount=300)\n      1.8562679 = tfNorm, computed from:\n        13.0 = termFreq=13.0\n        1.2 = parameter k1\n        0.75 = parameter b\n        142.80667 = avgFieldLength\n        334.36734 = fieldLength\n",

Based on the query I submitted, "c69628bbb1d9f3ca" should score higher than the other documents. What am I missing in my understanding? Please explain.

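The debug output shows that you are querying the channel_genre field. For c69628bbb1d9f3ca the score is affected by both the term frequency and the field length, but the scores in your results differ only slightly.

  • Term frequency is how often the term appears in the field; the more matches, the more relevant the result.
  • Field length: a hit in a shorter field is considered more significant, so shorter fields get a boost.
Are you using the standard query parser?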
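
To make the interplay of the two factors concrete, here is a minimal Python sketch of the standard BM25 formulas, which appear to reproduce the numbers in the debug output above. The function names are hypothetical helpers written for this illustration, not part of any Solr or Lucene API; the inputs (termFreq, fieldLength, avgFieldLength, docFreq, docCount, k1, b) are copied from the explain entry for c69628bbb1d9f3ca.

```python
import math

def bm25_idf(doc_freq, doc_count):
    # Inverse document frequency as used by BM25.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

def bm25_tf_norm(freq, field_length, avg_field_length, k1=1.2, b=0.75):
    # Term-frequency factor, dampened for fields longer than average.
    length_ratio = field_length / avg_field_length
    return freq * (k1 + 1) / (freq + k1 * (1 - b + b * length_ratio))

def term_score(freq, field_length, avg_field_length, doc_freq, doc_count):
    # Score contribution of a single query term: idf * tfNorm.
    return bm25_idf(doc_freq, doc_count) * bm25_tf_norm(
        freq, field_length, avg_field_length
    )

AVG_LEN = 142.80667   # avgFieldLength from the debug output
FIELD_LEN = 334.36734 # fieldLength reported for c69628bbb1d9f3ca

sports = term_score(14, FIELD_LEN, AVG_LEN, doc_freq=223, doc_count=300)
kids = term_score(13, FIELD_LEN, AVG_LEN, doc_freq=227, doc_count=300)
total = sports + kids
print(sports, kids, total)
```

The total should land very close to the 1.0785265 reported for that document. Because fieldLength / avgFieldLength sits in the denominator, a document whose field is more than twice the average length loses part of what its term frequencies would otherwise earn.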

Maybe you could explain why you think the results are incorrect.


If you want to disable length normalization in Solr, also consider omitNorms="true".
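
As a rough illustration of what removing length normalization would do to the six documents shown in the question, the Python sketch below re-scores them with the length term dropped, which is equivalent to setting b = 0 in BM25. This is an assumption made for illustration: how the similarity behaves when norms are omitted can differ between Lucene versions, and the idf values are simply copied from the debug output rather than recomputed. The helper names are hypothetical.

```python
K1 = 1.2

# idf values copied from the debug output
IDF_SPORTS = 0.29769886
IDF_KIDS = 0.27996004

# (termFreq for "sports", termFreq for "kids") copied from the debug output
term_freqs = {
    "718152d81b4db95f": (11, 12),
    "7071fa048f60603": (23, 29),
    "17e4a205707dc974": (24, 4),
    "1a48c3a658cc07af": (16, 8),
    "e073dacae12f494b": (19, 17),
    "c69628bbb1d9f3ca": (14, 13),
}

def tf_norm_without_length(freq, k1=K1):
    # BM25 tfNorm with b = 0, so field length plays no role (assumption).
    return freq * (k1 + 1) / (freq + k1)

def score_without_length(freqs):
    sports_freq, kids_freq = freqs
    return (IDF_SPORTS * tf_norm_without_length(sports_freq)
            + IDF_KIDS * tf_norm_without_length(kids_freq))

# Sort document ids from highest to lowest length-free score.
ranking = sorted(
    term_freqs,
    key=lambda doc_id: score_without_length(term_freqs[doc_id]),
    reverse=True,
)
for doc_id in ranking:
    print(doc_id, round(score_without_length(term_freqs[doc_id]), 4))
```

Under this assumption, c69628bbb1d9f3ca is no longer last among the six and moves ahead of several documents that outranked it in the original output, because only the term frequencies now separate the documents.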

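The query is channel_genre:"sports" AND channel_genre:"kids", i.e. the users who watch both kids and sports. Number of documents returned: 150. Max score: 1.2256454. I specifically added 100 users who frequently watch kids and sports programs, to verify that they land in the top 100. But 6 of those users rank below 100, and "c69628bbb1d9f3ca" is one of them. I just want to understand whether field length has a big impact on the score.

Considering how close the values you posted are, I would say it does in this case. By the way, have you considered trying omitNorms="true" on your field (it should disable length normalization)?

Yes, I tried omitNorms="true". Now the results are as expected :)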