elasticsearch ElasticSearch query/search/matching

I inserted 3 records into an ElasticSearch index, as shown below:

curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1'  -d '
{ "cityNames" : [ { "language" : "ENG",
    "name" : "w bridgewater",
    "raw_name" : "W BRIDGEWATER"
  },
  { "language" : "ENG",
    "name" : "west bridgewater",
    "raw_name" : "West Bridgewater"
  }
],
"id" : 1,
  "streetNames" : [ { "language" : "ENG",
    "name" : "cram rd",
    "raw_name" : "Cram Rd"
  } ]
}'

curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1'  -d '
{ "cityNames" : [ { "language" : "ENG",
    "name" : "bridgewater corners",
    "raw_name" : "BRIDGEWATER CORNERS"
  },
  { "language" : "ENG",
    "name" : "bridgewater center",
    "raw_name" : "Bridgewater Center"
  }
],
"id" : 2,
"streetNames" : [ { "language" : "ENG",
    "name" : "valley view rd",
    "raw_name" : "Valley View Rd"
  } ]
}'

curl -XPOST 'http://127.0.0.1:9200/geoindex_test/STREET?pretty=1'  -d '
{ "cityNames" : [ { "language" : "ENG",
    "name" : "bridgewater",
    "raw_name" : "Bridgewater"
  },
  { "language" : "ENG",
    "name" : "windsor",
    "raw_name" : "Windsor"
  }
],
"id" : 3,
"streetNames" : [ { "language" : "ENG",
    "name" : "valley view rd",
    "raw_name" : "Valley View Rd"
  } ]
}'

I then ran the following search:

curl -XGET 'http://127.0.0.1:9200/geoindex_test/STREET/_search?pretty=1'  -d '
{
"query" : {
    "match" : { "cityNames.name" : "bridgewater" }
}
}'

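I expected ElasticSearch to return the third record (id == 3) as the best match, since record 3 is the only one that matches "bridgewater" exactly. Instead, it returns the record with id 1 (w bridgewater) as the best match. What am I doing wrong?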

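I think this is because you are using inner objects, which essentially collapse the objects beneath them into one for search purposes. So when you query the search field of object 1, for example, you are querying ["w bridgewater", "west bridgewater"] rather than the discrete fields you might imagine.

Because "bridgewater" appears twice in objects 1 and 2 (once in each of the two name fields) and only once in object 3, objects 1 and 2 rank higher in the search. Object 1 ends up being picked because the field in which "bridgewater" appears is shorter than the corresponding string in object 2 ("w bridgewater" vs. "bridgewater corners").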
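
To make the flattening concrete, here is a minimal Python sketch. It is only an illustrative model of how inner objects are collapsed into one multi-valued field per document; the helper names `flatten` and `term_count` are made up for this example, and it does not reproduce Lucene's actual scoring.

```python
# Illustrative model only: mimics how inner (non-nested) objects are collapsed
# into one multi-valued field per document. It does not compute real scores.
docs = {
    1: [{"name": "w bridgewater"}, {"name": "west bridgewater"}],
    2: [{"name": "bridgewater corners"}, {"name": "bridgewater center"}],
    3: [{"name": "bridgewater"}, {"name": "windsor"}],
}


def flatten(city_names, key="name"):
    """Collect one sub-field from every inner object into a single list of values."""
    return [obj[key] for obj in city_names]


def term_count(values, term):
    """Count how often `term` occurs among the whitespace-separated tokens of all values."""
    return sum(token == term for value in values for token in value.split())


# What a query on cityNames.name effectively sees for each document.
flattened = {doc_id: flatten(objs) for doc_id, objs in docs.items()}
# Number of occurrences of the query term in the flattened field.
counts = {doc_id: term_count(vals, "bridgewater") for doc_id, vals in flattened.items()}

for doc_id in docs:
    print(doc_id, flattened[doc_id], counts[doc_id])
```

In this model the query term occurs twice in documents 1 and 2 and once in document 3, which is the imbalance described above.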


Instead of using inner objects as you do now, use nested objects. Setting the score mode to "max" will give you more intuitive results.
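
As a rough sketch of what that could look like, the Python snippet below builds the two JSON request bodies: a mapping that declares `cityNames` as `nested`, and a `nested` query with `score_mode` set to `max`. The type-keyed mapping layout is an assumption that fits older ElasticSearch releases with mapping types (such as `STREET` here); newer versions structure mappings differently, and existing documents would most likely need to be reindexed after the mapping change.

```python
import json

# Assumed mapping body: declares cityNames as "nested" so that each
# city-name object is indexed separately instead of being flattened.
mapping = {
    "STREET": {
        "properties": {
            "cityNames": {"type": "nested"}
        }
    }
}

# Nested query: each cityNames object is matched on its own, and with
# score_mode "max" the best-scoring object determines the document score.
nested_query = {
    "query": {
        "nested": {
            "path": "cityNames",
            "score_mode": "max",
            "query": {"match": {"cityNames.name": "bridgewater"}},
        }
    }
}

# Print the bodies so they can be passed to curl with -d.
print(json.dumps(mapping, indent=2))
print(json.dumps(nested_query, indent=2))
```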

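You can get a detailed explanation of why by enabling explain output in the request. Just add the explain=true request parameter to the URL. If you can add the output to your question, I will be happy to help.

@javanna - Thank you for your reply. The output of explain=true exceeds the number of characters stackoverflow allows. Sorry, I cannot provide the information.

Maybe you can post the relevant part or use a third-party service such as pastebin or github gist.

@javanna - I have never used pastebin before. Hopefully you can access my post:

The answer you got is very good. As you can see, your first two documents both have a tf (term frequency) of 2, for exactly the reason explained in the answer. The third document has a higher fieldNorm, which is the factor indicating that it is a perfect match, but since its term frequency is only one, the other documents are considered more relevant.

You know what? I like the way you edited the answer, it makes sense! Still, seeing the explain output always helps!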
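
For reference, a small sketch of adding the explain=true parameter mentioned in the comments to the search URL from the question. It only builds the URL string and does not send a request.

```python
from urllib.parse import urlencode

# Search endpoint taken from the question.
base = "http://127.0.0.1:9200/geoindex_test/STREET/_search"

# Keep pretty=1 from the original request and add explain=true.
url = base + "?" + urlencode({"pretty": 1, "explain": "true"})
print(url)
```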