How to configure the synonyms path in elasticsearch
I'm new to elasticsearch and I want to use synonyms. I added the following lines to the configuration file:
index :
    analysis :
        analyzer :
            synonym :
                type : custom
                tokenizer : whitespace
                filter : [synonym]
        filter :
            synonym :
                type : synonym
                synonyms_path : synonyms.txt
Then I created an index named test:
"mappings" : {
    "test" : {
        "properties" : {
            "text_1" : {
                "type" : "string",
                "analyzer" : "synonym"
            },
            "text_2" : {
                "search_analyzer" : "standard",
                "index_analyzer" : "synonym",
                "type" : "string"
            },
            "text_3" : {
                "type" : "string",
                "analyzer" : "synonym"
            }
        }
    }
}
and indexed a test document of type test with the following data:
{
"text_3" : "foo dog cat",
"text_2" : "foo dog cat",
"text_1" : "foo dog cat"
}
synonyms.txt contains "foo,bar,baz". When I search for foo it returns the result I expect, but when I search for baz or bar it returns zero results:
{
    "query" : {
        "query_string" : {
            "query" : "bar",
            "fields" : [ "text_1" ],
            "use_dis_max" : true,
            "boost" : 1.0
        }
    }
}
Result:
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":0,
"max_score":null,
"hits":[
]
}
}
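I don't know whether your problem comes from the way you defined the synonyms for "bar". Since you say you are new, I'll walk through an example similar to yours to show how elasticsearch handles synonyms at search time and at index time. Hope it helps.
First, create the synonyms file: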
foo => foo bar, baz
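The rule above uses the explicit-mapping form (`=>`), while the question's file used the comma-separated equivalent form. The two behave differently, and a rough way to see that is the small Python sketch below. It is only an illustration of the Solr-style rule syntax, not elasticsearch's actual parser; the function name `parse_synonym_rule` is made up for this example, and it assumes equivalent terms expand to the whole group.

```python
def parse_synonym_rule(line):
    """Parse one Solr-style synonym rule into {input_term: [replacement token lists]}."""
    def split_terms(part):
        # A comma separates alternatives; whitespace separates tokens inside one alternative.
        return [alt.split() for alt in part.split(",") if alt.strip()]

    if "=>" in line:
        # Explicit mapping: left-hand terms are replaced by the right-hand alternatives.
        left, right = line.split("=>", 1)
        outputs = split_terms(right)
        return {" ".join(tokens): outputs for tokens in split_terms(left)}
    # Equivalent synonyms: every term expands to the whole group (assumed behaviour).
    group = split_terms(line)
    return {" ".join(tokens): group for tokens in group}


explicit = parse_synonym_rule("foo => foo bar, baz")
equivalent = parse_synonym_rule("foo,bar,baz")
print(explicit)
print(equivalent)
```

With the explicit rule only foo is rewritten (to the two-token alternative "foo bar" and to "baz"); with the equivalent rule each of foo, bar and baz expands to all three.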
Now I create the index with the specific settings you are trying to test:
curl -XPUT 'http://localhost:9200/test/' -d '{
    "settings": {
        "index": {
            "analysis": {
                "analyzer": {
                    "synonym": {
                        "tokenizer": "whitespace",
                        "filter": ["synonym"]
                    }
                },
                "filter" : {
                    "synonym" : {
                        "type" : "synonym",
                        "synonyms_path" : "synonyms.txt"
                    }
                }
            }
        }
    },
    "mappings": {
        "test" : {
            "properties" : {
                "text_1" : {
                    "type" : "string",
                    "analyzer" : "synonym"
                },
                "text_2" : {
                    "search_analyzer" : "standard",
                    "index_analyzer" : "standard",
                    "type" : "string"
                },
                "text_3" : {
                    "type" : "string",
                    "search_analyzer" : "synonym",
                    "index_analyzer" : "standard"
                }
            }
        }
    }
}'
Note that synonyms.txt must be in the same directory as the configuration file, because the path is relative to the config dir.
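To make the "relative to the config dir" rule concrete, here is a minimal Python sketch of how such a path could be resolved. The helper `resolve_synonyms_path` and the directory `/etc/elasticsearch` are hypothetical, chosen only for illustration; whether an absolute path is accepted may depend on the elasticsearch version, so treat that branch as an assumption.

```python
from pathlib import Path


def resolve_synonyms_path(config_dir, synonyms_path):
    """Resolve synonyms_path the way the note describes: relative to the config dir."""
    path = Path(synonyms_path)
    # An absolute path is used as given (assumption); a relative one is joined to the config dir.
    return path if path.is_absolute() else Path(config_dir) / path


# "/etc/elasticsearch" is only an example location for the config directory.
print(resolve_synonyms_path("/etc/elasticsearch", "synonyms.txt"))
print(resolve_synonyms_path("/etc/elasticsearch", "analysis/synonyms.txt"))
```

So a file kept in a subdirectory of the config dir would be referenced as `analysis/synonyms.txt` rather than by its bare file name.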
Now index a document:
curl -XPUT 'http://localhost:9200/test/test/1' -d '{
"text_3": "baz dog cat",
"text_2": "foo dog cat",
"text_1": "foo dog cat"
}'
Now search.
Searching in field text_1:
curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 0.15342641,
"_source": {
"text_3": "baz dog cat",
"text_2": "foo dog cat",
"text_1": "foo dog cat"
}
}
]
}
}
The document is returned because baz is a synonym of foo, and at index time foo was expanded with its synonyms.
Searching in field text_2:
curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'
Result:
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":0,
"max_score":null,
"hits":[
]
}
}
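I get no hits because synonyms were not expanded at index time (standard analyzer). And since I'm searching for baz, which does not appear in the text, nothing matches.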
Searching in field text_3:
curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 0.15342641,
"_source": {
"text_3": "baz dog cat",
"text_2": "foo dog cat",
"text_1": "foo dog cat"
}
}
]
}
}
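Note: text_3 is "baz dog cat".
text_3 was indexed without expanding synonyms. When I search for foo I still get the document, because the query is expanded at search time and one of foo's synonyms is "baz".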
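The three searches above can be mimicked with a toy model in Python. This is only a sketch of the idea (expand at index time, at search time, or not at all), not how elasticsearch is implemented: the analyzers are reduced to sets of terms, token positions are ignored, and the names `standard`, `synonym` and `matches` are invented for the example.

```python
# Mapping derived from the rule "foo => foo bar, baz".
SYNONYMS = {"foo": [["foo", "bar"], ["baz"]]}


def standard(text):
    # Stand-in for an analyzer without synonyms: lowercase and split on whitespace.
    return set(text.lower().split())


def synonym(text):
    # Whitespace tokenizer plus synonym expansion (token positions are ignored here).
    terms = set()
    for token in text.split():
        for alternative in SYNONYMS.get(token, [[token]]):
            terms.update(alternative)
    return terms


def matches(doc_text, query, index_analyzer, search_analyzer):
    # A query hits when any analyzed query term is among the indexed terms.
    return bool(index_analyzer(doc_text) & search_analyzer(query))


text_1 = matches("foo dog cat", "baz", synonym, synonym)    # expanded at index time
text_2 = matches("foo dog cat", "baz", standard, standard)  # never expanded
text_3 = matches("baz dog cat", "foo", standard, synonym)   # expanded at search time
print(text_1, text_2, text_3)
```

The printed values line up with the hit counts shown above: text_1 and text_3 match, text_2 does not.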
If you want to debug, you can use the _analyze endpoint, for example:
curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'
Result:
{
"tokens": [
{
"token": "foo",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "baz",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "bar",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 2
}
]
}
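One way to read this output is to group the tokens by position: tokens sharing a position are alternatives for the same slot. The Python sketch below does that for the response shown above; the helper name `tokens_by_position` is made up for this example and the JSON is copied from the result rather than fetched from a server.

```python
import json
from collections import defaultdict

# The _analyze response from above, pasted as a string.
ANALYZE_OUTPUT = """
{"tokens": [
  {"token": "foo", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 1},
  {"token": "baz", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 1},
  {"token": "bar", "start_offset": 0, "end_offset": 3, "type": "SYNONYM", "position": 2}
]}
"""


def tokens_by_position(analyze_json):
    """Group the token strings of an _analyze response by their position."""
    grouped = defaultdict(list)
    for token in json.loads(analyze_json)["tokens"]:
        grouped[token["position"]].append(token["token"])
    return dict(grouped)


print(tokens_by_position(ANALYZE_OUTPUT))
```

Here foo and baz share position 1 and bar follows at position 2, which is consistent with foo being rewritten to "foo bar" and "baz".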