Python 3.6 ElasticSearch query not returning expected results


I have a JSON structure like the following:

{"DocumentName":"es","DocumentId":"2","Content": [{"PageNo":1,"Text": "The full text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing."},{"PageNo":2,"Text": "The query string is processed using the same analyzer that was applied to the field during indexing."}]}
I need stemmed results for the Content.Text field. To get them, I defined a mapping when creating the index, as shown below:

curl -X PUT "localhost:9200/myindex?pretty" -H "Content-Type: application/json" -d"{
    "settings": {
        "analysis": {
            "analyzer": {
                "my_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "my_stemmer"]
                }
            },
            "filter": {
                "my_stemmer": {
                    "type": "stemmer",
                    "name": "english"
                }
            }
        }
    }
}, {
    "mappings": {
        "properties": {
            "DocumentName": {
                "type": "text"
            },
            "DocumentId": {
                "type": "keyword"
            },
            "Content": {
                "properties": {
                    "PageNo": {
                        "type": "integer"
                    },
                    "Text": "_all": {
                        "type": "text",
                        "analyzer": "my_analyzer",
                        "search_analyzer": "my_analyzer"
                    }
                }
            }
        }
    }
}
}"
I checked the analyzer that was created:

curl -X GET "localhost:9200/myindex/_analyze?pretty" -H "Content-Type: application/json" -d"{\"analyzer\":\"my_analyzer\",\"text\":\"indexing\"}"
The result is:

{
  "tokens" : [
    {
      "token" : "index",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    }
  ]
}
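
The same analyzer check can be scripted from Python. This is a minimal sketch, assuming the elasticsearch-py 7.x client and its `indices.analyze` call; the helper names `tokens_from_analyze` and `analyze_text` are hypothetical and not part of the original post. The sample dictionary is the response shown above, so the token extraction can be tried without a running cluster.

```python
def tokens_from_analyze(response):
    """Extract the token strings from an _analyze API response."""
    return [entry["token"] for entry in response.get("tokens", [])]


def analyze_text(es, text, analyzer="my_analyzer", index="myindex"):
    """Run `text` through `analyzer` on a live cluster (assumes elasticsearch-py 7.x)."""
    response = es.indices.analyze(
        index=index, body={"analyzer": analyzer, "text": text}
    )
    return tokens_from_analyze(response)


# The _analyze response from the post, used here as offline sample data.
sample = {
    "tokens": [
        {
            "token": "index",
            "start_offset": 0,
            "end_offset": 8,
            "type": "<ALPHANUM>",
            "position": 0,
        }
    ]
}
print(tokens_from_analyze(sample))  # ['index']
```

With a live node, `analyze_text(es, "indexing")` would be expected to return the same stemmed token, provided the index was created with `my_analyzer`.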

Any help would be appreciated. Thanks in advance.

Ignore my comment. The stemmer is working. Try the following:

Mapping:

curl -X DELETE "localhost:9200/myindex"

curl -X PUT "localhost:9200/myindex?pretty" -H "Content-Type: application/json" -d'
{ 
    "settings":{ 
       "analysis":{ 
          "analyzer":{ 
             "english_exact":{ 
                "tokenizer":"standard",
                "filter":[ 
                   "lowercase"
                ]
             }
          }
       }
    },
    "mappings":{ 
       "properties":{ 
          "DocumentName":{ 
             "type":"text"
          },
          "DocumentId":{ 
             "type":"keyword"
          },
          "Content":{ 
             "properties":{ 
                "PageNo":{ 
                   "type":"integer"
                },
                "Text":{ 
                   "type":"text",
                   "analyzer":"english",
                   "fields":{ 
                      "exact":{ 
                         "type":"text",
                         "analyzer":"english_exact"
                      }
                   }
                }
             }
          }
       }
    }
 }'
Data:

curl -XPOST "localhost:9200/myindex/_doc/1" -H "Content-Type: application/json" -d'
{ 
   "DocumentName":"es",
   "DocumentId":"2",
   "Content":[ 
      { 
         "PageNo":1,
         "Text":"The full text queries enable you to search analyzed text fields such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing."
      },
      { 
         "PageNo":2,
         "Text":"The query string is processed using the same analyzer that was applied to the field during indexing."
      }
   ]
}'
Query:

curl -XGET 'localhost:9200/myindex/_search?pretty' -H "Content-Type: application/json"  -d '
{ 
   "query":{ 
      "simple_query_string":{ 
         "fields":[ 
            "Content.Text"
         ],
         "query":"index"
      }
   }
}'
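As expected, exactly one document is returned. I also tested the following stems, and all of them matched correctly with the suggested mapping: apply (applied), text (text), use (using).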
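
The suggested mapping indexes Content.Text twice: once with the built-in `english` analyzer (stemmed) and once as the `exact` sub-field with `english_exact` (lowercased only). A minimal sketch of building the query body for either view follows; the helper name `text_query` is hypothetical and only illustrates the field choice.

```python
def text_query(term, exact=False):
    """Build a simple_query_string body against the stemmed or the exact field."""
    # Content.Text uses the `english` analyzer; Content.Text.exact skips stemming.
    field = "Content.Text.exact" if exact else "Content.Text"
    return {
        "query": {
            "simple_query_string": {
                "fields": [field],
                "query": term,
            }
        }
    }


stemmed_body = text_query("index")
exact_body = text_query("index", exact=True)
print(stemmed_body["query"]["simple_query_string"]["fields"])
print(exact_body["query"]["simple_query_string"]["fields"])
```

Searching for "index" with the stemmed body should match the sample document through "indexing", while the exact body should not, because that field keeps the token "indexing" unstemmed.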

Python example:

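import requests
from elasticsearch import Elasticsearch

# Optional: confirm the node is reachable (this response is not used further).
health = requests.get('http://localhost:9200')

# Connect to the local node; the port is passed as an integer.
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])

# "index" matches "indexing" because Content.Text is stemmed by the english analyzer.
res = es.search(index='myindex', body={"query": {"match": {"Content.Text": "index"}}})

print(res)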
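
The curl steps for recreating the index can also be driven from Python. The following is a sketch under assumptions: it targets the elasticsearch-py 7.x client (`indices.delete`, `indices.create` with a `body` argument, and the `ignore` option), and the helper names `build_index_body` and `recreate_index` are hypothetical. Note that `settings` and `mappings` are sibling keys of a single JSON object.

```python
def build_index_body():
    """Return the create-index body used in the answer's mapping."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    # Lowercase only, no stemming: used by the `exact` sub-field.
                    "english_exact": {
                        "tokenizer": "standard",
                        "filter": ["lowercase"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                "DocumentName": {"type": "text"},
                "DocumentId": {"type": "keyword"},
                "Content": {
                    "properties": {
                        "PageNo": {"type": "integer"},
                        "Text": {
                            "type": "text",
                            "analyzer": "english",
                            "fields": {
                                "exact": {
                                    "type": "text",
                                    "analyzer": "english_exact",
                                }
                            },
                        },
                    }
                },
            }
        },
    }


def recreate_index(es, index="myindex"):
    """Drop and recreate the index (assumes an elasticsearch-py 7.x client)."""
    es.indices.delete(index=index, ignore=[404])  # tolerate a missing index
    es.indices.create(index=index, body=build_index_body())


index_body = build_index_body()
print(sorted(index_body))  # ['mappings', 'settings']
```

Calling `recreate_index(es)` with the client from the example above would replace the two curl commands in the mapping step; nothing is sent to the cluster until that call is made.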

Tested on Elasticsearch 7.4.

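IMHO the stemmer is simply broken. I tried plural words such as "businesses" or "employees". The first is stemmed to "businesse", and the second is not stemmed at all.

You will find more statistics

Hi Tomas, the given link does not work.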