Elasticsearch multi-value match without an analyzer


Please excuse my limited knowledge of Elasticsearch. I have an Elasticsearch collection that contains the following documents:

{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 2,
    "dimensions": {
        "region": "Coimbra District"

    }
}
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Federal District"        
    }
}
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Masovian Voivodeship"
    }
}
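These 3 JSON documents are indexed on the ES server. I have not specified any analyzer type (and I don't know how to provide one :) ). I am using Spring Data Elasticsearch and running the following query to search for documents whose region is "Masovian Voivodeship" or "Federal District":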

{
  "query_string" : {
    "query" : "Masovian Voivodeship OR Federal District",
    "fields" : [ "dimensions.region" ]
  }
}
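I expected it to return 2 hits. However, it returns all 3 documents (probably because the 3rd document has "District" in it). How can I modify the query so that it performs an exact match and returns only 2 documents? I am using the following method: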

QueryBuilders.queryString(<OR string>).field("dimensions.region")
I have also tried
QueryBuilders.termsQuery
QueryBuilders.inQuery
QueryBuilders.matchQuery
(with an array), but with no luck.

Can anyone help? Thanks in advance.

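There are a couple of things you can do here.

First, I set up an index without any explicit mapping or analysis, which means the standard analyzer will be used. This matters because it determines how we have to query the text field.

So I started with the following: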

DELETE /test_index

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
   }
}

PUT /test_index/doc/1
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 2,
    "dimensions": {
        "region": "Coimbra District"

    }
}

PUT /test_index/doc/2
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Federal District"        
    }
}

PUT /test_index/doc/3
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Masovian Voivodeship"
    }
}
Then I tried your query and got no hits. I don't understand why you had "dimensions.ga:region" in your fields parameter, but when I changed it to "dimensions.region" I got some results:

POST /test_index/doc/_search
{
   "query": {
      "query_string": {
         "query": "Masovian Voivodeship OR Federal District",
         "fields": [
            "dimensions.region"
         ]
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0.46911472,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": 0.46911472,
            "_source": {
               "date": "2013-12-30T00:00:00.000Z",
               "value": 1,
               "dimensions": {
                  "region": "Masovian Voivodeship"
               }
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.3533006,
            "_source": {
               "date": "2013-12-30T00:00:00.000Z",
               "value": 1,
               "dimensions": {
                  "region": "Federal District"
               }
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.05937162,
            "_source": {
               "date": "2013-12-30T00:00:00.000Z",
               "value": 2,
               "dimensions": {
                  "region": "Coimbra District"
               }
            }
         }
      ]
   }
}
However, this returns a result you don't want. One way to fix this is as follows:

POST /test_index/doc/_search
{
   "query": {
      "query_string": {
         "query": "(Masovian AND Voivodeship) OR (Federal AND District)",
         "fields": [
            "dimensions.region"
         ]
      }
   }
}
...
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.46911472,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": 0.46911472,
            "_source": {
               "date": "2013-12-30T00:00:00.000Z",
               "value": 1,
               "dimensions": {
                  "region": "Masovian Voivodeship"
               }
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.3533006,
            "_source": {
               "date": "2013-12-30T00:00:00.000Z",
               "value": 1,
               "dimensions": {
                  "region": "Federal District"
               }
            }
         }
      ]
   }
}
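To make the difference between the two query_string variants concrete, here is a minimal Java sketch that imitates the behavior in memory. It is only an approximation: `tokenize` is a rough stand-in for the standard analyzer (lowercase, split on non-alphanumeric characters), and the class and method names are hypothetical, not part of Elasticsearch or Spring Data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class RegionMatchDemo {

    // The three region values indexed above.
    static final List<String> REGIONS = Arrays.asList(
            "Coimbra District", "Federal District", "Masovian Voivodeship");

    // Rough approximation of the standard analyzer (an assumption for this demo):
    // lowercase the text and split it on anything that is not a letter or digit.
    static Set<String> tokenize(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // OR semantics: the field matches if it shares at least one token with the query.
    static boolean matchesAny(String field, String query) {
        Set<String> fieldTokens = tokenize(field);
        for (String t : tokenize(query)) {
            if (fieldTokens.contains(t)) {
                return true;
            }
        }
        return false;
    }

    // AND semantics: every query token must be present in the field.
    static boolean matchesAll(String field, String query) {
        return tokenize(field).containsAll(tokenize(query));
    }

    // Mimics "Masovian Voivodeship OR Federal District" with the default OR operator:
    // any single token is enough to match.
    static List<String> looseHits() {
        List<String> hits = new ArrayList<>();
        for (String region : REGIONS) {
            if (matchesAny(region, "Masovian Voivodeship Federal District")) {
                hits.add(region);
            }
        }
        return hits;
    }

    // Mimics "(Masovian AND Voivodeship) OR (Federal AND District)".
    static List<String> groupedHits() {
        List<String> hits = new ArrayList<>();
        for (String region : REGIONS) {
            if (matchesAll(region, "Masovian Voivodeship")
                    || matchesAll(region, "Federal District")) {
                hits.add(region);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("loose:   " + looseHits());
        System.out.println("grouped: " + groupedHits());
    }
}
```

In this sketch "Coimbra District" shows up in the loose result only because it shares the token "district" with the query, which mirrors the low-scoring third hit in the first response above.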
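Another way to do it (I like this one better), which gives the same results, is to use a combination of match queries inside a bool should clause:

POST /test_index/doc/_search
{
   "query": {
      "bool": {
         "should": [
            {
               "match": {
                  "dimensions.region": {
                     "query": "Masovian Voivodeship",
                     "operator": "and"
                  }
               }
            },
            {
               "match": {
                  "dimensions.region": {
                     "query": "Federal District",
                     "operator": "and"
                  }
               }
            }
         ]
      }
   }
}

Try setting default_operator to AND. Or make your query "Masovian AND Voivodeship Or Federal AND District".

Hi, I tried the query
{"query_string": {"query": "Masovian AND Voivodeship Or Federal AND District", "fields": ["dimensions.region"]}}
but it returned no hits.

Hi @Sloan, first of all, thank you very much for the detailed answer. I tried your third solution (since I also think it is the better approach) and it works great! The only thing I was missing was the operator. I had not specified the operator, so the default operator was used when the query was built. The default is OR, so it was searching by token using OR, which is why I got 3 results (even on my first attempt, running the same query also gave 3 results). I have removed the "ga" part from the query, since it was a typo. Once again, cheers for the solution :)

In a sense, this is a great example!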
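If the list of regions is not fixed, the bool/should/match request body can be generated from the values instead of being written by hand. The following is a minimal sketch in plain Java that only builds the JSON string; it does not use the Elasticsearch Java client, and the class and method names (`RegionQueryBuilder`, `anyOf`, `matchClause`, `jsonString`) are hypothetical helpers made up for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class RegionQueryBuilder {

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    // Control characters are not handled, so this is not a general-purpose encoder.
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // One match clause that requires all tokens of the value ("operator": "and").
    static String matchClause(String field, String value) {
        return "{\"match\":{" + jsonString(field) + ":{\"query\":" + jsonString(value)
                + ",\"operator\":\"and\"}}}";
    }

    // Wraps one match clause per value in a bool/should, so a document
    // matching any one of the values is returned.
    static String anyOf(String field, List<String> values) {
        String clauses = values.stream()
                .map(v -> matchClause(field, v))
                .collect(Collectors.joining(","));
        return "{\"query\":{\"bool\":{\"should\":[" + clauses + "]}}}";
    }

    public static void main(String[] args) {
        // Prints a body equivalent to the bool/should/match request for two regions.
        System.out.println(anyOf("dimensions.region",
                Arrays.asList("Masovian Voivodeship", "Federal District")));
    }
}
```

The resulting string could be posted to the _search endpoint with whatever HTTP client is available. Inside a Spring Data Elasticsearch application, composing the same structure with the QueryBuilders class mentioned in the question would probably be the more natural route.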