
Node.js: Is there a way to return only one hit per matching field with Elasticsearch? Note: updated to include Node.js client details; see the edit below.


I'm trying to avoid querying ElasticSearch repeatedly to get the information I need.

Suppose I have a dataset made up of city events. Documents in the dataset might look like this:

{
    city: 'Berlin',
    event: 'Dance party',
    date: '2017-04-15'
},
{
    city: 'Seattle',
    event: 'Wine tasting',
    date: '2017-04-18'
},
{
    city: 'Berlin',
    event: 'Dance party',
    date: '2017-04-21'
},
{
    city: 'Hong Kong',
    event: 'Theater',
    date: '2017-04-25'
}...
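Now suppose the list of all tracked cities is known, and I only need the most recent event from each city. So I need to be able to pass a set of city names into the query, something like ['Berlin', 'Hong Kong', 'Seattle'], and get back only the three latest events (one per city).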

My current query can only do this by running it repeatedly with a size of 1 and an exact match on the city name, like this:

{
    size: 1,
    body: {
        sort: [
            {'date': {'order': 'desc'}}
        ],
        query: {
            'match_phrase': {'city': 'Berlin'}
        }
    }
}
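Written out, the current approach needs one request per city. A minimal sketch, assuming the same `size`/`body` parameter layout as the query above; `perCityRequests` is a hypothetical helper name:

```javascript
// Build one search-params object per city, mirroring the size-1 query above.
// Each object would be sent as a separate client.search() call, so N cities
// means N round trips to the cluster.
function perCityRequests(cities) {
  return cities.map((city) => ({
    size: 1,
    body: {
      sort: [{ date: { order: 'desc' } }],
      query: { match_phrase: { city: city } },
    },
  }));
}

const requests = perCityRequests(['Berlin', 'Hong Kong', 'Seattle']);
console.log(requests.length); // 3
```

With about 70 cities this means about 70 searches, which is what the question is trying to avoid.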
Is there a way to write a script so that I can pass the whole list of cities into a single query and reliably get only the latest entry for each city?

EDIT

My new script looks like this:

{
    'query': {
        'match_all': {}
    },
    '_source': ['city', 'event', 'date'],
    'aggs': {
        'cities': {
            'terms': {
                'field': 'city',
                'size': 100
            },
            'aggs': {
                'top_cities': {
                    'top_hits': {
                        'size': 1,
                        '_source': 'event',
                        'sort': {
                            'date': 'desc'
                        }
                    }
                }
            }
        }
    }
}
This really looks like it should work. But I'm still missing a lot of cities that I know exist, and one city shows up many times.

I'm running this in Node with the elasticsearch js package. The client is called like this:

const elasticSearch = require('elasticsearch');
let client = new elasticSearch.Client({
    hosts: [
        'host1:9200',
        'host2:9200',
        'host3:9200'
    ]
});
client.search(searchParams)
    .then(function (resp) {
        console.log(JSON.stringify(resp));
    });
Here is a (sanitized) version of the resulting JSON:

{
    "took": 77,
    "timed_out": false,
    "_shards": {
        "total": 42,
        "successful": 42,
        "failed": 0
    },
    "hits": {
        "total": 5685608,
        "max_score": 1,
        "hits": [{
            "_index": "sanitized",
            "_type": "sanitized",
            "_id": "AVu489lVgqYk_9QxQb-U",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-15",
                "city": "Berlin"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized",
            "_id": "AVu489lVgqYk_9QxQb-X",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-15",
                "city": "Berlin"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_1",
            "_id": "AVu489lVgqYk_9QxQb-a",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Berlin"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu489lVgqYk_9QxQb-b",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Berlin"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu489lVgqYk_9QxQb-d",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Hong Kong"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu489lVgqYk_9QxQb-f",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Hong Kong"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu49AkKCe9swQD44WnN",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Seattle"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu49AkKCe9swQD44WnP",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "New York"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_1",
            "_id": "AVu49AkKCe9swQD44WnY",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Berlin"
            }
        }, {
            "_index": "sanitized",
            "_type": "sanitized_variant_2",
            "_id": "AVu49AkKCe9swQD44Wnb",
            "_score": 1,
            "_source": {
                "event": "Dance party",
                "date": "2017-04-29",
                "city": "Berlin"
            }
        }]
    }
}
On closer inspection, for some reason the aggregations are not being added to the resp object.
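One plausible cause, assuming the "new script" above is passed to `client.search()` exactly as shown: with the legacy `elasticsearch` npm client, the request body (`query`, `aggs`, `sort`, ...) goes under a `body` key, as in the earlier size-1 query. Keys such as `aggs` placed at the top level of the params object are not sent as part of the search request body, which would fit a response that looks like a plain search with the default 10 hits and no `aggregations`. A sketch of the wrapped parameters; the index name `events` and the helper `cityBuckets` are hypothetical:

```javascript
// Hypothetical index name; replace with the real one.
const INDEX = 'events';

// Request body exactly as in the "new script" above.
const requestBody = {
  query: { match_all: {} },
  _source: ['city', 'event', 'date'],
  aggs: {
    cities: {
      terms: { field: 'city', size: 100 },
      aggs: {
        top_cities: {
          top_hits: { size: 1, _source: 'event', sort: { date: 'desc' } },
        },
      },
    },
  },
};

// Params for client.search(): the query and aggregations live under `body`.
const searchParams = { index: INDEX, body: requestBody };

// Read the buckets defensively so a missing aggregation is reported
// instead of silently producing an empty result.
function cityBuckets(resp) {
  if (!resp.aggregations || !resp.aggregations.cities) {
    throw new Error('No "cities" aggregation in response; check that aggs is inside body');
  }
  return resp.aggregations.cities.buckets;
}

// With a real client: client.search(searchParams).then((resp) => cityBuckets(resp));
```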

You can use a terms query and pass in all of those cities, something like:

{
  "query": {
    "terms": {
      "city": [
        "BERLIN",
        "RIO DE JANEIRO"
      ]
    }
  },
  "size": 3,
  "_source": "city",
  "sort": [
    {
      "date": {
        "order": "desc"
      }
    }
  ]
}
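Because the `terms` query takes an array, the city list can be passed straight through from JavaScript. A minimal sketch; `citiesQuery` is a hypothetical helper name. Note that with `size: 3` and a descending date sort this returns the three most recent matching events overall, so two of them could come from the same city: it filters by city but does not by itself guarantee one hit per city.

```javascript
// Build the terms-filtered, date-sorted search body from an array of city names.
// The terms query matches exact indexed terms (it is not analyzed), so the values
// must be spelled and cased the way they are stored in the index.
function citiesQuery(cities, size) {
  return {
    query: { terms: { city: cities } },
    size: size,
    _source: 'city',
    sort: [{ date: { order: 'desc' } }],
  };
}

const body = citiesQuery(['BERLIN', 'RIO DE JANEIRO'], 3);
```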

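In addition to filtering the cities in the query, I suggest using a terms aggregation on the city field, with a top_hits sub-aggregation to retrieve the latest event for each city: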

{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "cities": {
      "terms": {
        "field": "city",
        "size": 100
      },
      "aggs": {
        "top_events": {
          "top_hits": {
            "size": 1,
            "_source": "event",
            "sort": {
              "date": "desc"
            }
          }
        }
      }
    }
  }
}
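Combining the two ideas, here is a sketch that builds one request for a known list of cities (filter with `terms`, then one `top_hits` per `terms` bucket) and flattens the buckets into `{ city, event, date }` rows. The helper names are hypothetical. Two choices here are assumptions rather than part of the answer: `_source` in `top_hits` is widened to `['event', 'date']` so the date comes back too, and the terms aggregation `size` is set to the list length so no requested city is cut off.

```javascript
// Build a single search body: keep only the requested cities, bucket by city,
// and take the newest document in each bucket.
function latestPerCityBody(cities) {
  return {
    size: 0, // no top-level hits needed; results come from the aggregation
    query: { terms: { city: cities } },
    aggs: {
      cities: {
        terms: { field: 'city', size: cities.length },
        aggs: {
          top_events: {
            top_hits: {
              size: 1,
              _source: ['event', 'date'],
              sort: { date: 'desc' },
            },
          },
        },
      },
    },
  };
}

// Flatten the aggregation response into one row per bucket.
function latestPerCity(resp) {
  return resp.aggregations.cities.buckets.map((bucket) => {
    const hit = bucket.top_events.hits.hits[0];
    return { city: bucket.key, event: hit._source.event, date: hit._source.date };
  });
}
```

With the client, the body would go under `body` in the search params, e.g. `client.search({ index, body: latestPerCityBody(cities) })`, and `latestPerCity(resp)` would be applied to the result.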

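This looks promising, but it returns far fewer documents than I expected, and some of the returned documents appear multiple times. Could the size of the "cities" array have something to do with it? There are about 70 cities, and I expect one hit per city.

Then just remove the query part. I've updated my answer.

Getting closer, but one city still shows up multiple times with different dates. Each record also has other fields I don't care about; I could even exclude the "event" field. I just want the city names and dates, with no duplicate cities. Is there a way to explicitly tell ElasticSearch not to return a hit for a city name if a hit for that name has already been returned? Oh, and thank you very much for your reply.

Since we're using a terms aggregation, I strongly doubt you are getting the same city more than once, unless the names are written differently, which would be odd.

It really looks like it shouldn't do that, but it does, and I promise the spelling is the same. I've updated the post with the current version of my script.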
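As a fallback, duplicates can also be collapsed on the client. One possible cause worth checking, stated as an assumption about the mapping: if `city` is an analyzed text/string field, a terms aggregation buckets by token rather than by whole value, so "Hong Kong" could land in separate "hong" and "kong" buckets; aggregating on a non-analyzed (keyword) version of the field avoids that. The sketch below keeps only the newest row per city name, assuming the `top_hits` `_source` includes `city` and `date`; `dedupeByCity` is a hypothetical helper.

```javascript
// Collapse rows that share a city name, keeping the one with the newest date.
// Dates are 'YYYY-MM-DD' strings, which compare correctly as plain strings.
function dedupeByCity(rows) {
  const newest = new Map();
  for (const row of rows) {
    const seen = newest.get(row.city);
    if (!seen || row.date > seen.date) {
      newest.set(row.city, row);
    }
  }
  return Array.from(newest.values());
}

// Example rows as they might come back from the top_hits buckets.
const rows = [
  { city: 'Berlin', date: '2017-04-15' },
  { city: 'Hong Kong', date: '2017-04-25' },
  { city: 'Berlin', date: '2017-04-29' },
];
const unique = dedupeByCity(rows); // one Berlin row (2017-04-29) and one Hong Kong row
```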