Node.js: Is there a way to return only one hit per matching field with Elasticsearch? Note: updated to include Node.js client details. See the edit below.
I'm trying to avoid querying Elasticsearch over and over to get the information I need. Suppose I have a dataset of events in cities. A document in the dataset might look like this:
{
city: 'Berlin',
event: 'Dance party',
date: '2017-04-15'
},
{
city: 'Seattle',
event: 'Wine tasting',
date: '2017-04-18'
},
{
city: 'Berlin',
event: 'Dance party',
date: '2017-04-21'
},
{
city: 'Hong Kong',
event: 'Theater',
date: '2017-04-25'
}...
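Now suppose the list of all tracked cities is known, and I only need the most recent event from each city. So I need to be able to pass an array of city names into the query, something like ['Berlin', 'Hong Kong', 'Seattle'], and get back only the three latest events (one per city).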
My current query can only do this by running it repeatedly with a size of 1 and an exact match on the city name, like this:
{
size: 1,
body: {
sort: [
{'date': {'order': 'desc'}}
],
query: {
'match_phrase': {'city': 'Berlin'}
}
}
}
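For reference, the repeated-query approach just described can be sketched like this. It is a minimal sketch that assumes the legacy `elasticsearch` npm client, although any object with a promise-returning `search(params)` method will do. `buildLatestEventParams` and `latestEventPerCityOneByOne` are hypothetical helper names. The sketch issues one size-1 request per city, which is exactly the N round trips the question wants to avoid.

```javascript
// Build the size-1, newest-first search params for one city.
// Hypothetical helper that mirrors the single-city query above.
function buildLatestEventParams(city) {
  return {
    size: 1,
    body: {
      sort: [{ date: { order: 'desc' } }],
      query: { match_phrase: { city: city } }
    }
  };
}

// Send one search per city in parallel and keep the newest hit for each.
// `client` is assumed to expose a promise-returning search(params) method,
// e.g. an instance of the legacy elasticsearch-js Client.
async function latestEventPerCityOneByOne(client, cities) {
  const responses = await Promise.all(
    cities.map(city => client.search(buildLatestEventParams(city)))
  );
  const result = {};
  cities.forEach((city, i) => {
    const hits = responses[i].hits.hits;
    // A city with no matching documents gets no entry.
    if (hits.length > 0) {
      result[city] = hits[0]._source;
    }
  });
  return result;
}
```

Usage: `latestEventPerCityOneByOne(client, ['Berlin', 'Hong Kong', 'Seattle']).then(console.log)`. `Promise.all` runs the requests in parallel, but the cluster still receives one request per city.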
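Is there a way to write the script so that I can pass the whole list of cities into a single query and reliably get back only the latest entry for each city?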
EDIT
My new script looks like this:
{
'query': {
'match_all': {}
},
'_source': ['city', 'event', 'date'],
'aggs': {
'cities': {
'terms': {
'field': 'city',
'size': 100
},
'aggs': {
'top_cities': {
'top_hits': {
'size': 1,
'_source': 'event',
'sort': {
'date': 'desc'
}
}
}
}
}
}
}
This really looks like it should work, but I'm still missing many cities that I know are in the data, and one city appears many times.
I'm running this in Node with the elasticsearch js package. The client is set up and called like this:
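const elasticSearch = require('elasticsearch');

let client = new elasticSearch.Client({
  hosts: [
    'host1:9200',
    'host2:9200',
    'host3:9200'
  ]
});

// searchParams holds the request shown above
client.search(searchParams)
  .then(function (resp) {
    console.log(JSON.stringify(resp));
  });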
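One thing worth checking, offered as an assumption rather than a confirmed diagnosis: with the legacy `elasticsearch` package, the request body (query, aggs, sort) goes under a `body` key, while URL-level options such as `index` and `size` sit at the top level. The earlier single-city params follow that pattern, but the script in the edit is shown without the wrapper. The sanitized response has ten hits, all with `_score: 1`, and no `aggregations` key, which is consistent with the aggregation never reaching the cluster. The sketch below builds the params with the wrapper and fails loudly when aggregations are missing. `buildCityAggParams`, `getCityBuckets`, and the `index` argument are hypothetical.

```javascript
// Put the aggregation from the edit inside `body`, which is where the
// legacy elasticsearch-js client expects the request body.
// `index` is a hypothetical placeholder for the real index name.
function buildCityAggParams(index) {
  return {
    index: index,
    body: {
      query: { match_all: {} },
      _source: ['city', 'event', 'date'],
      aggs: {
        cities: {
          terms: { field: 'city', size: 100 },
          aggs: {
            top_cities: {
              top_hits: {
                size: 1,
                _source: 'event',
                sort: { date: 'desc' }
              }
            }
          }
        }
      }
    }
  };
}

// Throw if the response has no "cities" aggregation instead of silently
// falling back to resp.hits.
function getCityBuckets(resp) {
  if (!resp.aggregations || !resp.aggregations.cities) {
    throw new Error('No "cities" aggregation in response; check that aggs is inside body');
  }
  return resp.aggregations.cities.buckets;
}
```

Usage: `client.search(buildCityAggParams('my-index')).then(resp => console.log(getCityBuckets(resp)))`, where `'my-index'` stands in for the real index.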
Here is a (sanitized) version of the resulting JSON:
{
"took": 77,
"timed_out": false,
"_shards": {
"total": 42,
"successful": 42,
"failed": 0
},
"hits": {
"total": 5685608,
"max_score": 1,
"hits": [{
"_index": "sanitized",
"_type": "sanitized",
"_id": "AVu489lVgqYk_9QxQb-U",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-15",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized",
"_id": "AVu489lVgqYk_9QxQb-X",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-15",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_1",
"_id": "AVu489lVgqYk_9QxQb-a",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu489lVgqYk_9QxQb-b",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu489lVgqYk_9QxQb-d",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Hong Kong"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu489lVgqYk_9QxQb-f",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Hong Kong"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu49AkKCe9swQD44WnN",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Seattle"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu49AkKCe9swQD44WnP",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "New York"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_1",
"_id": "AVu49AkKCe9swQD44WnY",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}, {
"_index": "sanitized",
"_type": "sanitized_variant_2",
"_id": "AVu49AkKCe9swQD44Wnb",
"_score": 1,
"_source": {
"event": "Dance party",
"date": "2017-04-29",
"city": "Berlin"
}
}]
}
}
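On closer inspection, for some reason the aggregations are not being added to the resp object.

You can use a terms query and pass it all of those cities, something like:
{
"query": {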
"terms": {
"city": [
"BERLIN",
"RIO DE JANEIRO"
]
}
},
"size": 3,
"_source": "city",
"sort": [
{
"date": {
"order": "desc"
}
}
]
}
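The body above can be built straight from the city list. This is a minimal sketch; `buildTermsQuery` is a hypothetical helper. `size` combined with the date sort returns the most recent documents across all of the listed cities. If one city holds the three newest events, the other cities are not returned, so setting `size` to the number of cities does not by itself guarantee one hit per city. Also, `terms` does not analyze its input, so each value has to match the indexed term exactly, including case.

```javascript
// Build a terms-query search body for a list of city names, newest first.
// Hypothetical helper: `size` defaults to the number of cities, but it only
// caps the total number of hits and does NOT enforce one hit per city.
function buildTermsQuery(cities, size = cities.length) {
  return {
    query: {
      // Exact, unanalyzed match against the indexed city values.
      terms: { city: cities }
    },
    size: size,
    _source: 'city',
    sort: [{ date: { order: 'desc' } }]
  };
}
```

Usage: `client.search({ body: buildTermsQuery(['Berlin', 'Hong Kong', 'Seattle']) })` with the legacy client.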
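In addition to filtering the cities in the query, I suggest using a terms aggregation on the city field, and then a top_hits sub-aggregation to retrieve the latest event for each city: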
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"cities": {
"terms": {
"field": "city",
"size": 100
},
"aggs": {
"top_events": {
"top_hits": {
"size": 1,
"_source": "event",
"sort": {
"date": "desc"
}
}
}
}
}
}
}
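To turn this aggregation result into one entry per city, read each bucket's `key` and the single hit of its `top_events` sub-aggregation. This is a minimal sketch that assumes the usual response shape for a terms aggregation with a top_hits sub-aggregation (`aggregations.cities.buckets[].top_events.hits.hits`). `latestEventByCity` is a hypothetical helper. The city name comes from the bucket key, so `_source` in `top_hits` can be limited to the fields actually needed, for example `date`.

```javascript
// Turn a terms + top_hits aggregation response into { city: latestSource }.
// Assumes the aggregation names used above: "cities" and "top_events".
function latestEventByCity(resp) {
  const result = {};
  for (const bucket of resp.aggregations.cities.buckets) {
    const hits = bucket.top_events.hits.hits;
    if (hits.length > 0) {
      // top_hits is sorted by date desc with size 1, so hits[0] is the newest.
      // The city name is the bucket key, not a field from _source.
      result[bucket.key] = hits[0]._source;
    }
  }
  return result;
}
```

Each city can appear only once in the result because terms buckets are keyed by distinct indexed values.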
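This looks promising, but it returns far fewer documents than I expected, and some of the returned documents appear several times. Could the size of the "cities" array have something to do with it? There are about 70 cities, and I expect one hit per city.

Then just remove the query part. I've updated my answer.

Getting closer, but one city still shows up several times with different dates. Each record also has other fields I don't care about, and even the "event" field could be excluded. I only want the city name and the date, with no duplicate cities. Is there a way to explicitly tell Elasticsearch not to return a hit for a city name once a hit for that city has already been returned? Oh, and thank you very much for your replies.

Since we're using a terms aggregation, I strongly doubt you're getting the same city more than once, unless the names are written differently, which would be strange.

It really doesn't look like it should, but it does, and I promise the spelling is the same. I've updated the post with the current version of my script.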
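If duplicate cities still appear, one client-side safeguard is to keep only the newest hit per city name. The thread does not propose this; it is a sketch of a fallback. `dedupeLatestByCity` is hypothetical, and it assumes ISO `YYYY-MM-DD` date strings as in the sample data, which compare correctly as plain strings. Normalizing names with trim and lowercase also helps test whether the "same" city is indexed under slightly different values, such as a trailing space or a different case. A terms aggregation on a keyword field would treat those as separate buckets.

```javascript
// Keep only the newest hit per city from a flat array of search hits.
// Assumes ISO "YYYY-MM-DD" dates (string comparison orders them correctly)
// and normalizes names so "Berlin" and "berlin " are treated as one city.
function dedupeLatestByCity(hits) {
  const latest = new Map();
  for (const hit of hits) {
    const src = hit._source;
    const key = src.city.trim().toLowerCase();
    const seen = latest.get(key);
    if (!seen || src.date > seen.date) {
      // Store only what the question needs: the city name and the date.
      latest.set(key, { city: src.city, date: src.date });
    }
  }
  return Array.from(latest.values());
}
```

Comparing the number of distinct normalized keys with the number of buckets Elasticsearch returns can show whether near-duplicate values exist in the index.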