Java Elasticsearch nested sorting
I'm trying to do a nested sort in Elasticsearch, but so far without success. My data structure:
{ "_id" : 1,
"authorList" : [
{"lastName":"hawking", "firstName":"stephan"},
{"lastName":"frey", "firstName":"richard"}
]
}
{ "_id" : 2,
"authorList" : [
{"lastName":"roger", "firstName":"christina"},
{"lastName":"freud", "firstName":"damian"}
]
}
I want to sort the documents by the last name of the first author in each document.
Mapping used:
"authorList" : { "type" : "nested", "properties" : {"lastName":{"type":"keyword"}}}
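Sorting with SearchRequestBuilder (Java):
This runs, but it does not give the desired result (e.g. "hawking" first, then "roger").
Am I missing something? Is there a way to tell Elasticsearch to access index 0 of the authorList array? Is there a mapping or normalizer that indexes the first entry of the array separately?

Nested documents are not stored as a simple array or list. They are managed internally by Elasticsearch:

Elasticsearch is still fundamentally flat, but it manages the nested relation internally to give the appearance of a nested hierarchy. When you create a nested document, Elasticsearch actually indexes two separate documents (the root object and the nested object) and then relates the two internally.

I think you need to give Elasticsearch some extra information that indicates which author is the "main/first" author. Adding this extra field to just one author in the nested object is enough (your mapping can stay as it was), like this: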
{
"authorList" : [
{"lastName":"roger", "firstName":"christina", "authorOrder": 1},
{"lastName":"freud", "firstName":"damian"}
]
},
{
"authorList" : [
{"lastName":"hawking", "firstName":"stephan", "authorOrder": 1},
{"lastName":"adams", "firstName":"mark"},
{"lastName":"frey", "firstName":"richard"}
]
},
{
"authorList" : [
{"lastName":"adams", "firstName":"monica", "authorOrder": 1},
{"lastName":"adams", "firstName":"richard"}
]
}
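One way to produce documents in this shape is to tag the first entry of the ordered author list just before indexing. The following is a minimal plain-Java sketch of that step; the class name `FirstAuthorTagger` and its helper methods are hypothetical and not part of any Elasticsearch API, and it assumes the authors arrive as an ordered list of maps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: marks the first author so a nested filter can find it later.
class FirstAuthorTagger {

    // Returns a copy of the author list in which only the entry at index 0
    // carries "authorOrder": 1. The input list is left unchanged.
    static List<Map<String, Object>> tagFirstAuthor(List<Map<String, Object>> authors) {
        List<Map<String, Object>> tagged = new ArrayList<>();
        for (int i = 0; i < authors.size(); i++) {
            Map<String, Object> copy = new LinkedHashMap<>(authors.get(i));
            if (i == 0) {
                copy.put("authorOrder", 1);
            }
            tagged.add(copy);
        }
        return tagged;
    }

    // Convenience factory for one author entry.
    static Map<String, Object> author(String lastName, String firstName) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("lastName", lastName);
        a.put("firstName", firstName);
        return a;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> authors = Arrays.asList(
                author("hawking", "stephan"),
                author("frey", "richard"));
        // The resulting list can be used as the "authorList" value of the indexed source.
        System.out.println(tagFirstAuthor(authors));
    }
}
```

Only the first entry gains the extra field, so the other nested authors are indexed exactly as before.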
The query could then be:
{
"query" : {
"nested" : {
"query" : {
"bool" : {
"must" : [
{
"match" : {
"authorList.authorOrder" : 1
}
}
]
}
},
"path" : "authorList"
}
},
"sort" : [
{
"authorList.lastName" : {
"order" : "asc",
"nested_filter" : {
"bool" : {
"must" : [
{
"match" : {
"authorList.authorOrder" : 1
}
}
]
}
},
"nested_path" : "authorList"
}
}
]
}
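Using the Java API:
import org.apache.lucene.search.join.ScoreMode;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortBuilders;
import org.elasticsearch.search.sort.SortOrder;
// Matches only the nested author that is flagged as the first author.
QueryBuilder matchFirst = QueryBuilders.boolQuery()
    .must(QueryBuilders.matchQuery("authorList.authorOrder", 1));
QueryBuilder mainQuery = QueryBuilders.nestedQuery("authorList", matchFirst, ScoreMode.None);
// Sort on lastName, restricted to the first author by the same filter.
FieldSortBuilder sb = SortBuilders.fieldSort("authorList.lastName")
    .order(SortOrder.ASC)
    .setNestedPath("authorList")
    .setNestedFilter(matchFirst);
// `client` is assumed to be an already initialized Elasticsearch client.
SearchRequestBuilder builder = client.prepareSearch("test")
    .setSize(50)
    .setQuery(mainQuery)
    .addSort(sb);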
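Note that the SortBuilder has .setNestedFilter(matchFirst), which means the sort is based on the authorList.lastName field, but only for the "main/first" nested element. Without it, Elasticsearch would first sort all nested documents, pick the first element from that ascending list, and sort the parent documents on that value. The document containing "hawking" could then come first, because it also has an author with the last name "adams".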
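The effect of the nested filter can be illustrated without a cluster. The sketch below is a plain-Java simulation, not Elasticsearch code: it assumes that an unfiltered ascending sort compares the smallest last name of each document, and that the filtered sort compares only the first author's last name. The class `NestedSortSimulation` and its methods are hypothetical names, and each document is reduced to its list of last names with the first author at index 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical simulation of sorting parents by nested values, with and without a filter.
class NestedSortSimulation {

    // Smallest last name among all nested authors of one document.
    static String minLastName(List<String> lastNames) {
        return lastNames.stream().min(Comparator.naturalOrder()).orElse("");
    }

    // No filter: every nested author takes part, the smallest value is the sort key.
    static List<List<String>> sortWithoutFilter(List<List<String>> docs) {
        List<List<String>> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(NestedSortSimulation::minLastName));
        return sorted;
    }

    // With filter: only the first author (index 0) supplies the sort key.
    static List<List<String>> sortByFirstAuthor(List<List<String>> docs) {
        List<List<String>> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing((List<String> doc) -> doc.get(0)));
        return sorted;
    }

    static List<List<String>> sampleDocs() {
        return Arrays.asList(
                Arrays.asList("roger", "freud"),
                Arrays.asList("hawking", "adams", "frey"),
                Arrays.asList("adams", "adams"));
    }

    public static void main(String[] args) {
        System.out.println("without filter: " + sortWithoutFilter(sampleDocs()));
        System.out.println("with filter:    " + sortByFirstAuthor(sampleDocs()));
    }
}
```

In the unfiltered case the "hawking" document ties with the "adams" document on the key "adams", which is why it can end up first. The filtered case orders the documents by adams, hawking, roger.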
The final result is:
{
"authorList" : [
{"lastName":"adams", "firstName":"monica", "authorOrder": 1},
{"lastName":"adams", "firstName":"richard"}
]
},
{
"authorList" : [
{"lastName":"hawking", "firstName":"stephan", "authorOrder": 1},
{"lastName":"adams", "firstName":"mark"},
{"lastName":"frey", "firstName":"richard"}
]
},
{
"authorList" : [
{"lastName":"roger", "firstName":"christina", "authorOrder": 1},
{"lastName":"freud", "firstName":"damian"}
]
}
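OK, that solves the problem. But if I have to introduce a new field anyway, wouldn't it be easier to just create a field "firstAuthorLastName" that holds a copy of the value at the first array index? That would also simplify the query and sort part.

Yes, if you can rearrange your model that way, querying the data will certainly be easier. If the document can have, for example, id, firstAuthorLastName and a nested list of the other authors, then sorting on the top-level field firstAuthorLastName (rather than on a nested one) will also be faster.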
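A minimal plain-Java sketch of that denormalization, assuming the document source is assembled as a map before indexing. The class `FirstAuthorDenormalizer` and the method `toDocument` are hypothetical names, and for simplicity the sketch keeps the complete authorList next to the copied field rather than removing the first author from it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: copies the first author's last name into a top-level field.
class FirstAuthorDenormalizer {

    // Builds the document source with a top-level "firstAuthorLastName" field.
    // If the author list is empty, the field is simply left out.
    static Map<String, Object> toDocument(int id, List<Map<String, Object>> authors) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        if (!authors.isEmpty()) {
            doc.put("firstAuthorLastName", authors.get(0).get("lastName"));
        }
        doc.put("authorList", authors);
        return doc;
    }

    // Convenience factory for one author entry.
    static Map<String, Object> author(String lastName, String firstName) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("lastName", lastName);
        a.put("firstName", firstName);
        return a;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = toDocument(1, Arrays.asList(
                author("hawking", "stephan"),
                author("frey", "richard")));
        System.out.println(doc);
    }
}
```

With such a field in place, the sort no longer needs a nested path or a nested filter, only an ordinary field sort on firstAuthorLastName (mapped as keyword, like lastName above).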