MongoDB: aggregating $avg over complex data
I'm trying to get an average rating in a Mongo aggregation, but I'm having trouble accessing nested arrays. I've gotten my aggregation to produce the following array, and I'd like city_reviews to return an array of averages instead:
[
{
"_id": "Dallas",
"city_reviews": [
//arrays of restaurant objects that include the rating
//I would like to get an average of the rating in each review, so these arrays will be numbers (averages)
[ {
"_id": "5b7ead6d106f0553d8807276",
"created": "2018-08-23T12:41:29.791Z",
"text": "Crackin good place. ",
"rating": 4,
"store": "5b7d67d5356114089909e58d",
"author": "5b7d675e356114089909e58b",
"__v": 0
}, {review2}, {review3}],
[{review1}, {review2}, {review3}],
[{review1}, {review2}],
[{review1}, {review2}, {review3}, {review4}],
[]
]
},
{
"_id": "Houston",
"city_reviews": [
// arrays of restaurants
[{review1}, {review2}, {review3}],
[{review1}, {review2}, {review3}],
[{review1}, {review2}, {review3}, {review4}],
[],
[]
]
}
]
I'd like to run an aggregation on top of this that returns an array of averages in city_reviews, like this:
{
"_id": "Dallas",
"city_reviews": [
// arrays of rating averages
[4.7],
[4.3],
[3.4],
[],
[]
]
}
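Here is what I tried. It gives me null averages, because $city_reviews is an array of arrays of review objects and nothing tells $avg to reach down to the rating key: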
return this.aggregate([
{ $lookup: { from: 'reviews', localField: '_id', foreignField: 'store', as:
'reviews' }},
{$group: {_id: '$city', city_reviews: { $push : '$reviews'}}},
{ $project: {
averageRating: { $avg: '$city_reviews'}
}}
])
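The following is a plain-JavaScript sketch (not MongoDB code) of how the $avg expression handles a single array operand. It shows why this attempt returns null: $avg averages only the numeric elements of the array, and every element of city_reviews is itself an array. The helper name mongoAvg and the sample ratings are hypothetical.

```javascript
// Plain-JavaScript stand-in for the $avg expression with one array operand:
// numeric elements are averaged, anything else (arrays, objects, null) is ignored,
// and the result is null when nothing numeric is left.
function mongoAvg(values) {
  const nums = values.filter((v) => typeof v === "number");
  if (nums.length === 0) return null;
  return nums.reduce((sum, n) => sum + n, 0) / nums.length;
}

// city_reviews after $group: one array of review objects per store (hypothetical ratings).
const cityReviews = [
  [{ rating: 4 }, { rating: 5 }],
  [{ rating: 3 }],
  [],
];

// { $avg: '$city_reviews' }: every element is an array, so there is nothing to average.
console.log(mongoAvg(cityReviews)); // null

// Going one level deeper, to each store's ratings, yields real averages.
const perStore = cityReviews.map((reviews) => mongoAvg(reviews.map((r) => r.rating)));
console.log(JSON.stringify(perStore)); // [4.5,3,null]
```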
Is there a way to use this line so that it returns an array of averages instead of the full review objects?
averageRating: { $avg: '$city_reviews'}
EDIT: As requested, here is the whole pipeline:
return this.aggregate([
{ $lookup: { from: 'reviews', localField: '_id', foreignField: 'store', as: 'reviews' }},
{$group: {
_id: '$city',
city_reviews: { $push : '$reviews'}}
},
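{ $project: {
// after $group only _id and city_reviews exist, so these $$ROOT paths resolve to nothing and are omitted
photo: '$$ROOT.photo',
name: '$$ROOT.name',
reviews: '$$ROOT.reviews',
slug: '$$ROOT.slug',
city: '$$ROOT.city',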
"averageRatingIndex":{
"$map":{
"input":"$city_reviews",
"in":[{"$avg":"$$this.rating"}]
}
},
}
},
{ $sort: { averageRating: -1 }}, // note: the $project above outputs averageRatingIndex, not averageRating
{ $limit: 5 }
])
My first stage joins the two models together:
{ $lookup: { from: 'reviews', localField: '_id', foreignField: 'store', as: 'reviews' }},
This gives:
[ {
"_id": "5b7d67d5356114089909e58d",
"location": {},
"tags": [],
"created": "2018-08-22T13:23:23.224Z",
"name": "Lucia",
"description": "Great name",
"city": "Dallas",
"photo": "ab64b3e7-6207-41d8-a670-94315e4b23af.jpeg",
"author": "5b7d675e356114089909e58b",
"slug": "lucia",
"__v": 0,
"reviews": []
},
{ ...more objects like the one above }
]
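The following plain-JavaScript sketch shows what this $lookup does. It assumes simplified documents with string ids standing in for ObjectIds, and the data and the lookup helper are hypothetical. Each store gains a reviews array containing the reviews whose store field equals its _id. When nothing matches, the array is empty, like the "reviews": [] in the output above.

```javascript
// Hypothetical documents; string ids stand in for ObjectIds.
const stores = [
  { _id: "s1", name: "Store A", city: "Dallas" },
  { _id: "s2", name: "Store B", city: "Dallas" },
  { _id: "s3", name: "Store C", city: "Houston" },
];
const reviews = [
  { _id: "r1", store: "s1", rating: 4 },
  { _id: "r2", store: "s1", rating: 5 },
  { _id: "r3", store: "s3", rating: 2 },
];

// Mimics { $lookup: { from, localField, foreignField, as } } for scalar fields:
// each local document gains an array of the foreign documents that match it.
function lookup(localDocs, foreignDocs, localField, foreignField, as) {
  return localDocs.map((doc) => ({
    ...doc,
    [as]: foreignDocs.filter((f) => f[foreignField] === doc[localField]),
  }));
}

const joined = lookup(stores, reviews, "_id", "store", "reviews");
console.log(JSON.stringify(joined.map((s) => [s.name, s.reviews.length])));
// [["Store A",2],["Store B",0],["Store C",1]]
```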
Then I group them like this:
{$group: {
_id: '$city',
city_reviews: { $push : '$reviews'}}
}
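The following plain-JavaScript sketch uses hypothetical data. This $group buckets stores by city and pushes each store's entire reviews array as a single element. As a result, city_reviews becomes an array of arrays, which is why a plain $avg over it finds no numbers. The groupPush helper is illustrative only. MongoDB does not guarantee the order of $group output, and the Map here simply keeps first-seen order.

```javascript
// Output of the $lookup stage, trimmed to the fields used here (hypothetical values).
const joined = [
  { _id: "s1", city: "Dallas", reviews: [{ rating: 4 }, { rating: 5 }] },
  { _id: "s2", city: "Dallas", reviews: [] },
  { _id: "s3", city: "Houston", reviews: [{ rating: 2 }] },
];

// Mimics { $group: { _id: '$<keyField>', <outField>: { $push: '$<valueField>' } } }.
function groupPush(docs, keyField, valueField, outField) {
  const buckets = new Map(); // insertion-ordered; MongoDB itself guarantees no order
  for (const doc of docs) {
    const key = doc[keyField];
    if (!buckets.has(key)) buckets.set(key, []);
    buckets.get(key).push(doc[valueField]); // the whole array is pushed as one element
  }
  return [...buckets].map(([key, values]) => ({ _id: key, [outField]: values }));
}

const grouped = groupPush(joined, "city", "reviews", "city_reviews");
console.log(JSON.stringify(grouped));
// [{"_id":"Dallas","city_reviews":[[{"rating":4},{"rating":5}],[]]},{"_id":"Houston","city_reviews":[[{"rating":2}]]}]
```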
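This returned the data from my original question. Basically, I just want the overall average rating for each city. The answer I accepted does answer my original question; I'm getting back: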
{
"_id": "Dallas",
"averageRatingIndex": [
[ 4.2 ],
[ 3.6666666666666665 ],
[ null ],
[ 3.2 ],
[ 5 ],
[ null ]
]
}
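I tried using the $avg operator to get a single value so I can display each city's final average, but I'm having trouble.
You can use $map together with $avg to output the averages: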
{"$project":{
"averageRating":{
"$map":{
"input":"$city_reviews",
"in":[{"$avg":"$$this.rating"}]
}
}
}}
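The following plain-JavaScript sketch shows what this $map computes, using hypothetical ratings and a hypothetical mongoAvg helper that mimics $avg. For each store's array, "$$this.rating" collects that store's ratings and $avg averages them. When a store has no reviews, $avg returns null, which accounts for the [ null ] entries in the result shown above. The square brackets around the $avg in "in" wrap each average in a one-element array.

```javascript
// Mimics $avg over an array: average the numbers, null if there are none.
function mongoAvg(values) {
  const nums = values.filter((v) => typeof v === "number");
  return nums.length === 0 ? null : nums.reduce((sum, n) => sum + n, 0) / nums.length;
}

// city_reviews for one city: one array of reviews per store (hypothetical ratings).
const cityReviews = [
  [{ rating: 4 }, { rating: 5 }, { rating: 3 }],
  [], // a store without reviews
  [{ rating: 2 }, { rating: 5 }],
];

// "$map": { "input": "$city_reviews", "in": [ { "$avg": "$$this.rating" } ] }
const averageRating = cityReviews.map((reviews) => [
  mongoAvg(reviews.map((r) => r.rating)), // $$this.rating -> that store's ratings
]);
console.log(JSON.stringify(averageRating)); // [[4],[null],[3.5]]
```

Dropping the brackets in "in" would give a flat array of numbers (and nulls) instead.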
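Regarding your optimization request, I don't think there is much room for improvement over the version you already have. However, because the initial $group stage reduces the number of $lookup operations (one per city instead of one per store), the following pipeline may be faster than your current solution. I'm not sure how MongoDB optimizes all of this internally, so you may want to profile both versions against your actual dataset.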
db.getCollection('something').aggregate([{
$group: {
_id: '$city', // group by city
"averageRating": { $push: "$_id" } // create array of all encountered "_id"s per "city" bucket - we use the target field name to avoid creation of superfluous fields which would need to be removed from the output later on
}
}, {
$lookup: {
from: 'reviews',
let: { "averageRating": "$averageRating" }, // expose the previously created array of "_id"s to the sub-pipeline as "$$averageRating"
pipeline: [{
$match: { $expr: { $in: [ "$store", "$$averageRating" ] } } // do the usual "joining"
}, {
$group: {
"_id": null, // group all found items into the same single bucket
"rating": { $avg: "$rating" }, // average over all matching reviews, i.e. over every store in this city
}
}],
as: 'averageRating'
}
}, {
$sort: { "averageRating.rating": -1 }
}, {
$limit: 5
}, {
$addFields: { // beautification of the output only, technically not needed - we do this as the last stage in order to only do it for the max. of 5 documents that we're interested in
"averageRating": { // this is where we reuse the field we created in the first stage
$arrayElemAt: [ "$averageRating.rating", 0 ] // pull the first element inside the array outside of the array
}
}
}])
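The steps of this pipeline can be mirrored in plain JavaScript to check the expected result on a small dataset. This sketch reproduces the logic only, not the way MongoDB executes it. The sample stores, the sample reviews, and the topCities name are hypothetical. A city whose stores have no reviews gets null here and sorts last, after every city with a numeric average.

```javascript
// Hypothetical sample data; string ids stand in for ObjectIds.
const stores = [
  { _id: "s1", city: "Dallas" },
  { _id: "s2", city: "Dallas" },
  { _id: "s3", city: "Houston" },
  { _id: "s4", city: "Austin" }, // no reviews
];
const reviews = [
  { store: "s1", rating: 4 },
  { store: "s1", rating: 5 },
  { store: "s2", rating: 3 },
  { store: "s3", rating: 2 },
  { store: "s3", rating: 4 },
];

function topCities(stores, reviews, limit) {
  // $group: { _id: '$city', averageRating: { $push: '$_id' } }
  const idsByCity = new Map();
  for (const store of stores) {
    if (!idsByCity.has(store.city)) idsByCity.set(store.city, []);
    idsByCity.get(store.city).push(store._id);
  }

  // $lookup sub-pipeline: $match with $in, then $group { _id: null, rating: { $avg: '$rating' } }
  const rows = [...idsByCity].map(([city, ids]) => {
    const ratings = reviews
      .filter((r) => ids.includes(r.store) && typeof r.rating === "number")
      .map((r) => r.rating);
    const average =
      ratings.length === 0
        ? null // no matching reviews: no numeric average
        : ratings.reduce((sum, n) => sum + n, 0) / ratings.length;
    return { _id: city, averageRating: average };
  });

  // $sort descending with unrated cities last, then $limit
  const key = (row) => (row.averageRating === null ? -Infinity : row.averageRating);
  rows.sort((a, b) => (key(a) === key(b) ? 0 : key(b) > key(a) ? 1 : -1));
  return rows.slice(0, limit);
}

const top = topCities(stores, reviews, 5);
console.log(JSON.stringify(top));
// [{"_id":"Dallas","averageRating":4},{"_id":"Houston","averageRating":3},{"_id":"Austin","averageRating":null}]
```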
In fact, the "initial $group stage" approach can also be combined with @Veeram's solution, like this:
db.collection.aggregate([{
$group: {
_id: '$city', // group by city
"averageRating": { $push: "$_id" } // create array of all encountered "_id"s per "city" bucket - we use the target field name to avoid creation of superfluous fields which would need to be removed from the output later on
}
}, {
$lookup: {
from: 'reviews',
localField: 'averageRating',
foreignField: 'store',
as: 'averageRating'
},
}, {
$project: {
"averageRating": {
$avg: {
$map: {
input: "$averageRating",
in: { $avg: "$$this.rating" }
}
}
}
}
}, {
$sort: { averageRating: -1 }
}, {
$limit: 5
}])
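This pipeline passes the whole array of store ids to $lookup, so averageRating holds one flat list of review documents. Inside $map, $$this is therefore a single review, and the inner $avg simply returns that review's rating. The outer $avg ends up as the mean over all reviews in the city, which is the same quantity the previous pipeline computes. If the outer $avg is instead applied to per-store averages, as with the nested city_reviews arrays, each restaurant gets equal weight and the result can differ. The following plain-JavaScript sketch illustrates the difference, using hypothetical ratings and a hypothetical mongoAvg helper that mimics $avg.

```javascript
// Mimics $avg over an array: average the numbers, null if there are none.
function mongoAvg(values) {
  const nums = values.filter((v) => typeof v === "number");
  return nums.length === 0 ? null : nums.reduce((sum, n) => sum + n, 0) / nums.length;
}

// One city, three stores (hypothetical ratings).
const cityReviews = [
  [{ rating: 5 }, { rating: 5 }, { rating: 5 }, { rating: 5 }], // store A
  [{ rating: 1 }], // store B
  [], // store C, no reviews
];

// Review-weighted: pool every review of the city, then average once.
const reviewWeighted = mongoAvg(cityReviews.flat().map((r) => r.rating));

// Store-weighted: average each store first, then average those results;
// the outer average skips the null produced by the store without reviews.
const storeWeighted = mongoAvg(cityReviews.map((rs) => mongoAvg(rs.map((r) => r.rating))));

console.log(reviewWeighted); // 4.2
console.log(storeWeighted); // 3
```

Which of the two counts as "the city's average rating" is a product decision. The pipelines above give the review-weighted figure.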
If you provide the original source data, we may be able to offer an optimized version of your whole pipeline.
Edited with the whole pipeline and the result data. I hope this illustrates what I did. I would really appreciate some advice or help on how to optimize it!
Aha! Thanks a lot. I'll have to read up on $map for the future. Thanks for answering my original question. I edited the post to reflect what I'm trying to do, since I'm still having difficulty. I hope you'll take another look!
Hmm. If you only need a single average, you can use
{"$project":{"averageRatingIndex":{"$avg":{"$map":{"input":"$city_reviews","in":{"$avg":"$$this.rating"}}}}}}
and add it after the $group stage.
Works great. Thanks again!
Amazing! Thanks for looking. I'll compare the two.