MongoDB: aggregating $avg over complex (nested) data

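I'm trying to get an average rating in a Mongo aggregation, but I'm having trouble accessing a nested array. I've gotten my aggregation to produce the following array. I'm trying to have city_reviews return an array of averages.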

[ 
  {
    "_id": "Dallas",
    "city_reviews": [
           //arrays of restaurant objects that include the rating
           //I would like to get an average of the rating in each review, so these arrays will be numbers (averages)
           [ {
              "_id": "5b7ead6d106f0553d8807276",
              "created": "2018-08-23T12:41:29.791Z",
              "text": "Crackin good place. ",
              "rating": 4,
              "store": "5b7d67d5356114089909e58d",
              "author": "5b7d675e356114089909e58b",
              "__v": 0
              }, {review2}, {review3}],
           [{review1}, {review2}, {review3}],
           [{review1}, {review2}],
           [{review1}, {review2}, {review3}, {review4}],
           []
      ]

   },
  {
    "_id": "Houston",
    "city_reviews": [
           // arrays of restaurants 
           [{review1}, {review2}, {review3}],
           [{review1}, {review2}, {review3}],
           [{review1}, {review2}, {review3}, {review4}],
           [],
           []
      ]
  }

]
I'd like to run an aggregation on top of this that returns an array of averages inside city_reviews, like so:

{
    "_id": "Dallas",
    "city_reviews": [
           // arrays of rating averages
           [4.7],
           [4.3],
           [3.4],
           [],
           []
      ]
  }
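Here's what I tried. It gives me back an average of null, because $city_reviews is an array of objects and I'm not telling it to go deep enough to reach the rating key.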

return this.aggregate([
    { $lookup: { from: 'reviews', localField: '_id', foreignField: 'store', as: 
      'reviews' }},
    {$group: {_id: '$city', city_reviews: { $push : '$reviews'}}},
    { $project: {
       averageRating: { $avg: '$city_reviews'}
    }}
 ])
Is there a way to work with this line so that I get back an array of averages instead of the full review objects?

averageRating: { $avg: '$city_reviews'}
EDIT: I was asked to add the whole pipeline.

return this.aggregate([
    { $lookup: { from: 'reviews', localField: '_id', foreignField: 'store', as: 'reviews' }},
    {$group: {
        _id: '$city', 
        city_reviews: { $push : '$reviews'}}
    },
    { $project: {
        photo: '$$ROOT.photo',
        name: '$$ROOT.name',
        reviews: '$$ROOT.reviews',
        slug: '$$ROOT.slug',
        city: '$$ROOT.city',
        "averageRatingIndex":{
            "$map":{
            "input":"$city_reviews",
            "in":[{"$avg":"$$this.rating"}]
            }
        },
     }
    },
    { $sort: { averageRating: -1 }},
    { $limit: 5 }
])
My first issue was joining the two models together:

{ $lookup: { from: 'reviews', localField: '_id', foreignField: 'store', as: 'reviews' }},
which results in:

[ {
    "_id": "5b7d67d5356114089909e58d",
    "location": {},
    "tags": [],
    "created": "2018-08-22T13:23:23.224Z",
    "name": "Lucia",
    "description": "Great name",
    "city": "Dallas",
    "photo": "ab64b3e7-6207-41d8-a670-94315e4b23af.jpeg",
    "author": "5b7d675e356114089909e58b",
    "slug": "lucia",
    "__v": 0,
    "reviews": []
  },
  {..more object like above}
]
Then I grouped them like this:

{$group: {
     _id: '$city', 
     city_reviews: { $push : '$reviews'}}
 }
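This produced the output from my original problem. Basically, I just want the overall average rating for each city. The answer I accepted does answer my original question. I'm now getting back: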

{
  "_id": "Dallas",
  "averageRatingIndex": [
     [ 4.2 ],
     [ 3.6666666666666665 ],
     [ null ],
     [ 3.2 ],
     [ 5 ],
     [ null ]
   ]
}

I'm trying to use the $avg operator to return one final average for each city that I can display, but I'm running into trouble.

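You can use $map together with $avg to output the averages. $map visits each inner array of city_reviews and $avg averages the rating values in it (an inner array with no reviews yields null):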

{"$project":{
  "averageRating":{
     "$map":{
      "input":"$city_reviews",
      "in":[{"$avg":"$$this.rating"}]
    }
  }
}}
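
To make the behaviour of that projection easier to follow, here is a minimal plain-JavaScript sketch that imitates it outside of MongoDB. The helper names `avg` and `perStoreAverages` and the sample data are hypothetical and only serve the illustration; the sketch assumes `city_reviews` is an array of per-store review arrays, as in the question.

```javascript
// Plain-JavaScript emulation of the $map + $avg projection (no MongoDB needed).

// Mimics $avg: non-numeric values are ignored; no numeric values yields null.
function avg(values) {
  const nums = values.filter((v) => typeof v === "number");
  if (nums.length === 0) return null;
  return nums.reduce((sum, n) => sum + n, 0) / nums.length;
}

// Mimics { $map: { input: "$city_reviews", in: [ { $avg: "$$this.rating" } ] } }.
// cityReviews holds one inner array of review objects per store.
function perStoreAverages(cityReviews) {
  return cityReviews.map((storeReviews) => [avg(storeReviews.map((r) => r.rating))]);
}

// Hypothetical sample data shaped like the grouped documents in the question.
const dallas = [
  [{ rating: 4 }, { rating: 5 }],
  [{ rating: 3 }],
  [], // a store without reviews
];

console.log(JSON.stringify(perStoreAverages(dallas)));
// prints [[4.5],[3],[null]]
```

The `[null]` entry for the store without reviews matches the `[ null ]` entries the asker reports in the edited question.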

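Regarding your optimization request, I don't think there is much room for improvement beyond the version you already have. However, the following pipeline might be faster than your current solution, because the initial $group stage reduces the number of $lookup operations. I'm not sure how MongoDB optimizes all of this internally, so you may want to profile both versions against your real data set.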

db.getCollection('something').aggregate([{
    $group: {
        _id: '$city', // group by city
        "averageRating": { $push: "$_id" } // create array of all encountered "_id"s per "city" bucket - we use the target field name to avoid creation of superfluous fields which would need to be removed from the output later on
    }
}, {
    $lookup: {
        from: 'reviews',
        let: { "averageRating": "$averageRating" }, // create a variable called "$$averageRating" which will hold the previously created array of "_id"s
        pipeline: [{
            $match: { $expr: { $in: [ "$store", "$$averageRating" ] } } // do the usual "joining"
        }, {
            $group: {
                "_id": null, // group all found items into the same single bucket
                "rating": { $avg: "$rating" }, // calculate the avg over all reviews found for this "city" bucket
            }
        }],
        as: 'averageRating' 
    }
}, {
    $sort: { "averageRating.rating": -1 }
}, {
    $limit: 5
}, { 
    $addFields: { // beautification of the output only, technically not needed - we do this as the last stage in order to only do it for the max. of 5 documents that we're interested in
        "averageRating": { // this is where we reuse the field we created in the first stage
            $arrayElemAt: [ "$averageRating.rating", 0 ] // pull the first element inside the array outside of the array
        }
    }
}])
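
As a rough illustration of what the stages above compute, the following plain-JavaScript sketch walks through the same steps on in-memory arrays. The function name `topCities` and the sample data are hypothetical, and a city without reviews is represented here with `null`, which only approximates what the real pipeline would emit.

```javascript
// Plain-JavaScript walk-through of the group-then-lookup idea (no MongoDB needed).
// Hypothetical sample data: stores carry a city, reviews point to a store.
const stores = [
  { _id: "s1", city: "Dallas" },
  { _id: "s2", city: "Dallas" },
  { _id: "s3", city: "Houston" },
  { _id: "s4", city: "Austin" }, // no reviews
];
const reviews = [
  { store: "s1", rating: 4 },
  { store: "s1", rating: 5 },
  { store: "s2", rating: 3 },
  { store: "s3", rating: 5 },
];

function topCities(stores, reviews, limit) {
  // $group: one bucket per city holding the _ids of its stores.
  const idsByCity = new Map();
  for (const s of stores) {
    if (!idsByCity.has(s.city)) idsByCity.set(s.city, []);
    idsByCity.get(s.city).push(s._id);
  }
  // $lookup with a sub-pipeline: one lookup per city instead of one per store.
  const result = [];
  for (const [city, ids] of idsByCity) {
    const ratings = reviews.filter((r) => ids.includes(r.store)).map((r) => r.rating);
    const averageRating = ratings.length
      ? ratings.reduce((a, b) => a + b, 0) / ratings.length
      : null; // no matching reviews for this city
    result.push({ _id: city, averageRating });
  }
  // $sort descending (cities without reviews last), then $limit.
  result.sort((a, b) => (b.averageRating ?? -Infinity) - (a.averageRating ?? -Infinity));
  return result.slice(0, limit);
}

console.log(topCities(stores, reviews, 5));
```

With four stores in three cities, the sketch performs three lookups rather than four; the saving grows with the number of stores per city.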
In fact, the "initial $group stage" approach can also be combined with @Veeram's solution, like this:

db.collection.aggregate([{
    $group: {
        _id: '$city', // group by city
        "averageRating": { $push: "$_id" } // create array of all encountered "_id"s per "city" bucket - we use the target field name to avoid creation of superfluous fields which would need to be removed from the output later on
    }
}, {
    $lookup: {
        from: 'reviews',
        localField: 'averageRating',
        foreignField: 'store',
        as: 'averageRating'
    },
}, {
    $project: {
        "averageRating": {
            $avg: {
                $map: {
                    input: "$averageRating",
                    in: { $avg: "$$this.rating" }
                }
            }
        }
    }
}, {
    $sort: { averageRating: -1 }
}, {
    $limit: 5
}])
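
One point that may be worth checking against your requirements: a $lookup on localField/foreignField returns the matching reviews of a city as one flat array, so the pipeline above averages over all reviews, whereas a nested $map/$avg over per-store arrays averages the per-store averages. The plain-JavaScript sketch below illustrates the difference; the function names and the sample data are hypothetical.

```javascript
// Contrast of two ways to arrive at "one average per city" (no MongoDB needed).
function mean(nums) {
  return nums.length ? nums.reduce((a, b) => a + b, 0) / nums.length : null;
}

// All reviews of a city flattened into one array: every review counts equally.
function pooledAverage(reviewsByStore) {
  return mean(reviewsByStore.flat().map((r) => r.rating));
}

// Average of per-store averages: every store that has reviews counts equally.
function averageOfStoreAverages(reviewsByStore) {
  const perStore = reviewsByStore
    .map((storeReviews) => mean(storeReviews.map((r) => r.rating)))
    .filter((v) => v !== null); // $avg likewise skips nulls from stores without reviews
  return mean(perStore);
}

// Hypothetical city: one store with four 5-star reviews, one with a single 1-star review.
const city = [
  [{ rating: 5 }, { rating: 5 }, { rating: 5 }, { rating: 5 }],
  [{ rating: 1 }],
  [],
];

console.log(pooledAverage(city));          // 4.2
console.log(averageOfStoreAverages(city)); // 3
```

Which of the two numbers is the "right" city rating depends on whether heavily reviewed stores should carry more weight.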

If you provide the original source data, we might be able to offer an optimized version of your whole pipeline.

Edited with the whole pipeline and the resulting data. I hope this shows what I've done. I'd really appreciate some advice or help on how to optimize!

Aha! Thank you so much. I'll have to read up on $map for the future. Thank you for answering my original question. I edited the post to reflect what I'm trying to do, since I'm still having difficulty. I'd appreciate another look!

Hmm. If you just need one avg, you can use
{"$project":{"averageRatingIndex":{"$avg":{"$map":{"input":"$city_reviews","in":{"$avg":"$$this.rating"}}}}}}
Add it after the $group stage.

Works great. Thanks again!

Awesome! Thanks for looking. I'll compare them.