node.js mongodb中的总和嵌套数组
我在mongodb中有一个类似这样的模式node.js mongodb中的总和嵌套数组,mongodb,mongodb-query,aggregation-framework,Mongodb,Mongodb Query,Aggregation Framework,我在mongodb中有一个类似这样的模式 first_level:[{ first_item : String, second_level:[{ second_item: String, third_level:[{ third_item :String, forth_level :[{//4th level price : Num
first_level:[{
first_item : String,
second_level:[{
second_item: String,
third_level:[{
third_item :String,
forth_level :[{//4th level
price : Number, // 5th level
sales_date : Date,
quantity_sold : Number
}]
}]
}]
}]
1) 。我想在中添加基于匹配条件的销售数量
第一项、第二项、第三项和销售日期
2) 。我还想找到一个特定日期内所有销售数量的平均值
3) 。我还想找到一个特定日期所有销售数量的平均值,以及相应的价格
我一直很困惑,我如何才能做到这一点,我来自sql
背景所以这是相当令人困惑的让我们从一个基本的免责声明开始,因为回答问题的主体已经在这里得到了回答。“记录在案”的Double也适用于Triple或fourbal或任何嵌套级别,因为基本上相同的原则总是 任何答案的另一个要点也是不要嵌套数组,因为正如该答案中所解释的(我已经重复了很多次),无论你“认为”你有什么“嵌套”的理由,实际上都不会给你带来你认为会带来的优势。事实上,“筑巢”实际上只是让生活变得更加困难 嵌套问题 从“关系”模型转换数据结构的主要误解总是被解释为对每个关联模型“添加嵌套数组级别”。您在这里介绍的内容也不例外,因为它看起来非常“规范化”,因此每个子数组都包含与其父数组相关的项 MongoDB是一个基于“文档”的数据库,因此它几乎允许您执行此操作,或者实际上允许您执行任何基本需要的数据结构内容。然而,这并不意味着这种形式的数据很容易使用,或者对于实际目的来说确实是实用的 让我们用一些实际数据填写模式,以演示:
{
"_id": 1,
"first_level": [
{
"first_item": "A",
"second_level": [
{
"second_item": "A",
"third_level": [
{
"third_item": "A",
"forth_level": [
{
"price": 1,
"sales_date": new Date("2018-10-31"),
"quantity": 1
},
{
"price": 1,
"sales_date": new Date("2018-11-01"),
"quantity": 1
},
{
"price": 1,
"sales_date": new Date("2018-11-02"),
"quantity": 1
},
]
},
{
"third_item": "B",
"forth_level": [
{
"price": 1,
"sales_date": new Date("2018-10-31"),
"quantity": 1
},
]
}
]
},
{
"second_item": "A",
"third_level": [
{
"third_item": "B",
"forth_level": [
{
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
},
]
}
]
}
]
},
{
"first_item": "A",
"second_level": [
{
"second_item": "B",
"third_level": [
{
"third_item": "A",
"forth_level": [
{
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
},
]
}
]
}
]
}
]
},
{
"_id": 2,
"first_level": [
{
"first_item": "A",
"second_level": [
{
"second_item": "A",
"third_level": [
{
"third_item": "A",
"forth_level": [
{
"price": 2,
"sales_date": new Date("2018-11-03"),
"quantity": 1
},
{
"price": 1,
"sales_date": new Date("2018-10-31"),
"quantity": 1
},
{
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
}
]
}
]
}
]
}
]
},
{
"_id": 3,
"first_level": [
{
"first_item": "A",
"second_level": [
{
"second_item": "B",
"third_level": [
{
"third_item": "A",
"forth_level": [
{
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
}
]
}
]
}
]
}
]
}
这与问题中的结构“有点”不同,但出于演示目的,它有我们需要查看的内容。文档中主要有一个数组,其中包含带有子数组的项,而子数组中又包含子数组中的项,以此类推。这里的“规范化”当然是通过每个“级别”上的标识符作为“项目类型”或任何您实际拥有的东西来实现的
核心问题是您只需要从这些嵌套数组中获取“部分”数据,而MongoDB实际上只需要返回“文档”,这意味着您需要进行一些操作,以获取那些匹配的“子项”
即使是在“正确”选择与所有这些“子标准”匹配的文档的问题上,也需要广泛使用,以便在每个级别的数组元素上获得条件的正确组合。由于需要这些,您不能直接使用。如果没有这些语句,就不会得到确切的“组合”,只会得到在任何数组元素上条件为真的文档
至于实际“过滤掉数组内容”,那么这实际上是额外区别的一部分:
db.collection.aggregate([
{ "$match": {
"first_level": {
"$elemMatch": {
"first_item": "A",
"second_level": {
"$elemMatch": {
"second_item": "A",
"third_level": {
"$elemMatch": {
"third_item": "A",
"forth_level": {
"$elemMatch": {
"sales_date": {
"$gte": new Date("2018-11-01"),
"$lt": new Date("2018-12-01")
}
}
}
}
}
}
}
}
}
}},
{ "$addFields": {
"first_level": {
"$filter": {
"input": {
"$map": {
"input": "$first_level",
"in": {
"first_item": "$$this.first_item",
"second_level": {
"$filter": {
"input": {
"$map": {
"input": "$$this.second_level",
"in": {
"second_item": "$$this.second_item",
"third_level": {
"$filter": {
"input": {
"$map": {
"input": "$$this.third_level",
"in": {
"third_item": "$$this.third_item",
"forth_level": {
"$filter": {
"input": "$$this.forth_level",
"cond": {
"$and": [
{ "$gte": [ "$$this.sales_date", new Date("2018-11-01") ] },
{ "$lt": [ "$$this.sales_date", new Date("2018-12-01") ] }
]
}
}
}
}
}
},
"cond": {
"$and": [
{ "$eq": [ "$$this.third_item", "A" ] },
{ "$gt": [ { "$size": "$$this.forth_level" }, 0 ] }
]
}
}
}
}
}
},
"cond": {
"$and": [
{ "$eq": [ "$$this.second_item", "A" ] },
{ "$gt": [ { "$size": "$$this.third_level" }, 0 ] }
]
}
}
}
}
}
},
"cond": {
"$and": [
{ "$eq": [ "$$this.first_item", "A" ] },
{ "$gt": [ { "$size": "$$this.second_level" }, 0 ] }
]
}
}
}
}},
{ "$unwind": "$first_level" },
{ "$unwind": "$first_level.second_level" },
{ "$unwind": "$first_level.second_level.third_level" },
{ "$unwind": "$first_level.second_level.third_level.forth_level" },
{ "$group": {
"_id": {
"date": "$first_level.second_level.third_level.forth_level.sales_date",
"price": "$first_level.second_level.third_level.forth_level.price",
},
"quantity_sold": {
"$avg": "$first_level.second_level.third_level.forth_level.quantity"
}
}},
{ "$group": {
"_id": "$_id.date",
"prices": {
"$push": {
"price": "$_id.price",
"quanity_sold": "$quantity_sold"
}
},
"quanity_sold": { "$avg": "$quantity_sold" }
}}
])
这是最好的描述为“混乱”和“涉及”。不仅我们对文档选择的初始查询非常复杂,而且我们对每个数组级别都有后续的查询和处理。如前所述,无论实际有多少层,这都是一种模式
您可以交替地执行and组合,而不是就地过滤阵列,但这确实会在删除不需要的内容之前增加额外的开销,因此在MongoDB的现代版本中,通常最好先从阵列中删除
这里的最后一点是,您希望通过实际位于数组中的元素进行排序,因此在此之前,您最终需要对数组的每个级别进行排序
然后,实际的“分组”通常是直接使用sales_date
和price
属性进行第一次累积,然后将后续阶段添加到不同的价格
值中,您希望在每个日期内将平均值累积为秒累积
注意:日期的实际处理在实际使用中可能会有所不同,具体取决于存储日期的粒度。在此示例中,日期都已四舍五入到每个“天”的开始。如果您确实需要累积实际的“datetime”值,那么您可能真的需要这样或类似的构造:
{ "$group": {
"_id": {
"date": {
"$dateFromParts": {
"year": { "$year": "$first_level.second_level.third_level.forth_level.sales_date" },
"month": { "$month": "$first_level.second_level.third_level.forth_level.sales_date" },
"day": { "$dayOfMonth": "$first_level.second_level.third_level.forth_level.sales_date" }
}
}.
"price": "$first_level.second_level.third_level.forth_level.price"
}
...
}}
使用和其他来提取“日期”信息,并将日期以该形式显示以供累积
开始去规范化
从上面的“混乱”中应该清楚的是,使用嵌套数组并不容易。在MongoDB 3.6之前的版本中,这样的结构通常甚至不可能进行原子更新,即使您从未更新过它们,或者基本上一直在替换整个阵列,它们仍然不容易查询。这就是你正在展示的东西
如果必须在父文档中包含数组内容,通常建议“展平”和“反规范化”此类结构。这可能与关系思维相反,但实际上,出于性能原因,这是处理此类数据的最佳方式:
{
"_id": 1,
"data": [
{
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-10-31"),
"quantity": 1
},
{
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-11-01"),
"quantity": 1
},
{
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-11-02"),
"quantity": 1
},
{
"first_item": "A",
"second_item": "A",
"third_item": "B",
"price": 1,
"sales_date": new Date("2018-10-31"),
"quantity": 1
},
{
"first_item": "A",
"second_item": "A",
"third_item": "B",
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
},
{
"first_item": "A",
"second_item": "B",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
},
]
},
{
"_id": 2,
"data": [
{
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 2,
"sales_date": new Date("2018-11-03"),
"quantity": 1
},
{
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-10-31"),
"quantity": 1
},
{
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
}
]
},
{
"_id": 3,
"data": [
{
"first_item": "A",
"second_item": "B",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
}
]
}
这与最初显示的数据完全相同,但我们实际上没有嵌套,而是将所有内容放入每个父文档中的单个扁平数组中。当然,这意味着不同数据点的重复,但查询复杂性和性能的差异应该是不言而喻的:
db.collection.aggregate([
{ "$match": {
"data": {
"$elemMatch": {
"first_item": "A",
"second_item": "A",
"third_item": "A",
"sales_date": {
"$gte": new Date("2018-11-01"),
"$lt": new Date("2018-12-01")
}
}
}
}},
{ "$addFields": {
"data": {
"$filter": {
"input": "$data",
"cond": {
"$and": [
{ "$eq": [ "$$this.first_item", "A" ] },
{ "$eq": [ "$$this.second_item", "A" ] },
{ "$eq": [ "$$this.third_item", "A" ] },
{ "$gte": [ "$$this.sales_date", new Date("2018-11-01") ] },
{ "$lt": [ "$$this.sales_date", new Date("2018-12-01") ] }
]
}
}
}
}},
{ "$unwind": "$data" },
{ "$group": {
"_id": {
"date": "$data.sales_date",
"price": "$data.price",
},
"quantity_sold": { "$avg": "$data.quantity" }
}},
{ "$group": {
"_id": "$_id.date",
"prices": {
"$push": {
"price": "$_id.price",
"quantity_sold": "$quantity_sold"
}
},
"quantity_sold": { "$avg": "$quantity_sold" }
}}
])
现在,与嵌套这些调用和类似的表达式不同,所有内容都更加清晰、易于阅读,并且处理起来非常简单。还有一个优点是,实际上您甚至可以像查询中使用的那样为数组中元素的键编制索引。这是嵌套模型的一个约束,MongoDB根本不允许这样的操作
{
"_id": 1,
"parent_id": 1,
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-10-31"),
"quantity": 1
},
{
"_id": 2,
"parent_id": 1,
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-11-01"),
"quantity": 1
},
{
"_id": 3,
"parent_id": 1,
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-11-02"),
"quantity": 1
},
{
"_id": 4,
"parent_id": 1,
"first_item": "A",
"second_item": "A",
"third_item": "B",
"price": 1,
"sales_date": new Date("2018-10-31"),
"quantity": 1
},
{
"_id": 5,
"parent_id": 1,
"first_item": "A",
"second_item": "A",
"third_item": "B",
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
},
{
"_id": 6,
"parent_id": 1,
"first_item": "A",
"second_item": "B",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
},
{
"_id": 7,
"parent_id": 2,
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 2,
"sales_date": new Date("2018-11-03"),
"quantity": 1
},
{
"_id": 8,
"parent_id": 2,
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-10-31"),
"quantity": 1
},
{
"_id": 9,
"parent_id": 2,
"first_item": "A",
"second_item": "A",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
},
{
"_id": 10,
"parent_id": 3,
"first_item": "A",
"second_item": "B",
"third_item": "A",
"price": 1,
"sales_date": new Date("2018-11-03"),
"quantity": 1
}
db.collection.aggregate([
{ "$match": {
"first_item": "A",
"second_item": "A",
"third_item": "A",
"sales_date": {
"$gte": new Date("2018-11-01"),
"$lt": new Date("2018-12-01")
}
}},
{ "$group": {
"_id": {
"date": "$sales_date",
"price": "$price"
},
"quantity_sold": { "$avg": "$quantity" }
}},
{ "$group": {
"_id": "$_id.date",
"prices": {
"$push": {
"price": "$_id.price",
"quantity_sold": "$quantity_sold"
}
},
"quantity_sold": { "$avg": "$quantity_sold" }
}}
])
{
"_id" : ISODate("2018-11-01T00:00:00Z"),
"prices" : [
{
"price" : 1,
"quantity_sold" : 1
}
],
"quantity_sold" : 1
}
{
"_id" : ISODate("2018-11-02T00:00:00Z"),
"prices" : [
{
"price" : 1,
"quantity_sold" : 1
}
],
"quantity_sold" : 1
}
{
"_id" : ISODate("2018-11-03T00:00:00Z"),
"prices" : [
{
"price" : 1,
"quantity_sold" : 1
},
{
"price" : 2,
"quantity_sold" : 1
}
],
"quantity_sold" : 1
}