Improving a MongoDB aggregation
I have two MongoDB collections: one holds data for some products, the other holds data for some categories. What I want to do is: given a category, get all related products by looking through its subcategories. Essentially, I am trying to walk the category tree to collect all related product pages. I have category documents like this:
_id: "cat-id",
"url": "/cat-1",
childs: [
{"position": 1, "childs": [
{"category": "sub-category-1-id", "productPage": ""},
{"category": "sub-category-2-id", "productPage": ""}
]
},
{"position": 2, "childs": [
{"category": "", "productPage": "product-page-1-id"},
{"category": "", "productPage": "product-page-2-id"}
]
}
],
"links": [
{"position": 0, "url": "/related-category-1-url"},
{"position": 1, "url": "/related-category-2-url"}
],
"productPages":[
{"position": 0, "productPage": "product-page-1-id"},
{"position": 1, "productPage": "product-page-2-id"}
]
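From each category I take the productPages array; if it has any values, those are the pages linked directly to that category. Next, I recursively fetch all related categories and subcategories until a subcategory is a leaf, or until a subcategory's child has a non-empty productPage field (the catch here is that a category can be a leaf and yet show other subcategories inside its parent category... I know, this is odd and probably wrong, but I did not decide on this structure).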
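To make the traversal rule above concrete, here is a minimal in-memory sketch in plain JavaScript. It is only an illustration under assumptions: the `categories` Map is a hypothetical stand-in for the Categories collection, the function name `collectProductPageIds` is made up, the sample ids beyond those in the document above are invented, and the `links` array is ignored. It is not the aggregation itself and says nothing about how MongoDB would execute it.

```javascript
// Hypothetical in-memory stand-in for the Categories collection, keyed by _id.
const categories = new Map([
  ["cat-id", {
    _id: "cat-id",
    productPages: [{ position: 0, productPage: "product-page-1-id" }],
    childs: [
      { position: 1, childs: [{ category: "sub-category-1-id", productPage: "" }] },
      { position: 2, childs: [{ category: "", productPage: "product-page-2-id" }] },
    ],
  }],
  ["sub-category-1-id", {
    _id: "sub-category-1-id",
    productPages: [{ position: 0, productPage: "product-page-3-id" }],
    childs: [
      // Points back at the root to show that cycles do not loop forever.
      { position: 1, childs: [{ category: "cat-id", productPage: "" }] },
    ],
  }],
]);

// Breadth-first walk of the category tree, collecting product page ids.
function collectProductPageIds(rootId, categoriesById) {
  const pageIds = new Set();   // de-duplicates pages reachable via several paths
  const visited = new Set();   // categories already expanded
  const queue = [rootId];

  while (queue.length > 0) {
    const id = queue.shift();
    if (visited.has(id)) continue;
    visited.add(id);

    const category = categoriesById.get(id);
    if (!category) continue;   // dangling reference: nothing to expand

    // Pages linked directly to this category.
    for (const { productPage } of category.productPages ?? []) {
      if (productPage) pageIds.add(productPage);
    }

    // Each child entry is either a product page (stop) or a subcategory (descend).
    for (const group of category.childs ?? []) {
      for (const child of group.childs ?? []) {
        if (child.productPage) pageIds.add(child.productPage);
        else if (child.category) queue.push(child.category);
      }
    }
  }
  return [...pageIds];
}

const result = collectProductPageIds("cat-id", categories);
console.log(result);
```

Each category is expanded at most once and each page id is stored once, so the work grows with the number of categories and child entries reached rather than with the number of combinations between them.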
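I built an aggregation in MongoDB between the Categories collection and the ProductPages collection. The aggregation itself works, but for large categories (say a category with 50 subcategories, each with 30 subcategories of its own, and so on) the query takes far too long, sometimes running for several minutes before finally crashing. This is the aggregation I am using now: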
db.getCollection('Categories').aggregate([
{$match: { "url": "/cat-1"}},
{$unwind: {path: "$links", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$links.values", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$childs", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$childs.childs", preserveNullAndEmptyArrays: true}},
{$graphLookup: {
from: "ProductPages",
startWith: "$productPages.productPage",
connectFromField: "productPages.productPage",
connectToField: "_id",
as: "rootPages"
}},
{$graphLookup: {
from: "ProductPages",
startWith: "$childs.childs.productPage",
connectFromField: "childs.childs.productPage",
connectToField: "_id",
as: "childPages"
}},
{$graphLookup: {
from: "Categories",
startWith: "$links.values.url",
connectFromField: "links.values.url",
connectToField: "url",
as: "linkCategories"
}},
{$graphLookup: {
from: "Categories",
startWith: "$childs.url",
connectFromField: "childs.url",
connectToField: "url",
as: "childUrlCategories"
}},
{$graphLookup: {
from: "Categories",
startWith: "$childs.childs.category",
connectFromField: "childs.childs.category",
connectToField: "_id",
as: "childCategories"
}},
{$unwind: {path: "$linkCategories", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$childUrlCategories", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$childCategories", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$childCategories.childs", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$childCategories.childs.childs", preserveNullAndEmptyArrays: true}},
{$graphLookup: {
from: "ProductPages",
startWith: "$linkCategories.productPages.productPage",
connectFromField: "linkCategories.productPages.productPage",
connectToField: "_id",
as: "linkPages"
}},
{$graphLookup: {
from: "ProductPages",
startWith: "$childUrlCategories.productPages.productPage",
connectFromField: "childUrlCategories.productPages.productPage",
connectToField: "_id",
as: "childUrlPages"
}},
{$graphLookup: {
from: "ProductPages",
startWith: "$childCategories.childs.childs.productPage",
connectFromField: "childCategories.childs.childs.productPage",
connectToField: "_id",
as: "childCategoryPages"
}},
{$unwind: {path: "$rootPages", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$linkPages", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$childUrlPages", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$childPages", preserveNullAndEmptyArrays: true}},
{$unwind: {path: "$childCategoryPages", preserveNullAndEmptyArrays: true}},
{$group: {_id: "",
rootPages: {$addToSet: "$rootPages"},
linkPages: {$addToSet: "$linkPages"},
childUrlPages: {$addToSet: "$childUrlPages"},
childPages: {$addToSet: "$childPages"},
childCategoryPages: {$addToSet: "$childCategoryPages"}
}},
{$project: {_id: 0,
rootPages: {_id: 1, product: 1},
linkPages: {_id: 1, product: 1},
childUrlPages: {_id: 1, product: 1},
childPages: {_id: 1, product: 1},
childCategoryPages: {_id: 1, product: 1}
}},
{$addFields: {
"childCategoryPages": {
$map: {
"input": "$childCategoryPages",
"as": "el",
"in": "$$el.product"
}
}
}},
{$addFields: {
"childPages": {
$map: {
"input": "$childPages",
"as": "el",
"in": "$$el.product"
}
}
}},
{$addFields: {
"childUrlPages": {
$map: {
"input": "$childUrlPages",
"as": "el",
"in": "$$el.product"
}
}
}},
{$addFields: {
"linkPages": {
$map: {
"input": "$linkPages",
"as": "el",
"in": "$$el.product"
}
}
}},
{$addFields: {
"rootPages": {
$map: {
"input": "$rootPages",
"as": "el",
"in": "$$el.product"
}
}
}},
{$project: {products: {$concatArrays: ["$rootPages", "$linkPages", "$childUrlPages", "$childPages", "$childCategoryPages"]}}},
{$unwind: {path: "$products", preserveNullAndEmptyArrays: true}},
{$group: {
_id: "",
products: {$addToSet: "$products"}
}},
{$project: {_id: 0, products: 1}},
]);
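As I said, this works for small categories, but for large ones it is very slow (the productPages, links, url and childs fields are already indexed, in case you are wondering). So how can I improve this query so that it also works for large categories?
Edit: this is a sample ProductPage document (in the aggregation I take the product field from it)
The aggregation result is an array of the products taken from the retrieved pages, like this: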
["product-1", "product-2", ...]
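Make sure every connectToField: field_name is indexed in its collection (except _id, which is already indexed by default). Also, post a sample ProductPages document here so we can take a look, and post the desired result too. I think $facet is the more elegant solution.

I have added the product page and the result to the question. I was also thinking about splitting the aggregation into multiple pipelines, but I am afraid it would still be slow when running the recursive part (if I exclude the childs-related part from the pipeline, everything is fast, even for large categories).

Please share the complete data for "url": "/cat-1" (real data with its subcategories, etc.) on MongoPlayground, so that we can see how to improve performance. links.values does not exist, and it is not clear why $graphLookup is needed if the expected result is a plain array.

I will try to add it to MongoPlayground, but in the meantime I came up with this solution, which also seems to work for large categories.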