MongoDB aggregation framework is very slow when there is a group
I am trying to run an aggregation query with "group" to get the total number of results.
The total of "requested items" (my result) is roughly 1,900,000.
If I run it with "group", the query is very slow (about 300 seconds).
If I run it without "group", the query is very fast (about 1 second).
What am I doing wrong?
Sample code is below.
Slow query
db.minute.aggregate([
    { $match: {
        $and: [
            { "status": "Homologado" },
            { "requested_items.status": /aceito/i },
        ]
    } },
    { $sort: {'_id': 1}},
    { $unwind: "$requested_items" },
    { $unwind: "$requested_items.winner" },
    { $match: {
        $and: [
            { "status": "Homologado" },
            { "requested_items.status": /aceito/i },
        ]
    } },
    { $project: {
        "_id": 1
    } },
    { $group: {
        "_id" : null,
        "total" : {$sum: 1},
    } },
], {allowDiskUse: true});
Fast query
db.minute.aggregate([
    { $match: {
        $and: [
            { "status": "Homologado" },
            { "requested_items.status": /aceito/i },
        ]
    } },
    { $sort: {'_id': 1}},
    { $unwind: "$requested_items" },
    { $unwind: "$requested_items.winner" },
    { $match: {
        $and: [
            { "status": "Homologado" },
            { "requested_items.status": /aceito/i },
        ]
    } },
    { $project: {
        "_id": 1
    } },
], {allowDiskUse: true});
DB structure
{
"_id" : "12345678ABCD",
"field_1" : [
{
"a" : null,
"b" : "ABC"
},
{
"code" : null,
"b" : "ABCD"
}
],
"status" : "Homologado",
"initial_date" : ISODate("2016-05-24T11:31:00.000Z"),
"field_2" : [
{
"a" : "ABC",
"b" : "ABCDE"
},
{
"a" : "ABCF",
"b" : "ABCDEF"
}
],
"field_3" : "Lorem ipsum dolor sit amet...",
"field_4" : [
{
"date" : ISODate("2016-05-24T13:54:48.000Z"),
"a" : "Text",
"b" : "More text..."
}
],
"field_4" : 12312321,
"field_5" : ISODate("2016-05-24T13:55:00.000Z"),
"field_6" : "ABCD",
"requested_items" : [
{
"status" : " Aceito e Habilitado",
"field_a" : "Text...",
"winner" : [
{
"a" : "23213.213213.23/232-23",
"b" : 130446,
"c" : 543223,
"d" : NumberLong(2),
"e" : "ABC 123 FULANO",
"f" : "text",
"g" : {
"description" : "TEXT TEXT TEXT"
}
},
{
"a" : "23213.213213.23/232-23",
"b" : 130446,
"c" : 543223,
"d" : NumberLong(2),
"e" : "ABC 123 FULANO",
"f" : "text",
"g" : {
"description" : "TEXT TEXT TEXT"
}
}
],
"field_c" : {
"_id" : ObjectId("5744dd3271af88052f0cc343"),
"a" : "TEXT",
"b" : "TEXT"
},
"field_d" : NumberLong(2),
"field_e" : 5223,
"field_f" : "Não",
"field_g" : "-",
"field_h" : {
"field_a1" : [
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
},
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
}
],
"field_a2" : [
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
},
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
}
],
"field_a3" : {},
"field_a4" : [
{
"date" : ISODate("2016-05-24T11:34:32.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:12:54.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:48:21.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:55:38.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:55:47.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T13:01:36.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T13:15:02.000Z"),
"A" : "TEXT",
"B" : "TEXT"
}
]
},
"field_i" : "Não",
"field_j" : 1
},
{
"status" : " Aceito e Habilitado",
"field_a" : "Text...",
"winner" : [
{
"a" : "23213.213213.23/232-23",
"b" : 130446,
"c" : 543223,
"d" : NumberLong(2),
"e" : "ABC 123 FULANO",
"f" : "text",
"g" : {
"description" : "TEXT TEXT TEXT"
}
}
],
"field_c" : {
"_id" : ObjectId("5744dd3271af88052f0cc343"),
"a" : "TEXT",
"b" : "TEXT"
},
"field_d" : NumberLong(2),
"field_e" : 5223,
"field_f" : "Não",
"field_g" : "-",
"field_h" : {
"field_a1" : [
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
},
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
}
],
"field_a2" : [
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
},
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
}
],
"field_a3" : {},
"field_a4" : [
{
"date" : ISODate("2016-05-24T11:34:32.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:12:54.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:48:21.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:55:38.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:55:47.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T13:01:36.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T13:15:02.000Z"),
"A" : "TEXT",
"B" : "TEXT"
}
]
},
"field_i" : "Não",
"field_j" : 2
},
{
"status" : " Aceito e Habilitado",
"field_a" : "Text...",
"winner" : [
{
"a" : "23213.213213.23/232-23",
"b" : 130446,
"c" : 543223,
"d" : NumberLong(2),
"e" : "ABC 123 FULANO",
"f" : "text",
"g" : {
"description" : "TEXT TEXT TEXT"
}
}
],
"field_c" : {
"_id" : ObjectId("5744dd3271af88052f0cc343"),
"a" : "TEXT",
"b" : "TEXT"
},
"field_d" : NumberLong(2),
"field_e" : 5223,
"field_f" : "Não",
"field_g" : "-",
"field_h" : {
"field_a1" : [
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
},
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
}
],
"field_a2" : [
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
},
{
"a" : "23213.213213.23/232-23",
"b" : ISODate("2016-05-23T23:54:21.000Z"),
"c" : 103432446,
"d" : 522343,
"e" : "Sim",
"f" : NumberLong(2),
"g" : "TEXT TEXT TEXT",
"h" : "Sim",
"i" : {
"a" : "TEXT TEXT TEXT"
}
}
],
"field_a3" : {},
"field_a4" : [
{
"date" : ISODate("2016-05-24T11:34:32.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:12:54.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:48:21.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:55:38.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T12:55:47.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T13:01:36.000Z"),
"A" : "TEXT",
"B" : "TEXT"
},
{
"date" : ISODate("2016-05-24T13:15:02.000Z"),
"A" : "TEXT",
"B" : "TEXT"
}
]
},
"field_i" : "Não",
"field_j" : 3
},
],
"field_7" : "TEXT",
"field_8" : {
"a" : "TEXT",
"b" : "TEXT",
"c" : "324234",
"d" : "TEXT TEXT TEXT TEXT"
},
"field_9" : 43234
}
Explain
{
"waitedMS" : NumberLong(0),
"stages" : [
{
"$cursor" : {
"query" : {
"$and" : [
{
"status" : "Homologado"
},
{
"requested_items.status" : /aceito/i
}
]
},
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "module_database.minute",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"status" : {
"$eq" : "Homologado"
}
},
{
"requested_items.status" : /aceito/i
}
]
},
"winningPlan" : {
"stage" : "COLLSCAN",
"filter" : {
"$and" : [
{
"status" : {
"$eq" : "Homologado"
}
},
{
"requested_items.status" : /aceito/i
}
]
},
"direction" : "forward"
},
"rejectedPlans" : []
}
}
},
{
"$unwind" : {
"path" : "$requested_items"
}
},
{
"$unwind" : {
"path" : "$requested_items.winner"
}
},
{
"$match" : {
"$and" : [
{
"status" : "Homologado"
},
{
"requested_items.status" : /aceito/i
}
]
}
},
{
"$group" : {
"_id" : {
"$const" : null
},
"numberOfdocs" : {
"$sum" : {
"$const" : 1
}
}
}
}
],
"ok" : 1
}
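My server:
OS: Ubuntu 14, 64-bit
CPU: 6
Memory: 16 GB
Total storage: 80 GB
It runs only the tests for my question.

It is hard to pin down the speed, because we do not have the environment details.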
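You can check how the query planner handles your query by adding explain: true to the options of your aggregation query: db.coll.aggregate([pipeline], {explain: true, allowDiskUse: true}). Both settings go in the same options object, not in two separate arguments.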
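As a rough illustration of reading that output, the sketch below pulls the winning plan's stage names out of an explain document shaped like the one posted in the question. The helper names `planStages` and `cursorPlanStages` are made up for this example, and the shape (`stages[0].$cursor.queryPlanner.winningPlan`) is assumed from the question's output; other server versions may lay the result out differently.

```javascript
// In the shell, the explain document would come from something like:
//   db.minute.aggregate(pipeline, { explain: true, allowDiskUse: true })
// Here a trimmed copy of the output from the question stands in for it.
const sampleExplain = {
  stages: [
    {
      $cursor: {
        queryPlanner: {
          winningPlan: { stage: "COLLSCAN", direction: "forward" },
          rejectedPlans: [],
        },
      },
    },
    { $unwind: { path: "$requested_items" } },
  ],
  ok: 1,
};

// Hypothetical helper: follows the chain of inputStage links and collects
// the stage names, outermost first. Plans with several children
// (inputStages) are not handled in this sketch.
function planStages(plan) {
  const names = [];
  for (let node = plan; node; node = node.inputStage) {
    names.push(node.stage);
  }
  return names;
}

// Hypothetical helper: returns the stages of the initial $cursor stage,
// or an empty list when the explain document has a different shape.
function cursorPlanStages(explainResult) {
  const first = (explainResult.stages || [])[0] || {};
  const cursor = first.$cursor;
  if (!cursor || !cursor.queryPlanner) return [];
  return planStages(cursor.queryPlanner.winningPlan);
}

console.log(cursorPlanStages(sampleExplain));
```

A COLLSCAN in that list, as in the question's output, means the first $match is reading the whole collection rather than using an index.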
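Another thing to consider is that $unwind multiplies the number of documents to process.
Since you only want to count documents, it may be faster to take the array size (after the first unwind) and sum it afterwards: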
db.inventory.aggregate(
    [
        {
            $group: {
                _id: null,
                numberOfdocs: { $sum: { $size: "$requested_items.winner" } }
            }
        }
    ]
)
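To see why the two approaches should give the same total, here is a small in-memory sketch in plain JavaScript, not a real MongoDB call. The sample documents are made up, and `countByDoubleUnwind` and `countBySize` are hypothetical helpers that imitate the two pipelines under the assumption that every matching item has a `winner` array.

```javascript
// Made-up stand-ins for documents in the collection.
const docs = [
  {
    status: "Homologado",
    requested_items: [
      { status: " Aceito e Habilitado", winner: [{ a: 1 }, { a: 2 }] },
      { status: " Aceito e Habilitado", winner: [{ a: 3 }] },
      { status: "Cancelado", winner: [{ a: 4 }] },
    ],
  },
  {
    status: "Homologado",
    requested_items: [{ status: "Aceito", winner: [] }],
  },
];

const accepted = /aceito/i;

// Imitates: $unwind requested_items, $unwind winner, $match, $group {$sum: 1}.
// One unit of work per (item, winner) pair.
function countByDoubleUnwind(documents) {
  let total = 0;
  for (const doc of documents) {
    for (const item of doc.requested_items || []) {
      for (const _winner of item.winner || []) {
        if (doc.status === "Homologado" && accepted.test(item.status)) {
          total += 1;
        }
      }
    }
  }
  return total;
}

// Imitates: $unwind requested_items, $match, $group {$sum: {$size: winner}}.
// One unit of work per item; the winner array is never expanded.
function countBySize(documents) {
  let total = 0;
  for (const doc of documents) {
    if (doc.status !== "Homologado") continue;
    for (const item of doc.requested_items || []) {
      if (accepted.test(item.status)) total += (item.winner || []).length;
    }
  }
  return total;
}

console.log(countByDoubleUnwind(docs), countBySize(docs));
```

One caveat for the real pipeline: $size raises an error when its argument is not an array, so documents with a missing `winner` field would need a guard such as $ifNull before this works on real data.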
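EDIT
After working with this query, I was able to reduce its execution time by about 45%.
The main point is to skip the second $match, because it scans the entire result set. The final $group then contains all the data, and we can filter out what we need at the end, since that operation runs on a small result set.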
db.coll.aggregate([{
        $match : {
            "status" : "Homologado"
        }
    }, {
        $unwind : "$requested_items"
    }, {
        $unwind : "$requested_items.winner"
    }, {
        $project : {
            x : "$requested_items.status",
        }
    }, {
        $group : {
            _id : "$x",
            numberOfdocs : {
                $sum : 1
            }
        }
    }, {
        $match : {
            "_id" : /aceito/i
        }
    }
], {
    allowDiskUse: true
});
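The idea of grouping first and filtering last can be imitated in plain JavaScript as below. This is only an in-memory sketch with made-up status values, and `groupByStatus` and `totalMatching` are hypothetical names. It also makes one detail visible: the pipeline returns one row per distinct matching status, so the rows still have to be summed to get a single total.

```javascript
// One entry per unwound (requested item, winner) pair; made-up data.
const statuses = [
  " Aceito e Habilitado",
  " Aceito e Habilitado",
  "Aceito",
  "Cancelado",
  "Recusado",
];

// Imitates { $group: { _id: "$x", numberOfdocs: { $sum: 1 } } }.
function groupByStatus(values) {
  const counts = new Map();
  for (const value of values) {
    counts.set(value, (counts.get(value) || 0) + 1);
  }
  return counts;
}

// Imitates the trailing $match on _id, then adds up the surviving rows.
// The regex runs once per distinct status, not once per unwound document.
function totalMatching(counts, pattern) {
  let total = 0;
  for (const [status, n] of counts) {
    if (pattern.test(status)) total += n;
  }
  return total;
}

const grouped = groupByStatus(statuses);
console.log(grouped.size, totalMatching(grouped, /aceito/i));
```

The saving depends on the number of distinct status values being small compared with the number of unwound documents.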
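I finally solved the problem with my group query.
It was a design mistake. Thinking in SQL terms, I had designed the collections before thinking about my application. That is why the query was slow.
To fix it, I had to redesign my collection and put the relevant data at the first level of my documents.
In my research I found that, in an aggregation, the indexed fields need to be used in the first stage of the pipeline. If an indexed field is used after a $unwind stage, the index is not considered.
Besides that, I used a package to create an int hash for my text fields, so that my text fields can be indexed.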
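The final schema was not posted, so the following is only a guess at what such a redesign could look like. Every name in it (`textHash`, `flattenItems`, `countPipeline`, `item_status_hash`, `winner_count`, `minute_id`) is hypothetical, and the hash is a simple FNV-1a style function chosen for illustration, not the package mentioned above.

```javascript
// FNV-1a style 32-bit hash over UTF-16 code units of the normalized text.
// Illustrative only; the package used by the author is not known.
function textHash(text) {
  const normalized = text.trim().toLowerCase();
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < normalized.length; i++) {
    hash ^= normalized.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193); // 32-bit multiply by the FNV prime
  }
  return hash >>> 0; // unsigned 32-bit integer
}

// Hypothetical flattening: one first-level record per requested item,
// with the winner count precomputed so no $unwind is needed at query time.
function flattenItems(doc) {
  return (doc.requested_items || []).map((item) => ({
    minute_id: doc._id,
    status: doc.status,
    item_status_hash: textHash(item.status || ""),
    winner_count: (item.winner || []).length,
  }));
}

// Pipeline whose first stage is a plain equality $match, so it could use an
// index such as { status: 1, item_status_hash: 1 } on the flat collection.
function countPipeline(itemStatus) {
  return [
    { $match: { status: "Homologado", item_status_hash: textHash(itemStatus) } },
    { $group: { _id: null, total: { $sum: "$winner_count" } } },
  ];
}

const flat = flattenItems({
  _id: "12345678ABCD",
  status: "Homologado",
  requested_items: [
    { status: " Aceito e Habilitado", winner: [{}, {}] },
    { status: " Aceito e Habilitado", winner: [{}] },
  ],
});
console.log(flat, JSON.stringify(countPipeline("Aceito e Habilitado")));
```

A hash like this only supports exact matches on the normalized text. It does not replace a substring search such as /aceito/i, which is case-insensitive and unanchored and therefore cannot use an index efficiently.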
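As a result, my query went from 300 seconds to 5 seconds.

Bear in mind that you have a lot of operations in your pipeline, and the $group stage will group all the documents that reach the end of the pipeline. Also, you do not necessarily need the $sort stage: your group operation does not need sorted documents coming in, it just counts all incoming documents. Consider removing the $sort and $project stages from the pipeline and see how that improves performance. Can you show us your DB structure?
@chridam, thanks for your answer. After removing the $sort and $project stages, the time dropped to about 250 seconds. Still slow.
If the total count is all you need, how about removing the $group operation altogether, replacing it with a pipeline stage that writes the results to another collection, and then using a method on the output collection to get the total count?
@titi23, thanks for your edit and answer. I edited the question, and it now has the structure of my documents. "Total results" means the count of the requested_items array. If my database held only the document I put in the question, the total would be 3. I also edited the question with the explain output and the server information. Taking the array size after the first unwind did not work.
@GabrielCunha, can you tell me where the "$requested_items.winner" field is in the document shown? I tried to work with it, but I am not sure what needs to be unwound in the second unwind...
Question edited again. It now shows where the winner field is. I am reading about integrating MongoDB with Redis to solve this performance problem. What do you think?
@GabrielCunha, please test my edit. Integrating with Redis is a good solution, especially when this aggregation query is going to hit the database hard.
Great! Could you provide the final data design and the fast query, so we can see exactly what you did?
Grouping in aggregation uses indexes... If grouping is slow, make sure a matching index exists.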