MongoDB: to embed or not to embed?
I am trying to figure out which schema design I should use (these are sample documents; the actual documents contain more attributes).
Embedded:
{
    _id: ObjectId(),
    title: "trolo",
    subs: [
        {
            owner: refUserId
        },
        ...
    ]
}
I have an index on it: ensureIndex({"subs.owner": 1})
Normalized:
Collection A:
{
    _id: ObjectId(),
    title: "trolo"
}
Collection B:
{
    parent: refId,
    owner: refUserId
}
I have an index on it: ensureIndex({owner: 1})
I ran some benchRun() tests on the different models, but the results were very surprising.
Embedded query:
ops = [
    {op: "find", ns: t.getFullName(), query: { "subs.owner": someUserId }}
]
Normalized query:
ops = [
    {op: "find", ns: t.getFullName(), query: { owner: someUserId }}
]
benchRun script:
for (x = 1; x <= 128; x *= 2) {
    res = benchRun({
        parallel : x,
        seconds : 5,
        ops : ops
    });
    print( "threads: " + x + "\t queries/sec: " + res.query);
}
Normalized:
threads: 1 queries/sec: 8.4
threads: 2 queries/sec: 13.2
threads: 4 queries/sec: 16.4
threads: 8 queries/sec: 17.4
threads: 16 queries/sec: 18.2
threads: 32 queries/sec: 20.8
threads: 64 queries/sec: 27.4
threads: 128 queries/sec: 39.6
Why is the normalized model so much slower? I expected it to be the fastest.
Update
Here is what .explain() says about my queries.
Embedded:
> db.embedded.find({"subs.owner":ObjectId("516ea63322f2a93c4fef8542")}).explain()
{
    "cursor" : "BasicCursor",
    "isMultiKey" : false,
    "n" : 5,
    "nscannedObjects" : 5,
    "nscanned" : 5,
    "nscannedObjectsAllPlans" : 5,
    "nscannedAllPlans" : 5,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
    },
    "server" : "localhost:27017"
}
Normalized:
> db.collectionB.find({owner: ObjectId("516ea63322f2a93c4fef8542")}).explain()
{
    "cursor" : "BtreeCursor owner_1",
    "isMultiKey" : false,
    "n" : 76625,
    "nscannedObjects" : 76625,
    "nscanned" : 76625,
    "nscannedObjectsAllPlans" : 76625,
    "nscannedAllPlans" : 76625,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 91,
    "indexBounds" : {
        "owner" : [
            [
                ObjectId("516ea63322f2a93c4fef8542"),
                ObjectId("516ea63322f2a93c4fef8542")
            ]
        ]
    },
    "server" : "localhost:27017"
}
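The two explain() outputs differ in more than timing: the embedded query reports a BasicCursor (no index was used) and returns 5 documents, while the normalized query uses the owner_1 index and returns 76625. A minimal sketch that puts those numbers side by side is given here; summarizeExplain is a hypothetical helper written for illustration, not a MongoDB function, and it only reads the legacy explain() fields that appear in the outputs above.

```javascript
// Hypothetical helper (not part of MongoDB): condenses the legacy explain()
// fields shown above into the few values that matter for this comparison.
function summarizeExplain(explain) {
  return {
    // Legacy explain() reports "BtreeCursor <index name>" when an index is used
    // and "BasicCursor" for a plain collection scan.
    usesIndex: explain.cursor.startsWith("BtreeCursor"),
    returned: explain.n,
    scanned: explain.nscanned,
    millis: explain.millis,
  };
}

// Values copied from the two explain() outputs in the question.
const embedded = summarizeExplain({
  cursor: "BasicCursor", n: 5, nscanned: 5, millis: 0,
});
const normalized = summarizeExplain({
  cursor: "BtreeCursor owner_1", n: 76625, nscanned: 76625, millis: 91,
});

console.log("embedded:", embedded);
console.log("normalized:", normalized);
```

Read this way, the two benchmark queries are not doing comparable work: one returns a handful of documents, the other tens of thousands.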
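Why would you expect normalized to be faster? With the embedded model, the document is stored in a single location on disk, so one disk seek brings back the whole document. If it is normalized, it is spread across the disk, which means 2 disk seeks to fetch the information. Depending on the speed of the disk and the sectors the head has to travel to, it will inevitably be slower than the embedded document model.
Have you tried using explain on your queries to see what is happening?
That is what I am doing right now :), not sure why I did not think of it earlier. But my normalized query has indexOnly:false, so I am reading up on that.
Others have recently noted that indexOnly:false can be very confusing and hard to interpret [sigh] :).
It scanned 76,000+ documents in the "normalized" case? Hmm. That does not seem right at all.
I think 76625 is the number of documents. But now that you mention it, it does seem a bit high; I need to check my build script :). Anyway, because it uses the owner index it does not need to scan the whole collection, the way it has to when searching nested documents.
I noticed that as soon as I do a $unwind and $match, the performance of the embedded model drops to 0 queries per second.
No, because bsonspec allows skipping nested subdocuments.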
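The comments point out that the normalized query matches 76625 documents. A queries/sec figure hides that, so here is a rough sketch that converts the normalized benchmark results into documents returned per second. It assumes every benchmark query returns the same 76625 documents as the explain() run, which the question does not confirm; docsPerSecond is a hypothetical helper name.

```javascript
// Assumption: each normalized benchmark query returns as many documents
// as the explain() run in the question reported ("n" : 76625).
const DOCS_PER_QUERY = 76625;

// queries/sec figures copied from the normalized benchRun output.
const normalizedRuns = [
  { threads: 1, queriesPerSec: 8.4 },
  { threads: 2, queriesPerSec: 13.2 },
  { threads: 4, queriesPerSec: 16.4 },
  { threads: 8, queriesPerSec: 17.4 },
  { threads: 16, queriesPerSec: 18.2 },
  { threads: 32, queriesPerSec: 20.8 },
  { threads: 64, queriesPerSec: 27.4 },
  { threads: 128, queriesPerSec: 39.6 },
];

// Hypothetical helper: documents returned per second, rounded to an integer
// to avoid floating-point noise in the product.
function docsPerSecond(queriesPerSec, docsPerQuery) {
  return Math.round(queriesPerSec * docsPerQuery);
}

for (const run of normalizedRuns) {
  console.log(
    "threads: " + run.threads +
    "\t docs/sec: " + docsPerSecond(run.queriesPerSec, DOCS_PER_QUERY)
  );
}
```

Under that assumption, a low queries/sec number can still mean a large volume of documents being read and returned, which is worth ruling out (for example by checking the build script, as mentioned above) before concluding that the schema itself is slow.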