MongoDB: to embed or not to embed?

mongodb

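I'm trying to figure out which schema design I should use (these are example documents; the real documents contain more attributes).

Embedded: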

{
   _id: ObjectId(),
   title: "trolo",
   subs: [
      {
         owner: refUserId
      },
      ...
   ]
}

I created an index with:

ensureIndex({"subs.owner": 1})

Normalized:

Collection A:
{
   _id: ObjectId(),
   title: "trolo"
}
Collection B:
{
   parent: refId,
   owner: refUserId
}

I created an index with:

ensureIndex({owner: 1})

I ran some benchRun() tests on the different models, but the results were very surprising.

Embedded query:

ops = [
    {op: "find", ns: t.getFullName(), query: { "subs.owner": someUserId }}
]

Normalized query:

ops = [
    {op: "find", ns: t.getFullName(), query: { owner: someUserId }}
]

benchRun script:

for (x = 1; x <= 128; x *= 2) {
    res = benchRun({
        parallel : x,
        seconds : 5,
        ops : ops
    });
    print( "threads: " + x + "\t queries/sec: " + res.query);
}

Normalized:

threads: 1       queries/sec: 8.4
threads: 2       queries/sec: 13.2
threads: 4       queries/sec: 16.4
threads: 8       queries/sec: 17.4
threads: 16      queries/sec: 18.2
threads: 32      queries/sec: 20.8
threads: 64      queries/sec: 27.4
threads: 128     queries/sec: 39.6

Why is the normalized model so much slower? I expected it to be the fastest.

Update

Here is what .explain() says about my queries.

Embedded

> db.embedded.find({"subs.owner":ObjectId("516ea63322f2a93c4fef8542")}).explain()

{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 5,
        "nscannedObjects" : 5,
        "nscanned" : 5,
        "nscannedObjectsAllPlans" : 5,
        "nscannedAllPlans" : 5,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {

        },
        "server" : "localhost:27017"
}

Normalized

> db.collectionB.find({owner: ObjectId("516ea63322f2a93c4fef8542")}).explain()
{
        "cursor" : "BtreeCursor owner_1",
        "isMultiKey" : false,
        "n" : 76625,
        "nscannedObjects" : 76625,
        "nscanned" : 76625,
        "nscannedObjectsAllPlans" : 76625,
        "nscannedAllPlans" : 76625,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 91,
        "indexBounds" : {
                "owner" : [
                        [
                                ObjectId("516ea63322f2a93c4fef8542"),
                                ObjectId("516ea63322f2a93c4fef8542")
                        ]
                ]
        },
        "server" : "localhost:27017"
}
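
One way to read the two explain() outputs side by side is to reduce each to a handful of facts. The sketch below is only an illustration: `summarizeExplain` is a hypothetical helper, not part of the mongo shell, and the two input objects are copied from the outputs above with only the fields it uses. In this legacy explain format, "BasicCursor" means a full collection scan and "BtreeCursor ..." means an index was used.

```javascript
// Explain outputs copied from the post (only the fields used below).
const embeddedExplain = { cursor: "BasicCursor", n: 5, nscanned: 5, millis: 0 };
const normalizedExplain = { cursor: "BtreeCursor owner_1", n: 76625, nscanned: 76625, millis: 91 };

// Hypothetical helper: condense a legacy explain() document into a few facts.
function summarizeExplain(explain) {
  return {
    // "BtreeCursor <index>" means an index was used; "BasicCursor" is a collection scan.
    usesIndex: explain.cursor.startsWith("BtreeCursor"),
    returned: explain.n,
    scanned: explain.nscanned,
    // 1 means every scanned entry was also returned (no wasted scanning).
    scannedPerReturned: explain.n === 0 ? null : explain.nscanned / explain.n,
    millis: explain.millis,
  };
}

const embeddedSummary = summarizeExplain(embeddedExplain);
const normalizedSummary = summarizeExplain(normalizedExplain);

console.log(embeddedSummary);
console.log(normalizedSummary);
console.log("result-size ratio:", normalizedSummary.returned / embeddedSummary.returned);
```

Read this way, the embedded query is a collection scan that touches only 5 documents, so the "subs.owner" index is not being used and the collection appears to hold just 5 documents. The normalized query does use its index but returns 76625 documents per call. The two benchmarks are therefore returning very different amounts of data, which may account for much of the gap in queries per second.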

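Why did you expect normalized to be faster? With embedding, the document is stored in a single location on disk, so one disk seek brings back the whole document. If it is normalized, the data is spread across the disk, which means two disk seeks to fetch the information. Depending on the speed of the disk and which sector the head has to move to, it will inevitably be slower than the embedded document model.

Have you tried using explain on your queries to see what is going on?

That's what I'm doing right now :), not sure why I didn't think of it earlier. But my normalized query has indexOnly:false, so I'm reading up on that.

As others have noticed recently, indexOnly:false can be quite confusing and hard to interpret [sigh] :).

It's scanning more than 76,000 documents in the "normalized" case? Hmm. That doesn't seem right at all.

I think 76625 is the document count. But now that you mention it, it does seem a bit high; I need to check my build script :). Anyway, since it uses the owner index, it shouldn't need to scan the whole collection the way it has to when searching nested documents.

I noticed that once I did a $unwind and a $match, the embedded model's performance dropped to 0 queries per second.

No, because the BSON spec allows nested subdocuments to be skipped.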
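
The comments above question the 76,625 figure. One thing worth keeping in mind when comparing the two schemas is that the same owner query has a different result shape in each: the embedded query returns one parent document per match, while the normalized query returns one row per sub-entry. A minimal in-memory sketch follows, using plain JavaScript arrays rather than MongoDB and made-up sample data (`u1`, `u2`, `a1`, `a2` are all hypothetical):

```javascript
// Hypothetical sample data in the embedded shape from the question.
const embedded = [
  { _id: "a1", title: "trolo", subs: [{ owner: "u1" }, { owner: "u1" }, { owner: "u2" }] },
  { _id: "a2", title: "other", subs: [{ owner: "u1" }] },
];

// The same data in the normalized shape: Collection A (parents) and Collection B (subs).
const collectionA = embedded.map(({ _id, title }) => ({ _id, title }));
const collectionB = embedded.flatMap((doc) =>
  doc.subs.map((sub) => ({ parent: doc._id, owner: sub.owner }))
);

// In-memory stand-in for find({ "subs.owner": owner }): matches whole parent documents.
const findEmbedded = (owner) =>
  embedded.filter((doc) => doc.subs.some((sub) => sub.owner === owner));

// In-memory stand-in for find({ owner: owner }) on Collection B: matches individual rows.
const findNormalized = (owner) => collectionB.filter((row) => row.owner === owner);

console.log(findEmbedded("u1").length);   // 2 parent documents
console.log(findNormalized("u1").length); // 3 rows, one per sub-entry
```

If one owner appears in many sub-entries, the normalized query returns far more documents than the embedded one, so a queries-per-second comparison between the two is not measuring the same amount of work.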