MongoDB: to embed or not to embed?


I'm trying to figure out which schema design I should use.

(These are sample documents; the real ones contain more properties.)

Embedded:

{
   _id: ObjectId(),
   title: "trolo",
   subs: [
      {
         owner: refUserId
      },
      ...
   ]
}
I've built an index on:
ensureIndex({"subs.owner": 1})

Normalized:

Collection A:
{
   _id: ObjectId(),
   title: "trolo"
}
Collection B:
{
   parent: refId,
   owner: refUserId
}
I've built an index on:
ensureIndex({owner: 1})

I ran some benchRun() tests on the different models, but the results are very surprising.

Embedded query:

ops = [
    {op: "find", ns: t.getFullName(), query: { "subs.owner": someUserId }}
]
Normalized query:

ops = [
    {op: "find", ns: t.getFullName(), query: { owner: someUserId }}
]
benchRun script:

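// Double the number of parallel clients from 1 to 128 and report throughput.
for (var x = 1; x <= 128; x *= 2) {
    var res = benchRun({
        parallel : x,   // number of concurrent client threads
        seconds : 5,    // duration of each run
        ops : ops
    });
    print( "threads: " + x + "\t queries/sec: " + res.query);
}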
Normalized results:

threads: 1       queries/sec: 8.4
threads: 2       queries/sec: 13.2
threads: 4       queries/sec: 16.4
threads: 8       queries/sec: 17.4
threads: 16      queries/sec: 18.2
threads: 32      queries/sec: 20.8
threads: 64      queries/sec: 27.4
threads: 128     queries/sec: 39.6
Why is the normalized model so much slower? I would have expected it to be the fastest.

Update

Here's what .explain() says about my queries:

Embedded

> db.embedded.find({"subs.owner":ObjectId("516ea63322f2a93c4fef8542")}).explain()

{
        "cursor" : "BasicCursor",
        "isMultiKey" : false,
        "n" : 5,
        "nscannedObjects" : 5,
        "nscanned" : 5,
        "nscannedObjectsAllPlans" : 5,
        "nscannedAllPlans" : 5,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {

        },
        "server" : "localhost:27017"
}
Normalized

> db.collectionB.find({owner: ObjectId("516ea63322f2a93c4fef8542")}).explain()
{
        "cursor" : "BtreeCursor owner_1",
        "isMultiKey" : false,
        "n" : 76625,
        "nscannedObjects" : 76625,
        "nscanned" : 76625,
        "nscannedObjectsAllPlans" : 76625,
        "nscannedAllPlans" : 76625,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 91,
        "indexBounds" : {
                "owner" : [
                        [
                                ObjectId("516ea63322f2a93c4fef8542"),
                                ObjectId("516ea63322f2a93c4fef8542")
                        ]
                ]
        },
        "server" : "localhost:27017"
}

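Why would you expect the normalized model to be faster? With embedding, the document is stored in a single location on disk, so one disk seek brings back the whole document. If the data is normalized, it is spread across the disk, which means two disk seeks to fetch the information. Depending on the speed of the disk and the sectors the head has to travel to, it will inevitably be slower than the embedded document model.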
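To make the one-lookup versus two-lookup argument above concrete, here is a toy sketch in plain JavaScript. It is only an in-memory model: the Map and array stand in for collections, the ids and owners are made up, and the lookup counts are simply what each access path requires, not measured disk seeks.

```javascript
// Toy in-memory stand-ins for the two designs; all ids and values are made up.
const embedded = new Map([
  ["doc1", { _id: "doc1", title: "trolo", subs: [{ owner: "userA" }, { owner: "userB" }] }],
]);

const collectionA = new Map([["doc1", { _id: "doc1", title: "trolo" }]]);
const collectionB = [
  { parent: "doc1", owner: "userA" },
  { parent: "doc1", owner: "userB" },
];

// Embedded: a single lookup returns the title and all subs together.
function loadEmbedded(id) {
  const doc = embedded.get(id);
  return { doc, lookups: 1 };
}

// Normalized: one lookup for the parent, then a second one for its children.
function loadNormalized(id) {
  const parent = collectionA.get(id);
  const subs = collectionB
    .filter((s) => s.parent === id)
    .map((s) => ({ owner: s.owner }));
  return { doc: { ...parent, subs }, lookups: 2 };
}

console.log(loadEmbedded("doc1").lookups);
console.log(loadNormalized("doc1").lookups);
```

Both functions reassemble the same logical document; the difference is how many separate reads are needed to do so. Note that the benchmark in the question only queries Collection B by owner, so this sketch illustrates the answer's general point rather than that exact query.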

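Did you try using explain on your queries to see what's happening?

That's what I'm doing right now :), not sure why I didn't think of it before. But my normalized query has indexOnly:false, so I'm reading up on that.

As others have noticed recently, indexOnly:false can be very confusing and hard to explain [sigh] :).

It's scanning 76000+ documents in the "normalized" case? Hmm. That doesn't seem right at all.

I thought 76625 was the number of documents. But now that you mention it, it does seem a bit high; I need to check my build script :).

Anyway, since it uses the owner index, it doesn't need to scan the whole collection, as it has to when searching nested documents.

I noticed that once I did a $unwind and $match, the performance of the embedded model dropped to 0 queries per second.

No, because the BSON spec allows nested subdocuments to be skipped.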
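The explain() figures discussed in the comments above can be put side by side with a small sketch. The numbers are copied from the two outputs in the question; `summarize` is a hypothetical helper, not a MongoDB API. In this older explain format, a cursor of "BasicCursor" indicates a collection scan without an index, while "BtreeCursor <name>" indicates an index scan.

```javascript
// Fields copied from the two explain() outputs in the question.
const embedded = { cursor: "BasicCursor", n: 5, nscanned: 5, millis: 0 };
const normalized = { cursor: "BtreeCursor owner_1", n: 76625, nscanned: 76625, millis: 91 };

// Hypothetical helper: reduce an explain() document to the points debated above.
function summarize(explain) {
  return {
    usesIndex: explain.cursor.startsWith("BtreeCursor"),
    returned: explain.n,
    scannedPerReturned: explain.n === 0 ? null : explain.nscanned / explain.n,
    millis: explain.millis,
  };
}

console.log("embedded:  ", summarize(embedded));
console.log("normalized:", summarize(normalized));
console.log("returned ratio:", normalized.n / embedded.n);
```

Read this way, the embedded query did not use the "subs.owner" index and matched only 5 documents, whereas the normalized query used the owner index but returned 76625 documents. That suggests the two benchmarks may not be doing comparable work, which fits the suspicion about the build script.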