Node.js: how to map objects with complex subdocuments that relate to each other


node.js mongodb hadoop mongoose


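First of all, this may be a misguided question; if so, I would appreciate some guidance on how I should proceed.

From what I have found online, MongoDB/Mongoose mapReduce seems to be the best way to do this, but I have been struggling to get my head around it because it is not trivial, and I wonder if anyone could help explain my problem. I am not necessarily looking for a full solution; well-explained pseudocode would be much appreciated. What confuses me in particular is how to aggregate and combine the subdocuments of two or more collections.

I also know this may come down to poor model/collection design, but unfortunately that is completely out of my hands, so please do not suggest remodelling.

My particular problem is that we have an existing model that looks like the following: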

survey: {
            _id: 1111,
            name: "name",
            questions: [
                {_id: 1, text: "a,b, or c?", type: "multipleChoice", options: ["a", "b", "c"]},
                {_id: 2, text: "what do you think", type: "freeform"}
            ],
            participants: [{_id: 1, name: "user 1"}, {_id: 2, name: "user 2"}],
            results: [{_id: 123, userId: 1, questionId: 1, answer: "a"},
                {_id: 124, userId: 2, questionId: 1, answer: "b"},
                {_id: 125, userId: 1, questionId: 2, answer: "this is some answer"},
                {_id: 126, userId: 2, questionId: 2, answer: "this is another answer"}]

        }

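We then have another, separately developed model that tracks a user's progress through the survey (this is only a basic subset; we also track different events).

What I would like to get is a result like the following: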

{
    survey: "survey name",
    _id : 1,
    totalAverageTime: "00:23:00",
    fastestTime : "00:23:00",
    slowestTime: "00:25:00",
    questions: [
    {
       _id: 1, text: "a,b, or c?", 
       type: "multipleChoice", 
       mostPopularAnswer: "a", 
       averageTime: "00:13:00", 
       answers : [{ userId: 1, answer: "a", time:"00:14:00"},
                { userId: 2, answer: "a", time:"00:12:00"}]

    },{
        _id: 2, text:"what do you think",
        type:"freeform",
        averageTime : "00:10:00",
        answers : [{ userId: 1, answer: "this is some answer", time:"00:11:00"},
                { userId: 2, answer: "this is another answer", time:"00:09:00"}]


    }

  ]

}

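The following approach uses the aggregation framework to arrive at a solution that is closer to the desired output. It depends on a third collection, which can be seen as a merge of the two collections survey and trackings.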

First and foremost, suppose you have the following collections of test documents, based on the example in the question:

// survey collection
db.survey.insert({
    _id: 1111,
    name: "name",
    questions: [
        {_id: 1, text: "a,b, or c?", type: "multipleChoice", options: ["a", "b", "c",]},
        {_id: 2, text: "what do you think", type: "freeform"}
    ],
    participants: [{_id: 1, name: "user 1"}, {_id: 2, name: "user 2"}],
    results: [{_id: 123, userId: 1, questionId: 1, answer: "a"},
        {_id: 124, userId: 2, questionId: 1, answer: "b"},
        {_id: 125, userId: 1, questionId: 2, answer: "this is some answer"},
        {_id: 126, userId: 2, questionId: 2, answer: "this is another answer"}]

})

// trackings collection
db.trackings.insert([
    {
        _id:1,
        surveyId: 1111,
        userId: 1,
        starttime: "2015-05-13 10:46:20.347Z",
        endtime: "2015-05-13 10:59:20.347Z"
    },
    {
        _id:2,
        surveyId: 1111,
        userId: 2,
        starttime: "2015-05-13 10:13:06.176Z",
        endtime: "2015-05-13 10:46:28.176Z"
    }    
])

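To create the third collection (call it output_collection), iterate over the trackings collection with the cursor's forEach() method, convert the fields holding date strings into actual ISODate objects, create an array field that stores the matching survey, and then save the merged object into the third collection. The following demonstrates this operation: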

db.trackings.find().forEach(function(doc){
    var survey = db.survey.find({"_id": doc.surveyId}).toArray();
    doc.survey = survey;
    doc["starttime"] = ISODate(doc.starttime);
    doc["endtime"] = ISODate(doc.endtime);
    db.output_collection.save(doc);
});
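
For readers without a mongo shell at hand, the merge step above can be sketched in plain JavaScript on in-memory arrays. This is only an illustration under assumptions: the arrays stand in for the two collections (with the survey trimmed down), and the helper names toDate and mergeTracking are hypothetical, not part of MongoDB or Mongoose.

```javascript
// In-memory stand-ins for the two collections (trimmed copies of the test data above).
const surveys = [
  { _id: 1111, name: "name", questions: [{ _id: 1 }, { _id: 2 }] }
];
const trackings = [
  { _id: 1, surveyId: 1111, userId: 1,
    starttime: "2015-05-13 10:46:20.347Z", endtime: "2015-05-13 10:59:20.347Z" },
  { _id: 2, surveyId: 1111, userId: 2,
    starttime: "2015-05-13 10:13:06.176Z", endtime: "2015-05-13 10:46:28.176Z" }
];

// Hypothetical helper: turn "YYYY-MM-DD hh:mm:ss.SSSZ" into a Date.
// The space is replaced with "T" so the string is valid ISO 8601.
function toDate(s) {
  return new Date(s.replace(" ", "T"));
}

// Mirrors the forEach loop: attach the matching surveys as an array
// and convert both date strings, without mutating the input document.
function mergeTracking(tracking, allSurveys) {
  return Object.assign({}, tracking, {
    survey: allSurveys.filter(s => s._id === tracking.surveyId),
    starttime: toDate(tracking.starttime),
    endtime: toDate(tracking.endtime)
  });
}

const outputCollection = trackings.map(t => mergeTracking(t, surveys));
console.log(outputCollection[0].survey[0].name); // "name"
```

As in the shell version, survey ends up as an array (here the result of filter), which is why the pipeline later has to unwind it first.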

After merging the two collections into the output collection, querying it with db.output_collection.findOne() will yield:

{
    "_id" : 1,
    "surveyId" : 1111,
    "userId" : 1,
    "starttime" : ISODate("2015-05-13T10:46:20.347Z"),
    "endtime" : ISODate("2015-05-13T10:59:20.347Z"),
    "survey" : [ 
        {
            "_id" : 1111,
            "name" : "name",
            "questions" : [ 
                {
                    "_id" : 1,
                    "text" : "a,b, or c?",
                    "type" : "multipleChoice",
                    "options" : [ 
                        "a", 
                        "b", 
                        "c"
                    ]
                }, 
                {
                    "_id" : 2,
                    "text" : "what do you think",
                    "type" : "freeform"
                }
            ],
            "participants" : [ 
                {
                    "_id" : 1,
                    "name" : "user 1"
                }, 
                {
                    "_id" : 2,
                    "name" : "user 2"
                }
            ],
            "results" : [ 
                {
                    "_id" : 123,
                    "userId" : 1,
                    "questionId" : 1,
                    "answer" : "a"
                }, 
                {
                    "_id" : 124,
                    "userId" : 2,
                    "questionId" : 1,
                    "answer" : "b"
                }, 
                {
                    "_id" : 125,
                    "userId" : 1,
                    "questionId" : 2,
                    "answer" : "this is some answer"
                }, 
                {
                    "_id" : 126,
                    "userId" : 2,
                    "questionId" : 2,
                    "answer" : "this is another answer"
                }
            ]
        }
    ]
}

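You can then apply aggregation on this collection. The aggregation pipeline should start with four $unwind operator stages, which deconstruct an array from the input documents and output one document per element. Each output document replaces the array with an element value.

The next stage, $project, reshapes each document in the stream, for example by adding a new field duration that holds the time difference in minutes between the starttime and endtime date fields, computed with the $subtract and $divide arithmetic operators.

After that comes the $group pipeline stage, which groups the input documents by the surveyId key and applies the accumulator expressions to each group. It consumes all input documents and outputs one document per distinct group.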
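
To make the unwind behaviour described above concrete, here is a minimal plain-JavaScript imitation. It is a sketch, not MongoDB's implementation: the helper name unwind is hypothetical, it only handles a top-level field (not dotted paths such as survey.questions), and documents whose field is not an array are simply dropped, which is a simplification.

```javascript
// Hypothetical helper that imitates $unwind on a top-level array field:
// one output document per array element, with the array replaced by that element.
function unwind(docs, field) {
  const out = [];
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [];
    for (const value of values) {
      out.push(Object.assign({}, doc, { [field]: value }));
    }
  }
  return out;
}

const docs = [{ _id: 1, results: [{ answer: "a" }, { answer: "b" }] }];
const unwound = unwind(docs, "results");
console.log(unwound.length); // 2
```

Chaining several unwinds multiplies the document count. With the sample data, each merged document carries 1 survey, 2 questions, 2 participants and 4 results, so four unwinds expand it into 1 × 2 × 2 × 4 = 16 documents. Every tracking document is expanded by the same factor here, so the average, minimum and maximum of duration are not skewed.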

Your aggregation pipeline should therefore look like this:

db.output_collection.aggregate([
    { "$unwind": "$survey" },
    { "$unwind": "$survey.questions" },
    { "$unwind": "$survey.participants" },
    { "$unwind": "$survey.results" },
    {
        "$project": {
            "survey": 1,
            "surveyId": 1,
            "userId": 1,
            "starttime": 1,
            "endtime": 1,
            "duration": {
                "$divide": [
                    { "$subtract": [ "$endtime", "$starttime" ] },
                    1000 * 60
                ]
            }
        }
    },
    {
        "$group": {
            "_id": "$surveyId",
            "survey": { "$first": "$survey.name"},
            "totalAverageTime": {
                "$avg": "$duration"
            },
            "fastestTime": {
                "$min": "$duration"
            },
            "slowestTime": {
                "$max": "$duration"
            },
            "questions": {
                "$addToSet": "$survey.questions"
            },
            "answers": {
                "$addToSet": "$survey.results"
            }
        }
    },
    {
        "$out": "survey_results"
    }
])

db.survey_results.find()

Output

/* 0 */ { "result" : [ { "_id" : 1111, "survey" : "name", "totalAverageTime" : 23.18333333333334, "fastestTime" : 13, "slowestTime" : 33.36666666666667, "questions" : [ { "_id" : 2, "text" : "what do you think", "type" : "freeform" }, { "_id" : 1, "text" : "a,b, or c?", "type" : "multipleChoice", "options" : [ "a", "b", "c" ] } ], "answers" : [ { "_id" : 126, "userId" : 2, "questionId" : 2, "answer" : "this is another answer" }, { "_id" : 124, "userId" : 2, "questionId" : 1, "answer" : "b" }, { "_id" : 125, "userId" : 1, "questionId" : 2, "answer" : "this is some answer" }, { "_id" : 123, "userId" : 1, "questionId" : 1, "answer" : "a" } ] } ], "ok" : 1 }
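
The timing figures in the output above can be checked by hand. The sketch below repeats the arithmetic of the $project and $group stages in plain JavaScript, using the start and end times from the trackings test data (written in ISO 8601 form); the variable names are illustrative only.

```javascript
// Start/end timestamps copied from the trackings test data.
const sessions = [
  { userId: 1,
    starttime: new Date("2015-05-13T10:46:20.347Z"),
    endtime: new Date("2015-05-13T10:59:20.347Z") },
  { userId: 2,
    starttime: new Date("2015-05-13T10:13:06.176Z"),
    endtime: new Date("2015-05-13T10:46:28.176Z") }
];

// Same arithmetic as the $project stage:
// millisecond difference divided by 1000 * 60 gives minutes.
const durations = sessions.map(s => (s.endtime - s.starttime) / (1000 * 60));

// Same accumulators as the $group stage: $avg, $min and $max.
const stats = {
  totalAverageTime: durations.reduce((sum, d) => sum + d, 0) / durations.length,
  fastestTime: Math.min(...durations),
  slowestTime: Math.max(...durations)
};
console.log(stats);
```

User 1 took exactly 13 minutes and user 2 took 33 minutes 22 seconds, which matches fastestTime and slowestTime in the output.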

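Update
After writing the aggregation output to another collection (say survey_results) through the $out pipeline stage, you can apply some native JavaScript functions together with the cursor's forEach() method to get the final object: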

db.survey_results.find().forEach(function(doc){
    var questions = [];
    doc.questions.forEach(function(q){
        var answers = [];
        doc.answers.forEach(function(a){
            if(a.questionId === q._id){
                delete a.questionId;
                answers.push(a);
            }
        });
        q.answers = answers;
        questions.push(q);
    });
    delete doc.answers;
    doc.questions = questions;
    db.survey_results.save(doc);
});
Output:

/* 0 */ { "_id" : 1111, "survey" : "name", "totalAverageTime" : 23.18333333333334, "fastestTime" : 13, "slowestTime" : 33.36666666666667, "questions" : [ { "_id" : 2, "text" : "what do you think", "type" : "freeform", "answers" : [ { "_id" : 126, "userId" : 2, "answer" : "this is another answer" }, { "_id" : 125, "userId" : 1, "answer" : "this is some answer" } ] }, { "_id" : 1, "text" : "a,b, or c?", "type" : "multipleChoice", "options" : [ "a", "b", "c" ], "answers" : [ { "_id" : 124, "userId" : 2, "answer" : "b" }, { "_id" : 123, "userId" : 1, "answer" : "a" } ] } ] }
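
The desired result in the question also shows times as "hh:mm:ss" strings and a mostPopularAnswer per question, which the output above does not yet contain. One possible way to fill these in during the same forEach pass is sketched below. Both helper names are hypothetical, and the tie-breaking rule (first answer seen wins) is an assumption, since the question does not specify one.

```javascript
// Hypothetical helper: format a duration given in minutes as "hh:mm:ss".
function formatMinutes(minutes) {
  const totalSeconds = Math.round(minutes * 60);
  const pad = n => String(n).padStart(2, "0");
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  return pad(h) + ":" + pad(m) + ":" + pad(s);
}

// Hypothetical helper: the most frequent answer among a question's answers.
// On a tie, the answer that reached the top count first is kept (an assumption).
function mostPopularAnswer(answers) {
  const counts = new Map();
  let best;
  for (const a of answers) {
    const n = (counts.get(a.answer) || 0) + 1;
    counts.set(a.answer, n);
    if (best === undefined || n > counts.get(best)) best = a.answer;
  }
  return best;
}

console.log(formatMinutes(13)); // "00:13:00"
console.log(mostPopularAnswer([{ answer: "a" }, { answer: "b" }, { answer: "b" }])); // "b"
```

Note that the per-question averageTime in the desired result cannot be derived from the trackings documents shown here, because they only record a start and end time for the whole survey.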

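I have thought of a solution that involves creating another output collection joining the two schemas, then using the aggregation framework to compute the desired aggregates. It would help a lot, though, if you could specify which MongoDB version you are using, because the aggregation needs some operators that are only found in later versions.

We are currently on MongoDB 3.8 and mongoose 4.

Is that a typo, mongodb 3.8?

Sorry, yes, it is mongodb 3.

Actually, to be more accurate, we are currently on 2.6.5 but upgrading to 3 within the next few weeks.

This is awesome, I really appreciate the detail. One question: if I wanted all the answers for each question inside the question subdocument, would that be doable in the query above, or would it need another query on the output?

@jonnie Yes, that is quite possible. I just have not had enough time to finish an aggregation pipeline that produces the exact desired output, with the answers as subdocuments inside the questions array. I will try to update the answer once I have enough time. The main idea is to have another $group pipeline stage that uses the $addToSet operator to add elements to an array.

@jonnie I have updated the answer with an extra step that leads you to the final desired result.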