
MongoDB: mapping relational data to a document structure


I have a dataset of 30 million rows in a mongo collection. A sample set of records looks like this:

{"_id" : ObjectId("568bc0f2f7cd2653e163a9e4"),    
"EmailAddress" : "1234@ab.com",    
"FlightNumber" : 1043,
"FlightTime" : "10:00"},
{"_id" : ObjectId("568bc0f2f7cd2653e163a9e5"),    
"EmailAddress" : "1234@ab.com",    
"FlightNumber" : 1045,
"FlightTime" : "12:00"},
{"_id" : ObjectId("568bc0f2f7cd2653e163a9e6"),    
"EmailAddress" : "5678@ab.com",    
"FlightNumber" : 1045,
"FlightTime" : "12:00"},
This was imported directly from SQL Server, hence the relational nature of the data.

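How can I best map this data to another collection so that everything is grouped by EmailAddress, with the flights nested inside each document? An example of the output would be: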

{"_id" : ObjectId("can be new id"),    
"EmailAddress" : "1234@ab.com",    
"Flights" : [{"Number":1043, "Time":"10:00"},{"Number":1045, "Time":"12:00"}]},    
{"_id" : ObjectId("can be new id"),    
"EmailAddress" : "5678@ab.com",    
"Flights" : [{"Number":1045, "Time":"12:00"}]},
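I have been working on an import routine that iterates over every record in the source collection and then bulk inserts into a second collection. This works fine, but it does not let me group the data unless I go back over the records that were already processed, which would add a huge time overhead to the import routine.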

The code for this is:

var sourceDb = db.getSiblingDB("collectionSource");
var destinationDb = db.getSiblingDB("collectionDestination");

var externalUsers = sourceDb.CRM.find();
var index = 0;
var startDate = new Date(); // start time used by the progress messages
var contactArray = [];
var identifierArray = [];

externalUsers.forEach(function(doc) {
    //library code for NewGuid omitted
    var guid = NewGuid();
    //buildContact and buildIdentifier simply create 2 js objects based on the parameters
    contactArray.push(buildContact(guid, doc.EmailAddress, doc.FlightNumber));
    identifierArray.push(buildIdentifier(guid, doc.EmailAddress));

    index++;

    //report progress every 1000 records
    if (index % 1000 == 0) {
        var elapsedSeconds = (new Date().getTime() - startDate.getTime()) / 1000;
        print("Written " + index + " items (" + elapsedSeconds + "s from start)");
    }

    //bulk insert in batches
    if (index % 5000 == 0) {
        destinationDb.Contacts.insert(contactArray);
        destinationDb.Identifiers.insert(identifierArray);

        contactArray = [];
        identifierArray = [];
    }
});

//insert whatever is left over from the last partial batch
if (contactArray.length > 0) {
    destinationDb.Contacts.insert(contactArray);
    destinationDb.Identifiers.insert(identifierArray);
}

Many thanks, everyone.

Hi, and welcome to MongoDB. In this case, you may want to consider using two different collections: one for users and one for flights.

Users:

{
    _id: 
    email:
}
Flights:

{
    _id:
    userId:
    number: // if number is unique, you can actually specify _id as number
    time:
}

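In your forEach loop, first check whether a user document with that particular email address already exists. If it does not, create it. Then use the user document's unique identifier to insert a new document into the Flights collection, storing the identifier under the field userId (or perhaps passengerId?).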
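The per-row logic described above could be sketched roughly as follows. The function name `importRow` and the collection names `Users` and `Flights` are assumptions made for illustration; they do not appear in the original code. The sketch also assumes a shell version that provides `insertOne` (3.2 or later).

```javascript
// Hypothetical helper: `users` and `flights` are collection handles,
// e.g. destinationDb.Users and destinationDb.Flights in the mongo shell.
function importRow(users, flights, doc) {
    // Look the user up by email address and create the document if it is missing.
    var user = users.findOne({ email: doc.EmailAddress });
    var userId = user
        ? user._id
        : users.insertOne({ email: doc.EmailAddress }).insertedId;

    // Store the flight with a reference back to the user it belongs to.
    flights.insertOne({
        userId: userId,
        number: doc.FlightNumber,
        time: doc.FlightTime
    });

    return userId;
}
```

It would be called once per source row, for example `sourceDb.CRM.find().forEach(function (doc) { importRow(destinationDb.Users, destinationDb.Flights, doc); });`. With 30 million rows, the lookup per row is only practical if the `email` field is indexed, and it will still be slower than a plain batched insert.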

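Thanks for the suggestion! The purpose of the import is to provide base data for another application that maps the collections to C# DTOs. An unfortunate limitation of that process is that we have users, and users have nested facets (i.e. flights). I could rework the logic of the loop to cross-reference each row, but I am concerned that it would slow down the import. It does feel like aggregate would go some way towards solving the problem; I am just not sure of the best approach.

That is a very frustrating limitation! Unfortunately, I can't export the data from MongoDB; I only know how I would want this information represented in MongoDB.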
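Since aggregate came up, here is a minimal sketch of how the grouping might be done server-side, assuming the source collection is `CRM` in the `collectionSource` database as in the question. The output collection name `ContactsByEmail` is made up for illustration, and this has not been run against the real data set.

```javascript
// Group every row by email address and nest its flights in an array.
var pipeline = [
    {
        $group: {
            _id: "$EmailAddress",
            Flights: { $push: { Number: "$FlightNumber", Time: "$FlightTime" } }
        }
    },
    // Move the grouping key into EmailAddress and drop _id,
    // so that a fresh _id is generated when the documents are written.
    { $project: { _id: 0, EmailAddress: "$_id", Flights: 1 } },
    // Write the result to a collection (hypothetical name) in the same database.
    // Note that $out replaces that collection if it already exists.
    { $out: "ContactsByEmail" }
];

// Only run inside the mongo shell, where the global `db` is defined.
if (typeof db !== "undefined") {
    db.getSiblingDB("collectionSource").CRM.aggregate(pipeline, { allowDiskUse: true });
}
```

`allowDiskUse` is set because a `$group` over 30 million rows may need more memory than a single stage is allowed to use. The string form of `$out` writes into the same database as the source; newer server versions (4.4 onwards, as far as I know) also accept a `{ db: ..., coll: ... }` document if the result has to land in `collectionDestination`.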