Python: combining nested collections in MongoDB with documents written from parallel nodes

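I am considering whether MongoDB could help solve a storage and processing problem. The idea is that the computation runs as multiple processes on each node, and each result is written to MongoDB with its own unique MongoDB ObjectId. The data structure in the dictionary looks like this: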

{a: {b: {c: [100, 200, 300]}}}

a, b and c are integer keys.
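One practical catch with integer keys: BSON field names must be strings, so PyMongo raises `bson.errors.InvalidDocument` when asked to insert a dict such as `{24: {...}}`. A minimal sketch, assuming each node converts the keys before writing and the combining step converts them back after reading; `stringify_keys` and `intify_keys` are hypothetical helper names, not part of the question.

```python
# Hypothetical helpers (not from the question): BSON field names must be
# strings, so integer keys are converted before writing and restored after reading.


def stringify_keys(doc):
    """Recursively convert every dict key to str so the document can be stored."""
    if isinstance(doc, dict):
        return {str(key): stringify_keys(value) for key, value in doc.items()}
    if isinstance(doc, list):
        return [stringify_keys(value) for value in doc]
    return doc


def _to_int_if_digits(key):
    """Return int(key) for keys like "24"; leave other keys (e.g. "_id") alone."""
    if isinstance(key, str) and key.isdecimal():
        return int(key)
    return key


def intify_keys(doc):
    """Inverse of stringify_keys, assuming the original keys were non-negative ints."""
    if isinstance(doc, dict):
        return {_to_int_if_digits(key): intify_keys(value) for key, value in doc.items()}
    if isinstance(doc, list):
        return [intify_keys(value) for value in doc]
    return doc
```

A node would then write `collection.insert_one(stringify_keys(result))`, and the combining step would call `intify_keys` on each document it reads back.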

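Once the computation is finished and all records have been written to Mongo, the documents must be combined so that they are grouped by the top-level a, then by b, then by c. For example, two documents might contain (Example A):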

Document 1:

{24: {67: {12: [100, 200]}}}

Document 2:

{24: {68: {12: [100, 200]}}}

Combined:

{24: {67: {12: [100, 200]}, 68: {12: [100, 200]}}}

If we have another two documents (Example B):

Document 1:

{24: {67: {12: [100, 200]}}}

Document 2:

{24: {67: {12: [300, 400]}}}

Combined:

{24: {67: {12: [100, 200, 300, 400]}}}

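What is the best way to combine these nested structures? I could loop through each document by hand and do it in Python, but is there a smarter way? I need to preserve the underlying data structure.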
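For the fixed three-level shape described above, one alternative to walking each document recursively is to flatten every document into `(a, b, c)` paths, group the leaf lists by path, and rebuild the nesting once at the end. A sketch under stated assumptions: `flatten` and `combine` are hypothetical names, exactly three levels of nesting are assumed, documents are assumed to be read without their `_id`, and repeated values are kept in arrival order without sorting.

```python
from collections import defaultdict


def flatten(doc):
    """Yield ((a, b, c), values) pairs from one {a: {b: {c: [...]}}} document."""
    for a, level_b in doc.items():
        for b, level_c in level_b.items():
            for c, values in level_c.items():
                yield (a, b, c), values


def combine(documents):
    """Group every leaf list by its (a, b, c) path, then rebuild the nesting."""
    grouped = defaultdict(list)
    for doc in documents:
        for path, values in flatten(doc):
            grouped[path].extend(values)

    combined = {}
    for (a, b, c), values in grouped.items():
        # setdefault creates the intermediate dicts the first time a path is seen
        combined.setdefault(a, {}).setdefault(b, {})[c] = values
    return combined
```

Applied to Example A this yields `{24: {67: {12: [100, 200]}, 68: {12: [100, 200]}}}`, and applied to Example B it yields `{24: {67: {12: [100, 200, 300, 400]}}}`.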

What is not smart about doing the aggregation in Python? Consider the following function:

def aggregate(documents, base_document=None, unique=True):
    # use unique=False to keep all values in the lists, even if repeated
    # like [100, 100, 200, 300], leave it True otherwise
    for doc in documents:
        if isinstance(doc, list):
            # leaf level: collect the values, optionally deduplicate, then sort
            if base_document is None:
                base_document = []
            base_document.extend(doc)
            if unique:
                base_document = set(base_document)
            base_document = sorted(base_document)
        else:
            # dict level: merge each key into the matching branch of the result
            if base_document is None:
                base_document = {}
            for d in doc:
                b = base_document[d] if d in base_document \
                    else [] if isinstance(doc[d], list) else {}
                # pass `unique` down so unique=False also applies to nested lists
                base_document[d] = aggregate([doc[d]], base_document=b, unique=unique)
    return base_document

Tested with the following set of documents, it produces this aggregation:

documents = [   {20: {55: { 7: [100, 200]}}},
                {20: {68: {12: [100, 200]}}},
                {20: {68: {12: [500, 200]}}},
                {23: {67: {12: [100, 200]}}},
                {23: {68: {12: [100, 200]}}},
                {24: {67: {12: [300, 400]}}},
                {24: {67: {12: [100, 200]}}},
                {24: {67: {12: [100, 200]}}},
                {24: {67: {12: [300, 500]}}},
                {24: {67: {13: [600, 400]}}},
                {24: {67: {13: [700, 900]}}},
                {24: {68: {12: [100, 200]}}},
                {25: {67: {12: [100, 200]}}},
                {25: {67: {12: [300, 400]}}},   ]

from pprint import pprint
pprint(aggregate(documents))

''' 
{20: {55: {7: [100, 200]}, 68: {12: [100, 200, 500]}},
 23: {67: {12: [100, 200]}, 68: {12: [100, 200]}},
 24: {67: {12: [100, 200, 300, 400, 500], 13: [400, 600, 700, 900]},
      68: {12: [100, 200]}},
 25: {67: {12: [100, 200, 300, 400]}}}
'''
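Because the computation runs on parallel nodes, it may help to express the same merge as a pure two-argument function: each node can combine its own documents, and the partial results can then be combined in any grouping. A minimal sketch of that variant (`merge` and `merge_all` are hypothetical names, not part of the answer above); it assumes every document has the same nesting shape and, like `aggregate`, sorts the leaf lists and deduplicates them when `unique=True`.

```python
from functools import partial, reduce


def merge(left, right, unique=True):
    """Return a new nested structure combining `left` and `right`.

    Either side may be None when a key exists on only one side. Leaf lists
    are concatenated and sorted (and deduplicated when unique=True); dicts
    are merged key by key. Neither input is modified.
    """
    if isinstance(left, list) or isinstance(right, list):
        values = list(left or []) + list(right or [])
        return sorted(set(values)) if unique else sorted(values)
    left, right = left or {}, right or {}
    return {key: merge(left.get(key), right.get(key), unique)
            for key in left.keys() | right.keys()}


def merge_all(documents, unique=True):
    """Fold any iterable of documents (for example a pymongo cursor) into one."""
    return reduce(partial(merge, unique=unique), documents, {})
```

Since every leaf ends up sorted, `merge(merge(a, b), c)` equals `merge(a, merge(b, c))`, which is what allows per-node partial results to be combined in any order. For the `documents` list above, `merge_all(documents)` should give the same dictionary as `aggregate(documents)`.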

Building on @chapelo's answer:

# Import the Python MongoDB driver:
import pymongo

# Merge nested documents in Python (a plain recursive function,
# not MongoDB's server-side aggregation framework):
def aggregate(documents, base_document=None, unique=True):
    # use unique=False to keep all values in the lists, even if repeated
    # like [100, 100, 200, 300], leave it True otherwise
    for doc in documents:
        if isinstance(doc, list):
            if base_document is None:
                base_document = []
            base_document.extend(doc)
            if unique:
                base_document = set(base_document)
            base_document = sorted(base_document)
        else:
            if base_document is None:
                base_document = {}
            for d in doc:
                b = base_document[d] if d in base_document \
                    else [] if isinstance(doc[d], list) else {}
                # pass `unique` down so unique=False also applies to nested lists
                base_document[d] = aggregate([doc[d]], base_document=b, unique=unique)
    return base_document

# Open the MongoDB connection (this is a client, not a database):
client = pymongo.MongoClient()

# Query the old documents, excluding their ObjectIds.
# BSON field names are strings, so the integer keys come back as "24", "67", ...
old_docs = client.old.collection.find({}, {"_id": 0})

# Run the old documents through the merge function:
new_dict = aggregate(old_docs)

# Insert one aggregated document per top-level key into the new collection
# (Collection.insert was removed in PyMongo 4; insert_one replaces it):
for key in new_dict:
    client.new.collection.insert_one({key: new_dict[key]})

# Close the MongoDB connection:
client.close()

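Couldn't a MongoDB command solve this for you? Why not use map-reduce?

The dynamic keys make aggregating the results somewhat difficult.

@AlexLaties, I wanted to run these $push operations as a batch job, and because of the data structure I was wondering whether that would be too complicated.

Almost there, but it still needs to produce one large document keyed on the first id, so that it looks like {24: {..}, 20: {..}}.

@纳沃诺: I don't understand the point of your comment; could you explain it to me? Perhaps point out what is wrong with the answer you got. I edited the answer so that the resulting dictionary is easier to see.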
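The comments mention running `$push` operations as a batch job and wanting one large document keyed on the first id. A sketch of that server-side route, under stated assumptions: each leaf is addressed with a dotted path such as `"24.67.12"` and merged with `$addToSet` plus `$each` through `update_one(..., upsert=True)`; `to_add_to_set_update`, `merge_collection` and the `"combined"` `_id` are hypothetical names, and exactly three levels of nesting are assumed.

```python
def to_add_to_set_update(doc):
    """Turn one {a: {b: {c: [...]}}} document into a MongoDB update spec.

    Hypothetical helper: assumes exactly three levels of nesting. Each leaf
    becomes a dotted path such as "24.67.12"; with $addToSet + $each, values
    already present in the stored array are not added again.
    """
    paths = {}
    for a, level_b in doc.items():
        for b, level_c in level_b.items():
            for c, values in level_c.items():
                paths[f"{a}.{b}.{c}"] = {"$each": list(values)}
    return {"$addToSet": paths}


def merge_collection(source, target, combined_id="combined"):
    """Fold every document of `source` into a single document of `target`.

    `source` and `target` are assumed to be pymongo Collection objects;
    `combined_id` is a hypothetical _id for the combined document.
    """
    for doc in source.find({}, {"_id": 0}):
        target.update_one({"_id": combined_id},
                          to_add_to_set_update(doc), upsert=True)
```

Usage would look like `merge_collection(client.old.collection, client.new.collection)`. Unlike `aggregate`, `$addToSet` does not sort the arrays, and using `$push` instead would keep repeated values, as `unique=False` does. A single combined document is also subject to MongoDB's 16 MB document size limit, so this route only suits results of moderate size.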