Python: intersecting rows of multiple collections on a key column

python mongodb mongodb-query pymongo

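I have N collections of different sizes, for example Collection 1 and Collection 2. Is there an efficient way to keep only the rows that share the same `source` id value, together with their corresponding target ids, as shown below?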

--Collection 1
`source` `target_1`
34        5
45        9
22        2
22        7     <--- not unique source id



--Collection 2
`source` `target_2`
34        23
25        9
22        8
17        99


--Result
`source` `target_1` `target_2`
34        5          23
22        2          8
22        7          8
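
The Result table above is an inner join on `source`. As a minimal in-memory sketch of the intended semantics (plain Python, no MongoDB involved; the function name `intersect_on_source` and the list-of-dicts layout are assumptions made for illustration), it could look like this:

```python
from collections import defaultdict
from itertools import product


def intersect_on_source(*collections, key="source"):
    """Inner-join any number of row lists on `key`.

    Rows whose key value is missing from at least one collection are
    dropped; repeated key values yield one output row per combination.
    """
    # Index each collection: key value -> list of rows carrying it.
    indexes = []
    for rows in collections:
        index = defaultdict(list)
        for row in rows:
            index[row[key]].append(row)
        indexes.append(index)

    # Keep only key values present in every collection.
    shared = set(indexes[0])
    for index in indexes[1:]:
        shared &= set(index)

    result = []
    # Walk the first collection's key values in order of first appearance.
    for value in indexes[0]:
        if value not in shared:
            continue
        for combo in product(*(index[value] for index in indexes)):
            merged = {}
            for row in combo:
                merged.update(row)
            result.append(merged)
    return result


collection_1 = [
    {"source": 34, "target_1": 5},
    {"source": 45, "target_1": 9},
    {"source": 22, "target_1": 2},
    {"source": 22, "target_1": 7},
]
collection_2 = [
    {"source": 34, "target_2": 23},
    {"source": 25, "target_2": 9},
    {"source": 22, "target_2": 8},
    {"source": 17, "target_2": 99},
]

rows = intersect_on_source(collection_1, collection_2)
for row in rows:
    print(row["source"], row["target_1"], row["target_2"])
```

This only pins down the expected result on the sample data; for large collections the join is better left to the database.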

On my dataset, however, I am getting

{'_id': ObjectId('someCode'), 'source': [], 'target_1': 520838}
{'_id': ObjectId('someCode'), 'source': [{'_id': ObjectId('someCode'), 'target_2': 62483, 'source': 38758}, {'_id': ObjectId('someCode'), 'target_1': 62483, 'source': 38758}], 'target_1': 68099}

instead of the behavior shown in the example. Any suggestions for improving the query are welcome.

EDIT2:

After @AlexBlex's comments, I now have:

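# `db` is an existing pymongo Database; 'as' is 'aligned' (not 'source') and $unwind is added so the code matches the output shown below
pipeline = [{'$lookup': {'from': 'Collection 2', 'localField': 'source', 'foreignField': 'source', 'as': 'aligned'}},
            {'$unwind': '$aligned'},
            {'$project': {"aligned._id": 0, "aligned.en": 0}}]
for doc in db['Collection 1'].aggregate(pipeline):
    print(doc)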

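# Same join, but target_2 is promoted to the top level; 'as' is 'aligned' and $unwind is added so the code matches the output shown below
pipeline = [{'$lookup': {'from': 'Collection 2', 'localField': 'source', 'foreignField': 'source', 'as': 'aligned'}},
            {'$unwind': '$aligned'},
            {'$project': {'source': 1, 'target_1': 1, 'target_2': '$aligned.target_2'}}]
for doc in db['Collection 1'].aggregate(pipeline):
    print(doc)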

{'_id': ObjectId('somecode'), 'source': 38758, 'target_1': 68099, 'aligned': {'target_2': 62483}}

{'_id': ObjectId('somecode'), 'source': 9770, 'target_1': 4802, 'target_2': 180}
{'_id': ObjectId('somecode'), 'source': 9770, 'target_1': 4802, 'target_2': 180}
{'_id': ObjectId('somecode'), 'source': 9770, 'target_1': 4802, 'target_2': 180}
{'_id': ObjectId('somecode'), 'source': 9770, 'target_1': 4802, 'target_2': 5689}
{'_id': ObjectId('somecode'), 'source': 124, 'target_1': 78, 'target_2': 250}
{'_id': ObjectId('somecode'), 'source': 124, 'target_1': 78, 'target_2': 250}
{'_id': ObjectId('somecode'), 'source': 124, 'target_1': 78, 'target_2': 250}

Can we go one step further and remove the aligned wrapper to obtain

{'_id': ObjectId('somecode'), 'source': 38758, 'target_1': 68099, 'target_2': 62483}
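
One way to drop the `aligned` wrapper is to promote `aligned.target_2` to a top-level field, either in a final `$project` stage or on the client. A minimal sketch, assuming each document carries a single `aligned` subdocument (that is, after `$unwind`); the helper name `flatten_aligned` is hypothetical:

```python
# Server-side option: a final stage that copies the nested value to the
# top level and leaves the `aligned` wrapper out of the output.
flatten_stage = {
    "$project": {
        "source": 1,
        "target_1": 1,
        "target_2": "$aligned.target_2",
    }
}


def flatten_aligned(doc, nested="aligned"):
    """Client-side equivalent: merge the nested subdocument into a copy of doc.

    The nested `_id` is skipped so it cannot overwrite the outer `_id`.
    """
    flat = {k: v for k, v in doc.items() if k != nested}
    inner = doc.get(nested) or {}
    flat.update({k: v for k, v in inner.items() if k != "_id"})
    return flat


# A plain string stands in for the ObjectId of the real document.
doc = {"_id": "somecode", "source": 38758, "target_1": 68099,
       "aligned": {"target_2": 62483}}
flat_doc = flatten_aligned(doc)
print(flat_doc)
```

The `$project` variant keeps the work inside the aggregation, which is usually preferable; the client-side helper is only a fallback when the pipeline cannot be changed.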

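@AlexBlex thanks for your reply. Could you provide an example? I don't know pymongo queries well, especially once they get a little complicated, at least for me.

'as': 'source' overwrites the source field of the original document from Collection 1. The output shows no matching documents for target_1 520838 and 2 matching documents for target_1 68099. It also shows that Collection 2 contains a document with target_1 but no target_2, unless that is a typo. I think 'en': 38758 should be 'source': 38758.

Well, that is an error in the database.

It is a ghost; it was removed in v3 many years ago. Use grouping as before, just move it to the first stage of the pipeline so that it can benefit from the index. For the best performance, create a compound index on all 3 fields, since mongo rarely uses index intersection. Or write a script that cleans up the data and permanently removes the duplicate documents from the collection.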
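
The last suggestions could be sketched roughly as follows. This is only an illustration: the thread does not show the grouping stage or name the three indexed fields, so the `$group` key (`source`, `target_1`), the two-field index, and the helper `duplicate_ids` are all assumptions, and the pymongo calls in the trailing comments are shown unexecuted.

```python
# Assumed first pipeline stage: collapse duplicate (source, target_1)
# pairs before any $lookup, so the join runs on fewer documents.
# Later stages would then read the values from `_id.source` and
# `_id.target_1`.
dedupe_stage = {
    "$group": {
        "_id": {"source": "$source", "target_1": "$target_1"},
    }
}


def duplicate_ids(docs, keys):
    """Return the _id of every document that repeats an earlier one on `keys`."""
    seen = set()
    extra = []
    for doc in docs:
        signature = tuple(doc.get(k) for k in keys)
        if signature in seen:
            extra.append(doc["_id"])
        else:
            seen.add(signature)
    return extra


# Sample documents; integer ids stand in for ObjectId values.
docs = [
    {"_id": 1, "source": 9770, "target_1": 4802},
    {"_id": 2, "source": 9770, "target_1": 4802},
    {"_id": 3, "source": 124, "target_1": 78},
    {"_id": 4, "source": 9770, "target_1": 4802},
]
to_delete = duplicate_ids(docs, ["source", "target_1"])
print(to_delete)

# With a live pymongo collection `coll` (not created here), the cleanup
# and a compound index would be along the lines of:
#   coll.delete_many({"_id": {"$in": to_delete}})
#   coll.create_index([("source", 1), ("target_1", 1)])
```

Deleting duplicates once is a permanent fix, whereas the `$group` stage has to repeat the work on every query; which one fits depends on whether the duplicates carry any meaning in the data.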