MongoDB exact match on multiple document fields


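I'm trying to build a Python script with PyMongo that queries a MongoDB database for exact matches on n objects that may exist in it. Currently, I have the following setup:

db.entries.find({'$or': [<list-of-objects>]})

Using $or works fine when the list has around 10 items. I'm now testing with 100, and the query takes a very long time to return. I've considered using multiple $in operators within the filter, but I don't know whether that's the best option.
I'm sure there's a better way to handle this, but I'm still fairly new to Mongo.
Edit: output of .explain() below: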
{
    "executionStats": {
        "executionTimeMillis": 228734,
        "nReturned": 2,
        "totalKeysExamined": 0,
        "allPlansExecution": [],
        "executionSuccess": true,
        "executionStages": {
            "needYield": 0,
            "saveState": 43556,
            "restoreState": 43556,
            "isEOF": 1,
            "inputStage": {
                "needYield": 0,
                "saveState": 43556,
                "restoreState": 43556,
                "isEOF": 1,
                "inputStage": {
                    "needYield": 0,
                    "direction": "forward",
                    "saveState": 43556,
                    "restoreState": 43556,
                    "isEOF": 1,
                    "docsExamined": 5453000,
                    "nReturned": 2,
                    "needTime": 5452999,
                    "filter": {
                        "$or": [{
                            "$and": [{
                                "email": {
                                    "$eq": "some@email.com"
                                }
                            }, {
                                "zipcode": {
                                    "$eq": "11111"
                                }
                            }]
                        }, {
                            "$and": [{
                                "email": {
                                    "$eq": "another@email.com"
                                }
                            }, {
                                "zipcode": {
                                    "$eq": "11112"
                                }
                            }]
                        }]
                    },
                    "executionTimeMillisEstimate": 208083,
                    "invalidates": 0,
                    "works": 5453002,
                    "advanced": 2,
                    "stage": "COLLSCAN"
                },
                "nReturned": 2,
                "needTime": 5452999,
                "executionTimeMillisEstimate": 211503,
                "transformBy": {
                    "_id": false
                },
                "invalidates": 0,
                "works": 5453002,
                "advanced": 2,
                "stage": "PROJECTION"
            },
            "nReturned": 2,
            "needTime": 5452999,
            "executionTimeMillisEstimate": 213671,
            "invalidates": 0,
            "works": 5453002,
            "advanced": 2,
            "stage": "SUBPLAN"
        },
        "totalDocsExamined": 5453000
    },
    "queryPlanner": {
        "parsedQuery": {
            "$or": [{
                "$and": [{
                    "email": {
                        "$eq": "some@email.com"
                    }
                }, {
                    "zipcode": {
                        "$eq": "11111"
                    }
                }]
            }, {
                "$and": [{
                    "email": {
                        "$eq": "another@email.com"
                    }
                }, {
                    "zipcode": {
                        "$eq": "11112"
                    }
                }]
            }]
        },
        "rejectedPlans": [],
        "namespace": "db.entries",
        "winningPlan": {
            "inputStage": {
                "transformBy": {
                    "_id": false
                },
                "inputStage": {
                    "filter": {
                        "$or": [{
                            "$and": [{
                                "email": {
                                    "$eq": "some@email.com"
                                }
                            }, {
                                "zipcode": {
                                    "$eq": "11111"
                                }
                            }]
                        }, {
                            "$and": [{
                                "email": {
                                    "$eq": "another@email.com"
                                }
                            }, {
                                "zipcode": {
                                    "$eq": "11112"
                                }
                            }]
                        }]
                    },
                    "direction": "forward",
                    "stage": "COLLSCAN"
                },
                "stage": "PROJECTION"
            },
            "stage": "SUBPLAN"
        },
        "indexFilterSet": false,
        "plannerVersion": 1
    },
    "ok": 1.0,
    "serverInfo": {
        "host": "somehost",
        "version": "3.4.6",
        "port": 27017,
        "gitVersion": "c55eb86ef46ee7aede3b1e2a5d184a7df4bfb5b5"
    }
}

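I'd suggest creating a new compound index, since you're searching on two fields together:
db.entries.createIndex( {"email": 1, "zipcode": 1} )

Now run the query with explain() appended; you should see that it uses IXSCAN instead of COLLSCAN.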
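As a rough PyMongo sketch of the check described above: the helper names (`build_or_filter`, `plan_stages`, `index_and_explain`) are made up for illustration, and `collection` is assumed to be a PyMongo collection such as `db.entries` holding `email` and `zipcode` fields as shown in the explain output.

```python
def build_or_filter(pairs):
    """Build a $or filter with one exact-match clause per (email, zipcode) pair."""
    return {"$or": [{"email": email, "zipcode": zipcode} for email, zipcode in pairs]}


def plan_stages(plan):
    """Collect stage names from an explain() winningPlan tree, outermost first."""
    stages = [plan.get("stage")]
    if "inputStage" in plan:
        stages += plan_stages(plan["inputStage"])
    for child in plan.get("inputStages", []):
        stages += plan_stages(child)
    return stages


def index_and_explain(collection, pairs):
    """Create the compound index, then return the stages of the winning plan.

    Assumes `collection` is a PyMongo collection (hypothetical: db.entries).
    """
    collection.create_index([("email", 1), ("zipcode", 1)])
    cursor = collection.find(build_or_filter(pairs), {"_id": False})
    return plan_stages(cursor.explain()["queryPlanner"]["winningPlan"])
```

If the index is picked up, the returned list should contain `IXSCAN` rather than `COLLSCAN`; the exact shape of the plan depends on the server version.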
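To avoid indexing and re-indexing (this query won't only involve email/zip; the fields will be dynamic), I build a list of values for each header and use it as the $in argument, then pass those into $and. It seems to work well, and it hasn't taken more than 3 minutes.
For example: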
{'$and': [{'email': {'$in': ['some@example.com', 'fake@example.com', 'email@example.com']}, 'zipcode': {'$in': ['12345', '11111', '11112']}}]}
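One thing to keep in mind with this filter: separate $in lists match any listed email combined with any listed zipcode, not only the original pairs. A minimal sketch of building the filter from pairs and then trimming the results on the client follows; the helper names are hypothetical, and the documents are assumed to carry `email` and `zipcode` keys.

```python
def build_in_filter(pairs):
    """Build one $in list per field from (email, zipcode) pairs."""
    emails = sorted({email for email, _ in pairs})
    zipcodes = sorted({zipcode for _, zipcode in pairs})
    return {"email": {"$in": emails}, "zipcode": {"$in": zipcodes}}


def keep_exact_pairs(docs, pairs):
    """Drop documents whose (email, zipcode) combination was not requested."""
    wanted = set(pairs)
    return [doc for doc in docs if (doc.get("email"), doc.get("zipcode")) in wanted]


# Hypothetical usage with a PyMongo collection:
#   docs = db.entries.find(build_in_filter(pairs), {"_id": False})
#   exact = keep_exact_pairs(docs, pairs)
```

The client-side pass is only needed when cross matches (email from one pair, zipcode from another) would be wrong for the use case.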

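Please add the .explain() output. Why not: 1. Create an index on whichever field has high cardinality, either zipcode or email. 2. Use an aggregation pipeline that first selects documents by the indexed field, so the new index filters out the bulk of the documents. Hope this helps.
@Euclides, this dataset is likely to get much larger. Also, it won't be just two objects; those two are only a fragment of the 10k I ran as a test. Naturally, as I add more objects to the $or clause, the query gets larger and far less efficient. I'm looking for a way around this limitation. You mentioned using aggregation; if you could help me build some kind of generic query, I'd greatly appreciate it!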
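One possible reading of the aggregation suggestion, sketched in Python: the first $match narrows by a single field (assumed here to be `email`, with an index on it), and the second keeps only the exact pairs. `build_pipeline` is a hypothetical helper, not something from the thread.

```python
def build_pipeline(pairs):
    """Build an aggregation pipeline for exact (email, zipcode) pair matches."""
    emails = sorted({email for email, _ in pairs})
    return [
        # Narrow by the (assumed indexed) high-cardinality field first.
        {"$match": {"email": {"$in": emails}}},
        # Then keep only documents matching one of the requested pairs.
        {"$match": {"$or": [{"email": e, "zipcode": z} for e, z in pairs]}},
        # Mirror the projection from the original query.
        {"$project": {"_id": 0}},
    ]


# Hypothetical usage with a PyMongo collection:
#   results = list(db.entries.aggregate(build_pipeline(pairs)))
```

Whether this beats a plain find() with a compound index is something to confirm with explain() on the real data.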