Python: how to find duplicate names/documents in MongoDB?
Tags: python, mongodb, pymongo

I want to find duplicate documents in my MongoDB based on name. I have the following code:
    def Check_BFA_DB(options):
        issue_list = []
        client = MongoClient(options.host, int(options.port))
        db = client[options.db]
        collection = db[options.collection]
        names = [{'$project': {'name': '$name'}}]
        name_cursor = collection.aggregate(names, cursor={})
        for name in name_cursor:
            issue_list.append(name)
            print(name)
It prints all the names. How can I print only the duplicate names?
Thanks for your help.

The following query will show only the duplicates:
    db['collection_name'].aggregate([{'$group': {'_id': '$name', 'count': {'$sum': 1}}}, {'$match': {'count': {'$gt': 1}}}])
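How it works:
Step 1: Go over the whole collection, group the documents by the property called name, and count how many times each name is used in the collection.
Step 2: Keep only the groups whose count is greater than 1 (using the $match stage with the $gt operator).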
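To make the two steps concrete without a database, here is a minimal plain-Python sketch of the same logic. It is only an illustration: the function name `find_duplicate_names` and the sample documents are made up for this example, and MongoDB evaluates the real aggregation on the server rather than like this.

```python
from collections import Counter


def find_duplicate_names(documents):
    """Return {name: count} for every name that occurs more than once."""
    # Step 1 (the role of $group with {'$sum': 1}): count documents per name.
    # Documents without a name are counted under None, much as $group
    # collects them under a null _id.
    counts = Counter(doc.get("name") for doc in documents)
    # Step 2 (the role of $match with {'$gt': 1}): keep only counts above 1.
    return {name: count for name, count in counts.items() if count > 1}


# Hypothetical sample data, not taken from the original question.
sample_documents = [
    {"name": "name1"},
    {"name": "name2"},
    {"name": "name1"},
]

print(find_duplicate_names(sample_documents))  # prints {'name1': 2}
```

Only name1 survives the filter, because name2 appears once.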
An example (written for the mongo shell, but easily adapted to Python):
The result is { "_id" : "name1", "count" : 2 }
So your code should look like this:
    from pymongo import MongoClient


    def Check_BFA_DB(options):
        issue_list = []
        client = MongoClient(options.host, int(options.port))
        db = client[options.db]
        # Group documents by name, count each group, keep groups with count > 1.
        name_cursor = db[options.collection].aggregate([
            {'$group': {'_id': '$name', 'count': {'$sum': 1}}},
            {'$match': {'count': {'$gt': 1}}}
        ])
        for document in name_cursor:
            # After $group, the grouped name is stored in the _id field.
            name = document['_id']
            issue_list.append(name)
            print(name)
By the way (unrelated to the question), the Python naming convention for function names is lowercase with underscores, so you may want to call it check_bfa_db().
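As a rough sketch of that rename, the pipeline can also be built by a small helper so the function itself stays short. The helper name `duplicate_pipeline`, its `field` parameter, and the choice to take a collection and return the list instead of printing are assumptions made for illustration, not part of the original answer. `collection` is expected to be a PyMongo collection, although any object with a compatible `aggregate` method would work.

```python
def duplicate_pipeline(field="name"):
    """Build an aggregation pipeline that keeps values of `field` seen more than once."""
    return [
        # Group by the field and count the documents in each group.
        {"$group": {"_id": "$" + field, "count": {"$sum": 1}}},
        # Keep only the groups that contain more than one document.
        {"$match": {"count": {"$gt": 1}}},
    ]


def check_bfa_db(collection):
    """Return the names that occur more than once in `collection`."""
    # Assumption: `collection` exposes aggregate(pipeline), as a PyMongo
    # collection does, and yields one dict per group.
    return [document["_id"] for document in collection.aggregate(duplicate_pipeline())]
```

With a live connection this could be called as `check_bfa_db(client[options.db][options.collection])`, reusing the names from the code above.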