Finding/counting duplicate values in an array in MongoDB


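I am new to MongoDB and am using Robo3t.
I have to find the duplicate values in an array based on channel id.
I did some research and found that I need to use aggregation to group and get the respective counts.
I came up with the following query, but the result is not as expected.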

Sample documents:

{
    "_id" : ObjectId("59b674d141b47e5401897d31"),
    "subscribed_channels" : [ 
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        }, 
        {
            "channel_id" : "1002",
            "channel_name" : "StarGold",
            "channelPrice":"75"
        }, 
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        },
        {
            "channel_id" : "1003",
            "channel_name" : "SetMax",
            "channelPrice":"80"
        }
    ],
    "viewer_account_id" : "59b6745b41b47e5401143b3d",
    "public_id_type" : "PHONE_NUMBER",
    "viewer_id" : "+919322264403",
    "role" : "CONSUMER",
    "active" : true,
    "date_time_created" : NumberLong(1505129681330),
    "date_time_modified" : NumberLong(1569320824387)
}

{
    "_id" : ObjectId("59b674d141b47e5401897d31"),
    "subscribed_channels" : [ 
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        }, 
        {
            "channel_id" : "1002",
            "channel_name" : "StarGold",
            "channelPrice":"75"
        }, 
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        },
        {
            "channel_id" : "1001",
            "channel_name" : "StarPlus",
            "channelPrice":"100"
        }
    ],
    "viewer_account_id" : "59b6745b41b47e5401143c56",
    "public_id_type" : "PHONE_NUMBER",
    "viewer_id" : "+919322264404",
    "role" : "CONSUMER",
    "active" : true,
    "date_time_created" : NumberLong(1505129681330),
    "date_time_modified" : NumberLong(1569320824387)
}

The above are just two records from the viewers collection.

Query:

db.getCollection('viewers').aggregate([
    {
        "$group": {
            _id: {
                //viewer_id:"$consumer_id",
                enterprise_id: "$subscribed_channels.channel_id"
            },
            "viewer_id": { $first: "$viewer_id" },
            count: { $sum: 1 }
        }
    },
    {
        "$match": { "count": { "$gt": 1 } }
    }
])

Actual output:

{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1001", 
            "1002",
            "1003"
        ]
    },
    "consumer_id" : "+919322264403",
    "count" : 2.0
}
{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1002", 
            "1001",
            "1001"
        ]
    },
    "consumer_id" : "+919322264404",
    "count" : 2.0
}

Expected output:

I want to group by subscribed_channels.channel_id and get the respective counts:

{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1001", 
            "1002",
            "1003"
        ]
    },
    "consumer_id" : "+919322264403",
    "count" : 2.0
}
{
    "_id" : {
        "enterprise_id" : [ 
            "1001", 
            "1001", 
            "1001",
            "1002"
        ]
    },
    "consumer_id" : "+919322264404",
    "count" : 3.0
}

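The query does not group by channel id, and the count is not correct either.
The count tells me nothing about the subscribed channel ids, nor about which channel ids are duplicated.

Please guide me in writing a query that gives the correct result.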

Try the following query:

Query:

db.collection.aggregate([
  /** project only needed fields & transform fields as you like */
  {
    $project: {
      customer_id: "$viewer_id",
      enterprise_id: "$subscribed_channels.channel_id",
      count: {
        /** Subtract size of original array & newly formed array which has unique values to get count of duplicates */
        $subtract: [
          {
            $size: "$subscribed_channels.channel_id" // get size of original array
          },
          {
            $size: {
              $setUnion: ["$subscribed_channels.channel_id", []] // This will give you an array with unique elements & get size of it
            }
          }
        ]
      }
    }
  }
]);
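
To make the arithmetic in the $project stage easier to follow, here is a rough plain-JavaScript mirror of it. This is only an illustration that runs outside MongoDB: the arrays are the channel_id values copied from the two sample documents in the question, and the function names duplicateCount and repeatedIds are hypothetical helpers, not part of any MongoDB API.

```javascript
// Plain-JavaScript mirror of the $project stage above (illustration only, not MongoDB code).
// The arrays hold the channel_id values from the two sample documents in the question.
const firstDoc = ["1001", "1002", "1001", "1003"];
const secondDoc = ["1001", "1002", "1001", "1001"];

// Same idea as $subtract: [ $size(original), $size($setUnion(original, [])) ].
function duplicateCount(channelIds) {
  const unique = new Set(channelIds); // plays the role of $setUnion with []
  return channelIds.length - unique.size;
}

// Hypothetical helper: how often each repeated channel_id occurs in one document.
function repeatedIds(channelIds) {
  const counts = {};
  for (const id of channelIds) {
    counts[id] = (counts[id] || 0) + 1;
  }
  // Keep only the ids that appear more than once.
  return Object.fromEntries(Object.entries(counts).filter(([, n]) => n > 1));
}

console.log(duplicateCount(firstDoc));  // 1
console.log(duplicateCount(secondDoc)); // 2
console.log(repeatedIds(secondDoc));    // { '1001': 3 }
```

Note that the subtraction counts the extra occurrences (1 and 2 for the sample documents), not the total number of occurrences of the repeated id (2 and 3).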

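Test:

So if you want duplicates, the first document will have 1 and the second will have 2, if I am not wrong. Is that correct, or is what you gave correct? Because in the first document ["1001", "1002", "1003"] would be unique and the only duplicate is the other 1001. Then, if you had ["1002", "1002", "1001", "1001"], would you consider it 4 duplicates?

Thanks for the reply. I want the results per document. Since there are two 1001s, the first document would have a count of 2, and the second should have a count of 3 because there are three 1001s. Also, going by your understanding, getting 1 for the first document and 2 for the second would work too. If any other clarification is needed, please let me know and I will update my question.

Hi @whoami. I want to highlight the documents that contain duplicates based on channel id. Can you give me a head start on the query?

I feel 1 for the first and 2 for the second is perfect and correct, since those are the numbers of duplicate elements. If you only need the documents that have duplicates, you don't need the count; is that your actual problem? Or do you want all documents, with an added field (something like hasDups: true) on the ones that have duplicates?

@whoami, yes, thanks for the suggestion. You are right, 1 for the first and 2 for the second is perfect and meets my requirement, because I get to know which channel ids are duplicated in a document. Adding a field would also be sufficient, but option 1 looks more prominent.

Hi @whoami. I got an error when executing the above query: command failed: { "ok" : 0, "errmsg" : "The argument to $size must be an array, but was of type: missing", "code" : 17124, "codeName" : "Location17124" } : aggregate failed. Are my documents corrupt?

@AjinkyaKarode: Yes, I think some of your documents don't have the subscribed_channels array. Can you check whether that is the case? What do you want to do with those?

Hi @whoami, sorry, I just verified all the documents, and yes, some documents have no subscribed_channels field because that viewer has no subscriptions. How do I handle those documents?

@AjinkyaKarode: What do you want to do with those documents? Do you want to remove them from the result?

Hi @whoami. Thanks for your efforts, I just verified the output and I am getting the counts greater than 0. I will check out the site you recommended, but would it be possible to learn from you? Do you have an email id or LinkedIn?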
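
For the $size error discussed in the comments, one possible way to handle documents without a subscribed_channels field is sketched below. It is not the answerer's exact follow-up, which is not preserved here. It assumes the field is either absent or an array of sub-documents, falls back to an empty array with $ifNull, and keeps only documents whose count is greater than 0. The intermediate field name channel_ids and the variable name pipeline are hypothetical.

```javascript
// Sketch: the same counting idea, guarded against documents without subscribed_channels.
const pipeline = [
  {
    // Normalize first: a missing subscribed_channels array becomes an empty list of ids,
    // so $size in the next stage always receives an array.
    $project: {
      viewer_id: 1,
      channel_ids: { $ifNull: ["$subscribed_channels.channel_id", []] }
    }
  },
  {
    $project: {
      viewer_id: 1,
      channel_ids: 1,
      count: {
        // size of the original array minus size of the de-duplicated array
        $subtract: [
          { $size: "$channel_ids" },
          { $size: { $setUnion: ["$channel_ids", []] } }
        ]
      }
    }
  },
  // Keep only documents that actually contain a repeated channel_id.
  { $match: { count: { $gt: 0 } } }
];

// In the mongo shell or Robo 3T (collection name taken from the question):
// db.getCollection('viewers').aggregate(pipeline)
```

Dropping the final $match stage would instead return every document, with a count of 0 for viewers that have no duplicates or no subscriptions.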