MongoDB: group values by multiple fields


For example, I have the following documents:

{
  "addr": "address1",
  "book": "book1"
},
{
  "addr": "address2",
  "book": "book1"
},
{
  "addr": "address1",
  "book": "book5"
},
{
  "addr": "address3",
  "book": "book9"
},
{
  "addr": "address2",
  "book": "book5"
},
{
  "addr": "address2",
  "book": "book1"
},
{
  "addr": "address1",
  "book": "book1"
},
{
  "addr": "address15",
  "book": "book1"
},
{
  "addr": "address9",
  "book": "book99"
},
{
  "addr": "address90",
  "book": "book33"
},
{
  "addr": "address4",
  "book": "book3"
},
{
  "addr": "address5",
  "book": "book1"
},
{
  "addr": "address77",
  "book": "book11"
},
{
  "addr": "address1",
  "book": "book1"
}

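and so on.


How can I write a query that returns the top N addresses and the top M books for each address?

Example of the expected result:

address1 | book_1: 5
         | book_2: 10
         | book_3: 50
         | total: 65
______________________
address2 | book_2: 10
         | ...
         | total: M*10
______________________
addressN | book_1: 20
         | book_2: 20
         | book_M: 20
         | total: M*20

Use an aggregation like the following: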

[
{$group: {_id : {book : '$book',address:'$addr'}, total:{$sum :1}}},
{$project : {book : '$_id.book', address : '$_id.address', total : '$total', _id : 0}}
]
It will give you results like the following:

        {
            "total" : 1,
            "book" : "book33",
            "address" : "address90"
        }, 
        {
            "total" : 1,
            "book" : "book5",
            "address" : "address1"
        }, 
        {
            "total" : 1,
            "book" : "book99",
            "address" : "address9"
        }, 
        {
            "total" : 1,
            "book" : "book1",
            "address" : "address5"
        }, 
        {
            "total" : 1,
            "book" : "book5",
            "address" : "address2"
        }, 
        {
            "total" : 1,
            "book" : "book3",
            "address" : "address4"
        }, 
        {
            "total" : 1,
            "book" : "book11",
            "address" : "address77"
        }, 
        {
            "total" : 1,
            "book" : "book9",
            "address" : "address3"
        }, 
        {
            "total" : 1,
            "book" : "book1",
            "address" : "address15"
        }, 
        {
            "total" : 2,
            "book" : "book1",
            "address" : "address2"
        }, 
        {
            "total" : 3,
            "book" : "book1",
            "address" : "address1"
        }
I did not fully understand your expected result format, so feel free to reshape this into the one you need.
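One way to do that reshaping on the client is sketched below. It is a minimal plain-JavaScript example, assuming the flat `{ total, book, address }` rows shown above have already been fetched into an array; `nestByAddress` is a hypothetical helper name, not part of MongoDB, and `total` here sums every book at an address, not only the M that are kept.

```javascript
// Hypothetical helper: nest flat { total, book, address } rows per address,
// keeping the top `n` addresses and the top `m` books for each of them.
function nestByAddress(rows, n, m) {
  const byAddress = new Map();
  for (const { address, book, total } of rows) {
    if (!byAddress.has(address)) {
      byAddress.set(address, { address, books: [], total: 0 });
    }
    const entry = byAddress.get(address);
    entry.books.push({ book, count: total });
    entry.total += total; // running total over all books at this address
  }
  return [...byAddress.values()]
    .sort((a, b) => b.total - a.total) // busiest addresses first
    .slice(0, n)                       // keep the top n addresses
    .map((entry) => ({
      ...entry,
      books: entry.books
        .sort((a, b) => b.count - a.count) // most frequent books first
        .slice(0, m),                      // keep the top m books
    }));
}

// Usage with a few of the rows shown above
const rows = [
  { total: 1, book: "book5", address: "address1" },
  { total: 1, book: "book5", address: "address2" },
  { total: 2, book: "book1", address: "address2" },
  { total: 3, book: "book1", address: "address1" },
  { total: 1, book: "book9", address: "address3" },
];
const nested = nestByAddress(rows, 2, 1);
console.log(JSON.stringify(nested, null, 2));
```

This only suits result sets small enough to hold in memory; for larger collections the limiting is better done on the server.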

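TLDR Summary

In modern MongoDB releases you can brute-force this by applying $slice to the basic aggregation result. For "large" results, run parallel queries for each grouping instead (a demonstration listing is at the end of the answer), or wait for the corresponding server feature request to be resolved, which would allow a "limit" on the number of items that $push adds to an array.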

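db.books.aggregate([
    // count each (address, book) pair
    { "$group": {
        "_id": {
            "addr": "$addr",
            "book": "$book"
        },
        "bookCount": { "$sum": 1 }
    }},
    // sort so that $push receives each address's books in descending count order
    { "$sort": { "bookCount": -1 } },
    // collect the books per address and total their counts
    { "$group": {
        "_id": "$_id.addr",
        "books": {
            "$push": {
                "book": "$_id.book",
                "count": "$bookCount"
            }
        },
        "count": { "$sum": "$bookCount" }
    }},
    // keep the top 2 addresses
    { "$sort": { "count": -1 } },
    { "$limit": 2 },
    // keep the first 2 books of each address
    { "$project": {
        "books": { "$slice": [ "$books", 2 ] },
        "count": 1
    }}
])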
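
The literal 2s in this pipeline stand in for N and M. A minimal sketch of a parameterized version follows; `buildTopNPipeline` is a hypothetical helper name, not a MongoDB API, and only builds the stage array. It assumes a `$sort` on `bookCount` before the second `$group`, so that `$slice` keeps the highest counts; this relies on `$push` collecting items in the order documents arrive, which is how it behaves in practice.

```javascript
// Hypothetical helper (not a MongoDB API): builds the pipeline for any N and M.
function buildTopNPipeline(n, m) {
  return [
    // count each (address, book) pair
    { $group: { _id: { addr: "$addr", book: "$book" }, bookCount: { $sum: 1 } } },
    // assumption: order by count so the pushed arrays start with the top books
    { $sort: { bookCount: -1 } },
    // collect books per address and total their counts
    { $group: {
        _id: "$_id.addr",
        books: { $push: { book: "$_id.book", count: "$bookCount" } },
        count: { $sum: "$bookCount" },
    } },
    { $sort: { count: -1 } },
    { $limit: n },                                              // top N addresses
    { $project: { books: { $slice: ["$books", m] }, count: 1 } }, // top M books
  ];
}

const pipeline = buildTopNPipeline(2, 2);
// In the mongo shell this would be run as: db.books.aggregate(pipeline)
console.log(JSON.stringify(pipeline, null, 2));
```

Every book of every address is still pushed into the array before `$slice` trims it, which is why this remains a brute-force approach for large groups.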

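MongoDB 3.6 Preview

The limit on $push is still not available, but in this release $lookup accepts a new "non-correlated" option which takes a "pipeline" expression as an argument instead of the "localField" and "foreignField" options. This allows a "self-join" with another pipeline expression, in which we can apply $limit in order to return the "top-n" results.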

db.books.aggregate([
    { "$group": {
        "_id": "$addr",
        "count": { "$sum": 1 }
    }},
    { "$sort": { "count": -1 } },
    { "$limit": 2 },
    { "$lookup": {
        "from": "books",
        "let": {
            "addr": "$_id"
        },
        "pipeline": [
            { "$match": {
                "$expr": { "$eq": [ "$addr", "$$addr" ] }
            }},
            { "$group": {
                "_id": "$book",
                "count": { "$sum": 1 }
            }},
            { "$sort": { "count": -1 } },
            { "$limit": 2 }
        ],
        "as": "books"
    }}
])
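The other addition here is of course the ability to interpolate the variable through $expr, using $match to select the matching items in the "join". The general premise is a "pipeline within a pipeline", where the inner content can be filtered by matches from the parent. Since both are "pipelines" themselves, we can $limit each result separately.

This would be the next best option to running parallel queries, and it would actually be better if the $match were allowed and able to use an index in the "sub-pipeline" processing. So although it does not use the "limit to $push" that the referenced issue asks for, it actually delivers something that should work better.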
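
The two limits in the `$lookup` form are independent, which makes it straightforward to parameterize. A minimal sketch, assuming a hypothetical helper `buildLookupTopNPipeline` (not a MongoDB API) that takes the collection name for the self-join plus N and M:

```javascript
// Hypothetical helper (not a MongoDB API): top N addresses, each joined
// with its own top M books through a sub-pipeline.
function buildLookupTopNPipeline(collectionName, n, m) {
  return [
    // outer pipeline: count documents per address, keep the top N
    { $group: { _id: "$addr", count: { $sum: 1 } } },
    { $sort: { count: -1 } },
    { $limit: n },
    { $lookup: {
        from: collectionName,   // self-join on the same collection
        let: { addr: "$_id" },  // expose the outer address as $$addr
        pipeline: [
          // inner pipeline: only documents for this address
          { $match: { $expr: { $eq: ["$addr", "$$addr"] } } },
          { $group: { _id: "$book", count: { $sum: 1 } } },
          { $sort: { count: -1 } },
          { $limit: m },        // top M books for this address
        ],
        as: "books",
    } },
  ];
}

const lookupPipeline = buildLookupTopNPipeline("books", 2, 2);
// In the mongo shell this would be run as: db.books.aggregate(lookupPipeline)
console.log(JSON.stringify(lookupPipeline, null, 2));
```

Unlike the `$slice` form, each address's book list is limited inside its own sub-pipeline, so no oversized array is built first.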


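Original Content

You seem to have stumbled upon the top "N" problem. In a way your problem is fairly easy to solve, though not with the exact limiting that you ask for: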

db.books.aggregate([
    { "$group": {
        "_id": {
            "addr": "$addr",
            "book": "$book"
        },
        "bookCount": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.addr",
        "books": {
            "$push": {
                "book": "$_id.book",
                "count": "$bookCount"
            }
        },
        "count": { "$sum": "$bookCount" }
    }},
    { "$sort": { "count": -1 } },
    { "$limit": 2 }
])
Now that will give you a result like this:

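{
    "result" : [
        {
            "_id" : "address1",
            "books" : [
                {
                    "book" : "book4",
                    "count" : 1
                },
                {
                    "book" : "book5",
                    "count" : 1
                },
                {
                    "book" : "book1",
                    "count" : 3
                }
            ],
            "count" : 5
        },
        {
            "_id" : "address2",
            "books" : [
                {
                    "book" : "book5",
                    "count" : 1
                },
                {
                    "book" : "book1",
                    "count" : 2
                }
            ],
            "count" : 3
        }
    ],
    "ok" : 1
}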
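So this differs from what you are asking in that, while we do get the top results for the address values, the underlying "books" selection is not limited to the required number of results.

This turns out to be very difficult to do, but it can be done, though the complexity increases with the number of items you need to match. To keep it simple, we can keep this at 2 matches at most: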

db.books.aggregate([
    { "$group": {
        "_id": {
            "addr": "$addr",
            "book": "$book"
        },
        "bookCount": { "$sum": 1 }
    }},
    { "$group": {
        "_id": "$_id.addr",
        "books": {
            "$push": {
                "book": "$_id.book",
                "count": "$bookCount"

db.books.aggregate([
    {
        $group: {
            _id: { addresses: "$addr", books: "$book" },
            num: { $sum :1 }
        }
    },
    {
        $group: {
            _id: "$_id.addresses",
            bookCounts: { $push: { bookName: "$_id.books",count: "$num" } }
        }
    },
    {
        $project: {
            _id: 1,
            bookCounts:1,
            "totalBookAtAddress": {
                "$sum": "$bookCounts.count"
            }
        }
    }

]) 
/* 1 */
{
    "_id" : "address4",
    "bookCounts" : [
        {
            "bookName" : "book3",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 2 */
{
    "_id" : "address90",
    "bookCounts" : [
        {
            "bookName" : "book33",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 3 */
{
    "_id" : "address15",
    "bookCounts" : [
        {
            "bookName" : "book1",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 4 */
{
    "_id" : "address3",
    "bookCounts" : [
        {
            "bookName" : "book9",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 5 */
{
    "_id" : "address5",
    "bookCounts" : [
        {
            "bookName" : "book1",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 6 */
{
    "_id" : "address1",
    "bookCounts" : [
        {
            "bookName" : "book1",
            "count" : 3
        },
        {
            "bookName" : "book5",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 4
},

/* 7 */
{
    "_id" : "address2",
    "bookCounts" : [
        {
            "bookName" : "book1",
            "count" : 2
        },
        {
            "bookName" : "book5",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 3
},

/* 8 */
{
    "_id" : "address77",
    "bookCounts" : [
        {
            "bookName" : "book11",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
},

/* 9 */
{
    "_id" : "address9",
    "bookCounts" : [
        {
            "bookName" : "book99",
            "count" : 1
        }
    ],
    "totalBookAtAddress" : 1
}