MongoDB mapReduce() query to aggregate data into a list within a record
I have a MongoDB collection with records like the following:
{
"_id" : ObjectId("562d6d9c3a2e9c0adbb02f14"),
"slug" : "1:955553",
"subslug" : "1:955553:02",
"score" : "0.615",
"position_start" : "1",
"position_end" : 955553,
"name" : "AGRN",
"ref" : "A"
},
{
"_id" : ObjectId("562d6d9c3a2e9c0adbb02f15"),
"slug" : "2:15553",
"subslug" : "2:15553:01",
"score" : "0.915",
"position_start" : "1002",
"position_end" : 15553,
"name" : "MMFR",
"ref" : "C"
}
{
"_id" : ObjectId("562d6d9c3a2e9c0adbb02f16"),
"slug" : "1:955553",
"subslug" : "1:955553:01",
"score" : "0.715",
"position_start" : 1,
"position_end" : 955553,
"name" : "AGRN",
"ref" : "A"
},
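I want to aggregate this collection, grouping by slug (note that the first and third records here share the same slug).
I am trying to aggregate my data into a new collection that looks like this: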
{
"_id" : "<?>",
"slug" : "1:955553",
"components" : [
{
"subslug": "1:955553:01",
"score": 0.615,
"position_start": 1,
"position_end": 955553,
"name": "AGRN",
"ref": "A"
},
{
"subslug": "1:955553:02",
"score": 0.715,
"position_start": 1,
"position_end": 955553,
"name": "AGRN",
"ref": "A"
}
]
},
{
"_id" : "<?>",
"slug" : "2:15553",
"components" : [
{
"subslug": "2:15553:01",
"score": 0.915,
"position_start": 1002,
"position_end": 15553,
"name": "MMFR",
"ref": "C"
}
]
}
I tried a mapReduce() query, but unfortunately it builds a table that looks like this:
{
"_id" : "1:955553",
"value" : {
"components" : {
"$push" : {
"_id" : ObjectId("562d6d9c3a2e9c0adbb02f14"),
"slug" : "1:955553",
"subslug" : "1:955553:01",
"position_start" : 1,
"position_end" : 955553,
"gene" : "AGRN",
"ref" : "A"
}
}
}
}
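This is not what I need. I was trying to use $push to append to the components array, but evidently $push is not honored inside mapReduce().
Can anyone give me some advice on how to take the input collection above and create the desired output collection? Is my mapReduce() query correct?
It is best to use the aggregation framework for this kind of operation; it should be several times faster than a map-reduce operation.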
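Generally, you would build an aggregation pipeline consisting of 3 stages:
- $group stage - This pipeline step groups the documents using the slug field as the key, then applies the $push accumulator operator to create the components array, which is the result of applying an expression to each document in that group.
- $project stage - This reshapes each document in the stream, for example by adding new fields or removing existing fields.
- $out stage - This last step writes the resulting documents of the aggregation pipeline to a new collection.
Your aggregation pipeline will therefore look like the following; run against the sample data above, it gives you the desired result in a new collection named mytable: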
db.vest.aggregate([
{
"$group": {
"_id": "$slug",
"components": {
"$push": {
"subslug": "$subslug",
"score": "$score",
"position_start": "$position_start",
"position_end": "$position_end",
"name": "$name",
"ref": "$ref"
}
}
}
},
{
"$project": {
"_id": 0, "slug": "$_id", "components": 1
}
},
{ "$out": "mytable" }
])
With the sample data above, querying the new collection with db.mytable.find() gives you the desired output.
Sample output:
/* 0 */
{
"_id" : ObjectId("563bc608d1f71f49c3d6c80b"),
"components" : [
{
"subslug" : "2:15553:01",
"score" : "0.915",
"position_start" : "1002",
"position_end" : 15553,
"name" : "MMFR",
"ref" : "C"
}
],
"slug" : "2:15553"
}
/* 1 */
{
"_id" : ObjectId("563bc608d1f71f49c3d6c80c"),
"components" : [
{
"subslug" : "1:955553:02",
"score" : "0.615",
"position_start" : "1",
"position_end" : 955553,
"name" : "AGRN",
"ref" : "A"
},
{
"subslug" : "1:955553:01",
"score" : "0.715",
"position_start" : 1,
"position_end" : 955553,
"name" : "AGRN",
"ref" : "A"
}
],
"slug" : "1:955553"
}
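As a side note on the original question: inside mapReduce() the map and reduce functions are plain JavaScript, so an object key such as "$push" is stored literally rather than interpreted as an operator, which matches the output shown in the question. If you still wanted a map-reduce version, a minimal sketch could look like the following. It assumes the field names from the sample documents; the output collection name mytable_mr is hypothetical, and the results come back in mapReduce's own { _id, value } shape rather than the exact layout requested. mapReduce is also deprecated in recent MongoDB releases (since 5.0), so treat this as an illustration only.

```javascript
// map: runs once per input document, with `this` bound to that document.
// `emit` is provided by MongoDB when the function runs on the server.
function map() {
  emit(this.slug, {
    components: [{
      subslug: this.subslug,
      score: this.score,
      position_start: this.position_start,
      position_end: this.position_end,
      name: this.name,
      ref: this.ref
    }]
  });
}

// reduce: must return the same shape that map emits, because MongoDB may
// call it more than once per key and feed earlier results back in.
function reduce(key, values) {
  var components = [];
  values.forEach(function (value) {
    components = components.concat(value.components);
  });
  return { components: components };
}

// In the mongo shell (output collection name is hypothetical):
// db.vest.mapReduce(map, reduce, { out: "mytable_mr" });
```

Emitting a one-element array from map keeps the value shape identical for slugs that occur only once, since reduce is not called for a key with a single value.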
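You really don't need, and shouldn't use, mapReduce for this. You should use the .aggregate() method, which provides access to the aggregation pipeline. All you need is to $group your documents by "slug" and use the $push accumulator operator to return an array of all the other fields. The $project stage is used to exclude the "_id" field from the aggregation result.
That said, you can use the $out operator to send the resulting documents of the aggregation pipeline to another collection, as mentioned in @chridam's answer, but $out has the following restrictions:
- You cannot specify a sharded collection as the output collection. The input collection for a pipeline can be sharded.
- The $out operator cannot write results to a capped collection.
Because of these restrictions, you should use "Bulk" operations to write your results into the new collection: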
var bulk = db.newcollection.initializeUnorderedBulkOp();
db.collection.aggregate([
    { "$group": {
        "_id": "$slug",
        "components": {
            "$push": {
                "subslug": "$subslug",
                "score": "$score",
                "position_start": "$position_start",
                "position_end": "$position_end",
                "name": "$name",
                "ref": "$ref"
            }
        }
    }},
    { "$project": {
        "slug": "$_id",
        "components": 1,
        "_id": 0
    }}
]).forEach(function(doc) {
    bulk.insert(doc);
});
bulk.execute();
Then db.newcollection.find() yields results like this:
{
    "_id" : ObjectId("563bc8a6bf93306f8f6638ce"),
    "components" : [
        {
            "slug" : "1:955553",
            "subslug" : "1:955553:02",
            "score" : "0.615",
            "position_start" : "1",
            "position_end" : 955553,
            "name" : "AGRN",
            "ref" : "A"
        },
        {
            "slug" : "1:955553",
            "subslug" : "1:955553:01",
            "score" : "0.715",
            "position_start" : 1,
            "position_end" : 955553,
            "name" : "AGRN",
            "ref" : "A"
        }
    ],
    "slug" : "1:955553"
}
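If the aggregation returns a very large number of documents, one option is to flush the bulk operation in batches instead of queueing everything before a single execute(). The following is only a sketch: the helper name copyInBatches and the batchSize parameter are made up for illustration, and it assumes the shell's behavior that a bulk object cannot be reused after execute() and that execute() on an empty bulk raises an error.

```javascript
// Hypothetical helper: insert every document from `cursor` into `collection`,
// executing the bulk operation every `batchSize` documents.
function copyInBatches(cursor, collection, batchSize) {
  var bulk = collection.initializeUnorderedBulkOp();
  var pending = 0;   // documents queued in the current bulk
  var written = 0;   // documents already sent to the server

  cursor.forEach(function (doc) {
    bulk.insert(doc);
    pending += 1;
    if (pending === batchSize) {
      bulk.execute();
      written += pending;
      // Start a fresh bulk object for the next batch.
      bulk = collection.initializeUnorderedBulkOp();
      pending = 0;
    }
  });

  // Flush the final partial batch, skipping execute() when nothing is queued.
  if (pending > 0) {
    bulk.execute();
    written += pending;
  }
  return written;
}

// In the mongo shell, with `pipeline` holding the stages shown above:
// copyInBatches(db.collection.aggregate(pipeline), db.newcollection, 1000);
```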
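This looks really great. My collection is very large, so I am running into memory limits. I tried adding ...], {allowDiskUse: true}, but I still see the same "exceeded memory limit for $group" error. Any ideas?
Update: when I added a $match element to pre-filter the query, it worked great!
Another update (for future readers): the error I was seeing was actually caused by using a pre-2.6 client. I upgraded my mongo client and everything was fine.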
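To illustrate the two adjustments mentioned in the comment above, here is a sketch that pre-filters with $match before the $group stage and passes allowDiskUse as the second argument to aggregate(). The filter on name is purely hypothetical, since the commenter did not say which filter was used; the vest and mytable names are taken from the first answer.

```javascript
// Pipeline from the first answer, with a hypothetical $match stage in front
// so that $group only has to process the documents of interest.
var pipeline = [
  { "$match": { "name": "AGRN" } },   // hypothetical filter, adjust to your data
  { "$group": {
      "_id": "$slug",
      "components": {
        "$push": {
          "subslug": "$subslug",
          "score": "$score",
          "position_start": "$position_start",
          "position_end": "$position_end",
          "name": "$name",
          "ref": "$ref"
        }
      }
  }},
  { "$project": { "_id": 0, "slug": "$_id", "components": 1 } },
  { "$out": "mytable" }
];

// Options go in a second argument, after the pipeline array.
var options = { allowDiskUse: true };

// In the mongo shell (2.6 or later, per the comment above):
// db.vest.aggregate(pipeline, options);
```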