Performance: How can I improve the performance of MongoDB map/reduce?
My MongoDB cluster uses 3 shards holding 70 million records. Hardware: 16 GB RAM. I want to run some computation with the following map/reduce job:
db.runCommand({
    mapReduce: 'orders',
    map: function(){
        var key = { 'name': this.Receiver_name, 'mobile': this.Receiver_mobile };
        var values = { 'count': 1, 'dates': [ this.Receiver_date ] };
        emit(key, values);
    },
    reduce: function(key, values){
        var result = { count: 0, dates: [] };
        var dates = [];
        values.forEach(function(value){
            result.count += value.count;
            dates = dates.concat(value.dates);
        });
        // keep the same shape as the mapped values so re-reduce stays correct
        result.dates = [ new Date(Math.min.apply(Math, dates)) ];
        return result;
    },
    sort: { Receiver_name: 1, Receiver_mobile: 1 }, // matches an existing index
    out: { replace: 'localtest' }
})
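One subtlety with the reduce function above: MongoDB may call reduce repeatedly on partial results (a "re-reduce"), so the value it returns must have the same shape as the values emitted by map. A minimal sketch in plain JavaScript (run under Node, outside MongoDB; the sample dates are made up for illustration) simulating such a re-reduce:

```javascript
// Re-reduce-safe reduce: the returned value has the same shape
// ({ count, dates: [...] }) as the values produced by map.
function reduce(key, values) {
    var result = { count: 0, dates: [] };
    var dates = [];
    values.forEach(function (value) {
        result.count += value.count;
        dates = dates.concat(value.dates);
    });
    // Math.min coerces Date objects to timestamps, so this picks the earliest date
    result.dates = [new Date(Math.min.apply(Math, dates))];
    return result;
}

// Simulate MongoDB reducing two partial batches, then re-reducing the results.
var batch1 = reduce('k', [
    { count: 1, dates: [new Date('2015-01-03')] },
    { count: 1, dates: [new Date('2015-01-01')] }
]);
var batch2 = reduce('k', [{ count: 1, dates: [new Date('2015-01-02')] }]);
var merged = reduce('k', [batch1, batch2]);

console.log(merged.count);                  // 3
console.log(merged.dates[0].toISOString()); // 2015-01-01T00:00:00.000Z
```

If reduce instead returned `dates` as a bare Date (as in the question's original version), the re-reduce pass would receive a mix of arrays and Dates, which is fragile even when it happens to coerce correctly.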
I have seen advice about splitting the work across multiple threads and multiple databases, but I cannot use the ScopedThread() function because of a known multi-threading bug in the mongo shell on 2.6.11. So the job just keeps running.

Here is the current operation log. S1 has already finished; note that S3 is only 2% through the emit (map) phase:
{
"inprog" : [
{
"opid" : "s3:99457482",
"active" : true,
"secs_running" : 1644,
"microsecs_running" : NumberLong(1644096961),
"op" : "query",
"ns" : "express.orders",
"query" : {
"$msg" : "query not recording (too large)"
},
"client_s" : "222.31.79.193:36487",
"desc" : "conn41",
"threadId" : "0x7fc4ff361700",
"connectionId" : 41,
"waitingForLock" : false,
"msg" : "m/r: (1/3) emit phase M/R: (1/3) Emit Progress: 740369/29745378 2%",
"progress" : {
"done" : 740369,
"total" : 29745378
},
"numYields" : 213942,
"lockStats" : {
"timeLockedMicros" : {
"r" : NumberLong(299591221),
"w" : NumberLong(1368691)
},
"timeAcquiringMicros" : {
"r" : NumberLong(708109),
"w" : NumberLong(91251)
}
}
},
{
"opid" : "s2:158299848",
"active" : true,
"secs_running" : 1644,
"microsecs_running" : NumberLong(1644123918),
"op" : "query",
"ns" : "express.orders",
"query" : {
"$msg" : "query not recording (too large)"
},
"client_s" : "222.31.79.193:48366",
"desc" : "conn2332",
"threadId" : "0x7f850d9a6700",
"connectionId" : 2332,
"locks" : {
"^" : "r",
"^express" : "R"
},
"waitingForLock" : false,
"msg" : "m/r: (1/3) emit phase M/R: (1/3) Emit Progress: 28830696/30690385 93%",
"progress" : {
"done" : 28830696,
"total" : 30690385
},
"numYields" : 288816,
"lockStats" : {
"timeLockedMicros" : {
"r" : NumberLong(2522717442),
"w" : NumberLong(56346161)
},
"timeAcquiringMicros" : {
"r" : NumberLong(750081),
"w" : NumberLong(8547462)
}
}
}
]
}
Do your shards have the same amount of RAM? S2's yield count seems to be on par with S3's, yet S2 is already at 93%. S1 has 54 GB (and it finished), while S2 and S3 have 16 GB. You should upgrade the RAM on S2 and S3, or compensate with faster SSDs; SSDs will make your queries many times faster.

Also, why use mapReduce at all? Have you tried .aggregate()? A very simple

{ "$group": { "_id": { "name": "$Receiver_name", "mobile": "$Receiver_mobile" }, "count": { "$sum": 1 }, "dates": { "$min": "$Receiver_date" } } }

would do it. You can also use $out if you need the result written to a collection.

Yes, I tried the aggregation framework (AF) first, but I switched to mapReduce (MR) when I hit the 16 MB result-document size error. Now I know that AF also has the $out operator. I recently compared the two approaches, and AF was roughly 10–20% faster. Is AF's performance actually worse than MR's, or not?
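For reference, the $group suggestion above can be written as a full pipeline. This is a sketch: the field names (Receiver_name, Receiver_mobile, Receiver_date) come from the question, but the 'orders_summary' output collection name is an assumption, and allowDiskUse is only needed when the $group stage exceeds the 100 MB in-memory limit. The snippet only builds the pipeline document (plain JavaScript, no database connection); you would pass it to db.orders.aggregate(...) in the mongo shell:

```javascript
// Aggregation pipeline roughly equivalent to the map/reduce job in the question.
// The $out stage writes results to a collection instead of returning them inline,
// which sidesteps the 16 MB single-document result limit mentioned above.
var pipeline = [
    {
        $group: {
            _id: { name: '$Receiver_name', mobile: '$Receiver_mobile' },
            count: { $sum: 1 },
            dates: { $min: '$Receiver_date' }
        }
    },
    { $out: 'orders_summary' } // hypothetical output collection name
];

// In the mongo shell you would run:
//   db.orders.aggregate(pipeline, { allowDiskUse: true });
console.log(JSON.stringify(pipeline, null, 2));
```

Unlike mapReduce, the pipeline stays inside the server's native aggregation engine rather than a JavaScript interpreter, which is usually where its speed advantage comes from.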