MongoDB 2dsphere索引$GEOFIN性能
我有一个GeoJSON Point格式的坐标数据集合,我需要从中查询一个区域内的10个最新条目。现在有1.000.000个条目,但数量将增加10倍左右 我的问题是,当所需区域内有大量条目时,我的查询性能会大幅下降(案例3)。我目前拥有的测试数据是随机的,但实际数据不会是随机的,因此完全根据区域的维度选择另一个索引(如案例4)是不可能的 我应该怎么做才能让它在任何区域都能按预期运行 1。收集统计信息:MongoDB 2dsphere索引$GEOFIN性能,mongodb,geospatial,Mongodb,Geospatial,我有一个GeoJSON Point格式的坐标数据集合,我需要从中查询一个区域内的10个最新条目。现在有1.000.000个条目,但数量将增加10倍左右 我的问题是,当所需区域内有大量条目时,我的查询性能会大幅下降(案例3)。我目前拥有的测试数据是随机的,但实际数据不会是随机的,因此完全根据区域的维度选择另一个索引(如案例4)是不可能的 我应该怎么做才能让它在任何区域都能按预期运行 1。收集统计信息: > db.randomcoordinates.stats() { "ns" : "
> db.randomcoordinates.stats()
{
"ns" : "test.randomcoordinates",
"count" : 1000000,
"size" : 224000000,
"avgObjSize" : 224,
"storageSize" : 315006976,
"numExtents" : 15,
"nindexes" : 3,
"lastExtentSize" : 84426752,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 120416128,
"indexSizes" : {
"_id_" : 32458720,
"position_2dsphere_timestamp_-1" : 55629504,
"timestamp_-1" : 32327904
},
"ok" : 1
}
> db.randomcoordinates.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.randomcoordinates",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"position" : "2dsphere",
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "position_2dsphere_timestamp_-1"
},
{
"v" : 1,
"key" : {
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "timestamp_-1"
}
]
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("position_2dsphere_timestamp_-1").explain()
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 116775,
"nscanned" : 283424,
"nscannedObjectsAllPlans" : 116775,
"nscannedAllPlans" : 283424,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 3876,
"indexBounds" : {
},
"nscanned" : 283424,
"matchTested" : NumberLong(166649),
"geoTested" : NumberLong(166649),
"cellsInCover" : NumberLong(14),
"server" : "chan:27017"
}
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1").explain()
{
"cursor" : "BtreeCursor timestamp_-1",
"isMultiKey" : false,
"n" : 10,
"nscannedObjects" : 63,
"nscanned" : 63,
"nscannedObjectsAllPlans" : 63,
"nscannedAllPlans" : 63,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"timestamp" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "chan:27017"
}
2。索引:
> db.randomcoordinates.stats()
{
"ns" : "test.randomcoordinates",
"count" : 1000000,
"size" : 224000000,
"avgObjSize" : 224,
"storageSize" : 315006976,
"numExtents" : 15,
"nindexes" : 3,
"lastExtentSize" : 84426752,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 120416128,
"indexSizes" : {
"_id_" : 32458720,
"position_2dsphere_timestamp_-1" : 55629504,
"timestamp_-1" : 32327904
},
"ok" : 1
}
> db.randomcoordinates.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.randomcoordinates",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"position" : "2dsphere",
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "position_2dsphere_timestamp_-1"
},
{
"v" : 1,
"key" : {
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "timestamp_-1"
}
]
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("position_2dsphere_timestamp_-1").explain()
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 116775,
"nscanned" : 283424,
"nscannedObjectsAllPlans" : 116775,
"nscannedAllPlans" : 283424,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 3876,
"indexBounds" : {
},
"nscanned" : 283424,
"matchTested" : NumberLong(166649),
"geoTested" : NumberLong(166649),
"cellsInCover" : NumberLong(14),
"server" : "chan:27017"
}
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1").explain()
{
"cursor" : "BtreeCursor timestamp_-1",
"isMultiKey" : false,
"n" : 10,
"nscannedObjects" : 63,
"nscanned" : 63,
"nscannedObjectsAllPlans" : 63,
"nscannedAllPlans" : 63,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"timestamp" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "chan:27017"
}
3。使用2dsphere复合索引查找:
> db.randomcoordinates.stats()
{
"ns" : "test.randomcoordinates",
"count" : 1000000,
"size" : 224000000,
"avgObjSize" : 224,
"storageSize" : 315006976,
"numExtents" : 15,
"nindexes" : 3,
"lastExtentSize" : 84426752,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 120416128,
"indexSizes" : {
"_id_" : 32458720,
"position_2dsphere_timestamp_-1" : 55629504,
"timestamp_-1" : 32327904
},
"ok" : 1
}
> db.randomcoordinates.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.randomcoordinates",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"position" : "2dsphere",
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "position_2dsphere_timestamp_-1"
},
{
"v" : 1,
"key" : {
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "timestamp_-1"
}
]
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("position_2dsphere_timestamp_-1").explain()
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 116775,
"nscanned" : 283424,
"nscannedObjectsAllPlans" : 116775,
"nscannedAllPlans" : 283424,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 3876,
"indexBounds" : {
},
"nscanned" : 283424,
"matchTested" : NumberLong(166649),
"geoTested" : NumberLong(166649),
"cellsInCover" : NumberLong(14),
"server" : "chan:27017"
}
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1").explain()
{
"cursor" : "BtreeCursor timestamp_-1",
"isMultiKey" : false,
"n" : 10,
"nscannedObjects" : 63,
"nscanned" : 63,
"nscannedObjectsAllPlans" : 63,
"nscannedAllPlans" : 63,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"timestamp" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "chan:27017"
}
4。使用时间戳索引查找:
> db.randomcoordinates.stats()
{
"ns" : "test.randomcoordinates",
"count" : 1000000,
"size" : 224000000,
"avgObjSize" : 224,
"storageSize" : 315006976,
"numExtents" : 15,
"nindexes" : 3,
"lastExtentSize" : 84426752,
"paddingFactor" : 1,
"systemFlags" : 0,
"userFlags" : 0,
"totalIndexSize" : 120416128,
"indexSizes" : {
"_id_" : 32458720,
"position_2dsphere_timestamp_-1" : 55629504,
"timestamp_-1" : 32327904
},
"ok" : 1
}
> db.randomcoordinates.getIndexes()
[
{
"v" : 1,
"key" : {
"_id" : 1
},
"ns" : "test.randomcoordinates",
"name" : "_id_"
},
{
"v" : 1,
"key" : {
"position" : "2dsphere",
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "position_2dsphere_timestamp_-1"
},
{
"v" : 1,
"key" : {
"timestamp" : -1
},
"ns" : "test.randomcoordinates",
"name" : "timestamp_-1"
}
]
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("position_2dsphere_timestamp_-1").explain()
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 116775,
"nscanned" : 283424,
"nscannedObjectsAllPlans" : 116775,
"nscannedAllPlans" : 283424,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 3876,
"indexBounds" : {
},
"nscanned" : 283424,
"matchTested" : NumberLong(166649),
"geoTested" : NumberLong(166649),
"cellsInCover" : NumberLong(14),
"server" : "chan:27017"
}
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1").explain()
{
"cursor" : "BtreeCursor timestamp_-1",
"isMultiKey" : false,
"n" : 10,
"nscannedObjects" : 63,
"nscanned" : 63,
"nscannedObjectsAllPlans" : 63,
"nscannedAllPlans" : 63,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 0,
"nChunkSkips" : 0,
"millis" : 0,
"indexBounds" : {
"timestamp" : [
[
{
"$maxElement" : 1
},
{
"$minElement" : 1
}
]
]
},
"server" : "chan:27017"
}
有些人建议使用{timestamp:-1,position:“2dsphere”}
索引,所以我也尝试过,但它的性能似乎不够好
5。使用时间戳+2dsphere复合索引查找
> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1_position_2dsphere").explain()
{
"cursor" : "S2Cursor",
"isMultiKey" : true,
"n" : 10,
"nscannedObjects" : 116953,
"nscanned" : 286513,
"nscannedObjectsAllPlans" : 116953,
"nscannedAllPlans" : 286513,
"scanAndOrder" : true,
"indexOnly" : false,
"nYields" : 4,
"nChunkSkips" : 0,
"millis" : 4597,
"indexBounds" : {
},
"nscanned" : 286513,
"matchTested" : NumberLong(169560),
"geoTested" : NumberLong(169560),
"cellsInCover" : NumberLong(14),
"server" : "chan:27017"
}
您是否尝试过在数据集上使用聚合框架 您想要的查询如下所示:
db.randomcoordinates.aggregate(
{ $match: {position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}},
{ $sort: { timestamp: -1 } },
{ $limit: 10 }
);
不幸的是,聚合框架在产品构建中还没有explain
,因此您只能知道它是否会产生巨大的时间差异。如果您从源代码处进行了良好的构建,它看起来可能在上个月底就已经存在了:。它看起来也将出现在定于下周二(10/15/2013)发布的devbuild2.5.3中
我应该怎么做才能让它在任何情况下都能按预期执行
区域
$geointen
根本无法高效运行。据我所知,它将在Θ(n)效率平均情况下运行(考虑到alg最多需要检查n个点,至少10个点)
但是,我肯定会对坐标集合进行一些预处理,以确保首先处理最近添加的坐标,从而使您有更好的机会获得Θ(10)效率(除了使用位置\u 2dsphere\u时间戳\u1
之外,听起来也是这样的做法)
有些人建议使用{timestamp:-1,位置:“2dsphere”}
索引,所以我也尝试了一下,但它似乎不起作用
很好
(请参见对初始问题的回答。)
此外,以下内容可能有用
希望这有帮助
TL;DR您可以随心所欲地玩弄索引,但除非您重写它,否则您将无法从$geoinsin
中获得更高的效率
也就是说,如果您愿意,您可以始终专注于优化索引性能并重写函数 我在寻找类似问题的解决方案时看到了这个问题。这是一个很老的问题,没有得到回答,如果其他人在寻找此类情况的解决方案,我将尝试解释为什么提到的方法不适合手头的任务,以及如何微调这些查询 在第一种情况下,扫描如此多的项目是完全正常的。让我试着解释一下原因: 当Mongodb构建复合索引
“position\u 2dsphere\u timestamp\u1”
时,它实际上创建了一个B-树来保存position键中包含的所有几何图形,在本例中是点,并且对于该B-树中的每个不同值,会创建另一个B-树以降序保存时间戳。这意味着,除非您的条目彼此非常(我的意思是非常)接近,否则二级B树将只包含一个条目,并且查询性能几乎与仅在位置字段上有一个索引相同。除此之外,mongodb将能够在辅助b树上使用时间戳值,而不是将实际文档放入内存并检查时间戳
当我们构建复合索引“timestamp\u1\u position\u 2dsphere”时,这同样适用于其他场景。不太可能同时以毫秒精度输入两个条目。所以在这种情况下,;是的,我们有按时间戳字段排序的数据,但是我们有很多其他的B树,每个时间戳的不同值只有一个条目。因此,在过滤器中应用GeoInside将无法很好地执行,因为它必须检查每个条目,直到满足限制
那么,如何才能使这类查询运行良好呢?就我个人而言,我从尽可能多的字段放在地理空间字段前面开始。但主要的技巧是保留另一个字段,比如“createdDay”,它将以日精度保存一个数字。如果您需要更高的精度,您也可以使用小时级精度,以性能为代价,这完全取决于项目的需要。您的索引如下所示:{createdDay:-1,位置:“2dsphere”}
。现在,在同一天创建的每个文档都将在相同的2dsphere b树索引上存储和排序。因此,mongodb将从当天开始,因为它应该是索引中的最大值,并对创建日期为今天的文档的b树位置进行索引扫描。如果它发现至少10个文档,它将停止并返回这些文档,如果没有,它将移动到前一天,依此类推。在您的案例中,此方法将大大提高性能
我希望这对你的情况有所帮助。你能澄清一下你所说的“因此,单纯根据区域尺寸选择另一个索引(如案例4)是不可能的”是什么意思吗?在我看来,无论区域大小如何,因为您只查找最近的十个点,所以使用时间戳索引时,您总是会做得更好,其中scanander为false,nscanned最接近n。考虑到这一点,我建议创建一个时间戳第一、位置第二的复合索引,然而,当前的mongo版本(2.4.6)是