MongoDB 2dsphere index $geoWithin performance

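I have a collection of coordinate data in GeoJSON Point form, from which I need to query the 10 latest entries within an area. There are 1,000,000 entries now, but there will be about 10 times more.

My problem is that when there are lots of entries within the desired area, the performance of my queries drops tremendously (case 3). The test data I currently have is random, but the real data won't be, so picking another index (as in case 4) purely based on the dimensions of the area won't be possible.

What should I do to get it to perform predictably regardless of the area?

1. Collection statistics: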

> db.randomcoordinates.stats()
{
    "ns" : "test.randomcoordinates",
    "count" : 1000000,
    "size" : 224000000,
    "avgObjSize" : 224,
    "storageSize" : 315006976,
    "numExtents" : 15,
    "nindexes" : 3,
    "lastExtentSize" : 84426752,
    "paddingFactor" : 1,
    "systemFlags" : 0,
    "userFlags" : 0,
    "totalIndexSize" : 120416128,
    "indexSizes" : {
        "_id_" : 32458720,
        "position_2dsphere_timestamp_-1" : 55629504,
        "timestamp_-1" : 32327904
    },
    "ok" : 1
}
2. Indexes:

> db.randomcoordinates.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "test.randomcoordinates",
        "name" : "_id_"
    },
    {
        "v" : 1,
        "key" : {
            "position" : "2dsphere",
            "timestamp" : -1
        },
        "ns" : "test.randomcoordinates",
        "name" : "position_2dsphere_timestamp_-1"
    },
    {
        "v" : 1,
        "key" : {
            "timestamp" : -1
        },
        "ns" : "test.randomcoordinates",
        "name" : "timestamp_-1"
    }
]
3. Find using the 2dsphere compound index:

> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("position_2dsphere_timestamp_-1").explain()
{
    "cursor" : "S2Cursor",
    "isMultiKey" : true,
    "n" : 10,
    "nscannedObjects" : 116775,
    "nscanned" : 283424,
    "nscannedObjectsAllPlans" : 116775,
    "nscannedAllPlans" : 283424,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 4,
    "nChunkSkips" : 0,
    "millis" : 3876,
    "indexBounds" : {

    },
    "nscanned" : 283424,
    "matchTested" : NumberLong(166649),
    "geoTested" : NumberLong(166649),
    "cellsInCover" : NumberLong(14),
    "server" : "chan:27017"
}
4. Find using the timestamp index:

> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1").explain()
{
    "cursor" : "BtreeCursor timestamp_-1",
    "isMultiKey" : false,
    "n" : 10,
    "nscannedObjects" : 63,
    "nscanned" : 63,
    "nscannedObjectsAllPlans" : 63,
    "nscannedAllPlans" : 63,
    "scanAndOrder" : false,
    "indexOnly" : false,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "timestamp" : [
            [
                {
                    "$maxElement" : 1
                },
                {
                    "$minElement" : 1
                }
            ]
        ]
    },
    "server" : "chan:27017"
}
Some people suggested using a {timestamp: -1, position: "2dsphere"} index, so I tried that too, but it doesn't seem to perform well enough.

5. Find using the timestamp + 2dsphere compound index:

> db.randomcoordinates.find({position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}).sort({timestamp: -1}).limit(10).hint("timestamp_-1_position_2dsphere").explain()
{
    "cursor" : "S2Cursor",
    "isMultiKey" : true,
    "n" : 10,
    "nscannedObjects" : 116953,
    "nscanned" : 286513,
    "nscannedObjectsAllPlans" : 116953,
    "nscannedAllPlans" : 286513,
    "scanAndOrder" : true,
    "indexOnly" : false,
    "nYields" : 4,
    "nChunkSkips" : 0,
    "millis" : 4597,
    "indexBounds" : {

    },
    "nscanned" : 286513,
    "matchTested" : NumberLong(169560),
    "geoTested" : NumberLong(169560),
    "cellsInCover" : NumberLong(14),
    "server" : "chan:27017"
}

Have you tried using the aggregation framework on your dataset?

The query you want would look something like this:

db.randomcoordinates.aggregate(
    { $match: {position: {$geoWithin: {$geometry: {type: "Polygon", coordinates: [[[1, 1], [1, 90], [180, 90], [180, 1], [1, 1]]]}}}}},
    { $sort: { timestamp: -1 } },
    { $limit: 10 }
);
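Unfortunately, the aggregation framework does not have explain in production builds yet, so you will only be able to tell whether it makes a large difference in elapsed time. If you are comfortable building from source, it looks like it may have been added at the end of last month. It also looks like it will be in development build 2.5.3, scheduled for release next Tuesday (10/15/2013).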
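Without explain, wall-clock timing is the remaining way to compare the two approaches. The following is a minimal sketch, assuming a hypothetical helper named timeIt (it is not part of the shell or of any MongoDB API). A stand-in workload is used so the snippet runs anywhere; in the shell the callback would run the aggregation instead.

```javascript
// Hypothetical helper (not a MongoDB API): run fn once and report elapsed time.
function timeIt(fn) {
    var start = Date.now();
    var value = fn();
    return { value: value, millis: Date.now() - start };
}

// In the mongo shell the callback would run the pipeline, for example:
//   timeIt(function () { return db.randomcoordinates.aggregate(/* stages */); });
// A stand-in workload keeps this sketch runnable without a server.
var timed = timeIt(function () {
    var sum = 0;
    for (var i = 0; i < 1000; i++) { sum += i; }
    return sum;
});
```

A single run is noisy, so repeating the measurement a few times and comparing against the find() variant is probably more informative than one number.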

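"What should I do to get it to perform predictably regardless of the area?"

$geoWithin simply does not operate efficiently. As I understand it, it will run with Θ(n) average-case efficiency (considering that the algorithm needs to check at most n points and at least 10).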
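The bounds mentioned above (at least 10 entries examined, at most n) can be illustrated with a linear scan over entries ordered newest first. This is a toy model under stated assumptions, not how the server is implemented: latestInArea and the inArea predicate are names invented for this sketch.

```javascript
// Walk entries in descending timestamp order and stop once `limit` matches
// are found. `inArea` is a caller-supplied predicate (an assumption of this
// sketch; the server performs the geometry test itself).
function latestInArea(entriesNewestFirst, inArea, limit) {
    var results = [];
    var scanned = 0;
    for (var i = 0; i < entriesNewestFirst.length && results.length < limit; i++) {
        scanned++;
        if (inArea(entriesNewestFirst[i])) {
            results.push(entriesNewestFirst[i]);
        }
    }
    return { results: results, scanned: scanned };
}

// Toy data: 100 entries, newest first; every 5th timestamp lies in the area.
var entries = [];
for (var t = 100; t >= 1; t--) {
    entries.push({ timestamp: t, inside: t % 5 === 0 });
}

// Dense area: the scan stops early, after the 10th match.
var dense = latestInArea(entries, function (e) { return e.inside; }, 10);
// Empty area: every entry is examined and nothing is returned.
var empty = latestInArea(entries, function () { return false; }, 10);
```

With an area that matches one entry in five, the scan stops after 46 entries; with an area that matches nothing, all 100 are examined. That is the same shape as the difference between case 4 and a worst-case region.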

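However, I would definitely do some preprocessing on the coordinate collection to make sure that the most recently added coordinates are processed first, which gives you a better chance of getting Θ(10) efficiency (apart from using position_2dsphere_timestamp_-1, that sounds like the way to go).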

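"Some people suggested using a {timestamp: -1, position: "2dsphere"} index, so I tried that too, but it doesn't seem to perform well enough."

(See the reply to the initial question.)

Also, the following may be useful.

Hope this helps.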

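TL;DR: You can play around with indexes all you want, but you won't get any greater efficiency out of $geoWithin unless you rewrite it.

That said, you can always focus on optimizing index performance and rewrite the function if you like.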

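I came across this question while looking for a solution to a similar problem. It is a very old question that has gone unanswered, so in case others are looking for solutions to this kind of situation, I will try to explain why the approaches mentioned are not ideal for the task at hand and how these queries can be fine-tuned.

In the first case, scanning that many items is completely normal. Let me try to explain why: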

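When MongoDB builds the compound index "position_2dsphere_timestamp_-1", it in effect creates one B-tree to hold all the geometries contained in the position key, which in this case are points, and for each distinct value in that B-tree another B-tree is created to hold the timestamps in descending order. This means that unless your entries are very (and I mean very) close to each other, the secondary B-trees will hold just one entry each, and query performance will be almost the same as having an index on the position field alone. The one difference is that MongoDB can use the timestamp values in the secondary B-trees instead of bringing the actual documents into memory to check the timestamp.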
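The ordering described above can be sketched with a toy model. The cell field below is a stand-in for the geo key and is an assumption made purely for illustration; real 2dsphere keys are not simple strings. The point is only that entries sorted by position first do not come out in global timestamp order, which is why the explain output in case 3 reports scanAndOrder: true.

```javascript
// Toy model of key ordering for { position: "2dsphere", timestamp: -1 }.
// `cell` stands in for the geo key (an assumption for illustration only).
function compareCompound(a, b) {
    if (a.cell !== b.cell) { return a.cell < b.cell ? -1 : 1; }
    return b.timestamp - a.timestamp; // descending timestamp within one cell
}

var keys = [
    { cell: "c2", timestamp: 5 },
    { cell: "c1", timestamp: 3 },
    { cell: "c3", timestamp: 9 },
    { cell: "c1", timestamp: 7 },
    { cell: "c2", timestamp: 1 }
];
keys.sort(compareCompound);

// Timestamps in the order an index scan over these keys would yield them.
var timestampsInIndexOrder = keys.map(function (k) { return k.timestamp; });

function isDescending(xs) {
    for (var i = 1; i < xs.length; i++) {
        if (xs[i] > xs[i - 1]) { return false; }
    }
    return true;
}

// True when the scan order is not already newest first, so a separate
// in-memory sort is required before the limit can be applied.
var needsExtraSort = !isDescending(timestampsInIndexOrder);
```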

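The same applies to the other scenario, when we build the compound index "timestamp_-1_position_2dsphere". It is quite unlikely that two entries arrive at the same time at millisecond precision. So in this case, yes, we have our data sorted by the timestamp field, but we then have a lot of other B-trees holding just one entry for each distinct timestamp value. Applying a $geoWithin filter will therefore not perform well, because it has to check every entry until the limit is met.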

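So how can you make this kind of query perform well? Personally, I start by putting as many fields as I can in front of the geospatial field. But the main trick is to keep another field, say "createdDay", which holds a number at day precision. If you need more precision you can use hour-level precision instead, at the cost of performance; it all depends on your project's needs. Your index would look like this: {createdDay: -1, position: "2dsphere"}. Now every document created on the same day is stored and sorted in the same 2dsphere B-tree. MongoDB will start from the current day, since that should be the greatest value in the index, and do an index scan on the B-tree holding the positions of the documents whose createdDay is today. If it finds at least 10 documents, it stops and returns them; if not, it moves on to the previous day, and so on. This approach should greatly improve performance in your case.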
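The day-bucket idea can be sketched as follows. This is a minimal sketch under assumptions: toCreatedDay and latestByDay are hypothetical helper names, createdDay is stored as whole days since the Unix epoch in UTC, and the walk over days is done explicitly in application code, whereas the answer describes the server doing it through the index. The findInDay callback is a placeholder for a per-day query.

```javascript
var MS_PER_DAY = 24 * 60 * 60 * 1000;

// Day-precision bucket: whole days since the Unix epoch (UTC).
function toCreatedDay(date) {
    return Math.floor(date.getTime() / MS_PER_DAY);
}

// Walk buckets from `startDay` backwards until `limit` documents are found.
// `findInDay(day)` is a hypothetical callback that returns the matches for
// that day inside the area, newest first.
function latestByDay(findInDay, startDay, oldestDay, limit) {
    var results = [];
    for (var day = startDay; day >= oldestDay && results.length < limit; day--) {
        var docs = findInDay(day);
        for (var i = 0; i < docs.length && results.length < limit; i++) {
            results.push(docs[i]);
        }
    }
    return results;
}

// In the shell, findInDay might look roughly like this (untested sketch;
// `area` is a placeholder for the GeoJSON polygon):
//   function (day) {
//       return db.randomcoordinates
//           .find({ createdDay: day, position: { $geoWithin: { $geometry: area } } })
//           .sort({ timestamp: -1 }).limit(10).toArray();
//   }

// Stand-in data so the sketch runs without a server.
var fake = { 15993: ["a", "b"], 15992: [], 15991: ["c", "d", "e"] };
var today = toCreatedDay(new Date("2013-10-15T12:00:00Z"));
var latest = latestByDay(function (d) { return fake[d] || []; }, today, today - 5, 4);
```

The oldestDay bound is there so the walk terminates for areas that contain fewer than limit documents in total.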


I hope this helps in your case.

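Can you clarify what you mean by "so picking another index (as in case 4) purely based on the dimensions of the area won't be possible"? It seems to me that regardless of the size of the area, since you are only looking for the ten most recent points, you will always do better with the timestamp index, where scanAndOrder is false and nscanned is closest to n. With that in mind, I would suggest creating a compound index with timestamp first and position second; however, the current mongo version (2.4.6) is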