
Fast way to compute the points inside a list of polygons in R

Tags: r, sp, point-in-polygon

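I have two datasets: one with more than 13 million rectangular polygons (each a set of 4 lat/lng points), and another with 10,000 points carrying the price at each location.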

> polygons
     id                                 pol_lat                                 pol_lng
 1: 148 -4.250236,-4.250236,-4.254640,-4.254640 -49.94628,-49.94494,-49.94494,-49.94628
 2: 149 -4.254640,-4.254640,-5.361601,-5.361601 -49.94494,-49.07906,-49.07906,-49.94494
 3: 150 -5.361601,-5.361601,-5.212208,-5.212208 -49.07906,-49.04469,-49.04469,-49.07906
 4: 151 -5.212208,-5.212208,-5.002878,-5.002878 -49.04469,-48.48664,-48.48664,-49.04469
 5: 152 -5.002878,-5.002878,-5.080018,-5.080018 -48.48664,-48.43699,-48.43699,-48.48664
 6: 153 -5.080018,-5.080018,-5.079819,-5.079819 -48.43699,-48.42480,-48.42480,-48.43699
 7: 154 -5.079819,-5.079819,-5.155606,-5.155606 -48.42480,-47.53891,-47.53891,-48.42480
 8: 155 -5.155606,-5.155606,-4.954156,-4.954156 -47.53891,-47.50354,-47.50354,-47.53891
 9: 156 -4.954156,-4.954156,-3.675864,-3.675864 -47.50354,-45.39022,-45.39022,-47.50354
10: 157 -3.675864,-3.675864,-3.706356,-3.706356 -45.39022,-45.30724,-45.30724,-45.39022
11: 158 -3.706356,-3.706356,-3.705801,-3.705801 -45.30724,-45.30722,-45.30722,-45.30724
> points
    longitude  latitude  price
 1: -47.50308 -4.953936 3.0616
 2: -47.50308 -4.953936 3.2070
 3: -47.50308 -4.953936 3.0630
 4: -47.50308 -4.953936 3.0603
 5: -47.50308 -4.953936 3.0460
 6: -47.50308 -4.953936 2.9900
 7: -49.07035 -5.283658 3.3130
 8: -49.08054 -5.347284 3.3900
 9: -49.08054 -5.347284 3.3620
10: -49.21726 -5.338270 3.3900
11: -49.08050 -5.347255 3.4000
12: -49.08042 -5.347248 3.3220
13: -49.08190 -5.359508 3.3130
14: -49.08046 -5.347277 3.3560
For each polygon, I want to compute the average price of all the points that fall inside it.

Right now I am using sp::point.in.polygon to get the indices of all the points that fall inside a given polygon, and then taking the mean of their prices:

w <- lapply(1:nrow(polygons),
            function(tt) {
              ind <- point.in.polygon(points$latitude, points$longitude,
                                      polygons$pol_lat[[tt]], polygons$pol_lng[[tt]]) > 0
              med <- mean(points$price[ind])
              return(med)
            }
)
> unlist(w)
 [1]      NaN 3.361857 3.313000      NaN      NaN      NaN      NaN      NaN 3.071317      NaN      NaN
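If your "polygons" are always rectangles, as in the example, you can use a quadtree spatial index (as implemented in the package SearchTrees) to speed up identifying which points fall inside each polygon.

The spatial index reduces the number of "comparisons" needed, so it can give a considerable speedup, and the gain grows with the number of points in the dataset.

For example: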

library(SearchTrees)
library(magrittr)

# Create a "beefier" test dataset based on your data: 14000 pts 
# over 45000 polygons

for (i in 1:10) points   <- rbind(points, points + runif(length(points)))
for (i in 1:12) polygons <- rbind(polygons, polygons)


# Compute limits of the polygons
min_xs <- lapply(polygons$pol_lng , min) %>% unlist()
max_xs <- lapply(polygons$pol_lng , max) %>% unlist()
min_ys <- lapply(polygons$pol_lat , min) %>% unlist()
max_ys <- lapply(polygons$pol_lat, max) %>% unlist()
xlims <- cbind(min_xs, max_xs)
ylims <- cbind(min_ys, max_ys)

# Create the quadtree
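# Build the quadtree on the point coordinates (x = longitude, y = latitude).
# as.data.frame() makes the column selection behave the same for a
# data.frame and a data.table.
tree <- SearchTrees::createTree(as.data.frame(points)[, c("longitude", "latitude")])

# Extract averages, looping over polygons ----
t1 <- Sys.time()
w <- lapply(1:nrow(polygons),
            function(tt) {
              # Indices of the points falling inside rectangle tt
              ind <- SearchTrees::rectLookup(
                tree,
                xlims = xlims[tt, ],
                ylims = ylims[tt, ])
              mean(points$price[ind])
            })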
Sys.time() - t1
The overall speedup will depend on how the points are "clustered" over the spatial extent and relative to the polygons.

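If your polygons are not rectangular, you can still take advantage of this approach by first extracting the points contained in each polygon's bounding box, and then using a more standard method to find the points actually "inside" the polygon.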
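The two-step idea for non-rectangular polygons could look roughly like the sketch below. The toy points, the triangle, and the names toy_pts, poly_x, poly_y, and cand are made up for illustration; only createTree, rectLookup, and point.in.polygon come from the packages already used in this thread.

```r
library(SearchTrees)
library(sp)

# Hypothetical toy data: six points with prices
toy_pts <- data.frame(
  x     = c(0.5, 1.5, 3.8, 2.0, 0.2, 5.0),
  y     = c(0.5, 0.5, 0.8, 3.0, 3.5, 5.0),
  price = c(1, 2, 3, 4, 5, 6)
)

# A non-rectangular polygon: triangle with vertices (0,0), (4,0), (0,4)
poly_x <- c(0, 4, 0)
poly_y <- c(0, 0, 4)

# Quadtree on the point coordinates
toy_tree <- createTree(as.matrix(toy_pts[, c("x", "y")]))

# Step 1: cheap prefilter, keep only points inside the polygon's bounding box
cand <- rectLookup(toy_tree, xlims = range(poly_x), ylims = range(poly_y))

# Step 2: exact point-in-polygon test, run on the candidates only
inside <- point.in.polygon(toy_pts$x[cand], toy_pts$y[cand], poly_x, poly_y) > 0
ind    <- sort(cand[inside])

mean(toy_pts$price[ind])
```

The exact test only runs on the bounding-box candidates, so its cost scales with the number of nearby points rather than with the whole dataset.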

Also consider that the task is embarrassingly parallel, so you could improve performance further by using a foreach or parLapply approach.
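A minimal parLapply sketch of the parallel variant, under some assumptions: the toy rectangles and points, the helper mean_price_in_rect, and the choice of 2 workers are all hypothetical. I have not checked whether a SearchTrees tree can be shipped to worker processes, so this sketch uses plain vectorized comparisons inside each worker instead of the quadtree.

```r
library(parallel)

# Hypothetical toy rectangles: one row per polygon, columns = (min, max)
par_xlims <- cbind(c(0, 1, 0.5), c(1, 2, 1.5))
par_ylims <- cbind(c(0, 0, 0),   c(1, 1, 1))

# Hypothetical toy points
par_pts <- data.frame(x     = c(0.25, 0.75, 1.75),
                      y     = c(0.5, 0.5, 0.5),
                      price = c(1, 3, 10))

# Mean price of the points inside rectangle tt (boundaries included)
mean_price_in_rect <- function(tt, xlims, ylims, pts) {
  ind <- pts$x >= xlims[tt, 1] & pts$x <= xlims[tt, 2] &
         pts$y >= ylims[tt, 1] & pts$y <= ylims[tt, 2]
  mean(pts$price[ind])
}

# Spread the polygon indices over 2 worker processes
cl <- makeCluster(2)
w_par <- unlist(parLapply(cl, seq_len(nrow(par_xlims)), mean_price_in_rect,
                          xlims = par_xlims, ylims = par_ylims, pts = par_pts))
stopCluster(cl)

# Sequential version, for checking that the results agree
w_seq <- unlist(lapply(seq_len(nrow(par_xlims)), mean_price_in_rect,
                       xlims = par_xlims, ylims = par_ylims, pts = par_pts))
all.equal(w_par, w_seq)
```

Starting workers and copying data to them has a fixed cost, so this is only likely to pay off when there are many polygons per worker.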


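Might be helpful:

I had already visited the second link, but ptinpoly did not give much of a speedup. I will see whether the answers in the first link help. Thanks.

Not an expert in spatial statistics, but your lapply iterates over all the polygons, which may be inefficient if the number of points is much smaller than the number of polygons (as is clearly the case for you). Have you tried iterating over the points instead, and checking which polygon each one belongs to? The advantage of this approach is that, if the polygons form a partition, once you have found the polygon a point belongs to you can stop searching and move on to the next point.

Feeling a bit silly now for not having thought of that. Thanks @mbiron

My polygons can have overlapping areas. I was already checking only the polygons closest to each point. Inverting the loop, i.e. iterating over the points, helped a lot. Strangely, it is 4x faster rather than 10x. Still, a big improvement.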
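The inverted loop suggested in the comments could be sketched as follows. The toy rectangles and points and the names inv_xlims, inv_ylims, inv_pts, price_sum, and n_hits are hypothetical. Because the polygons may overlap, the sketch records every rectangle that contains a point instead of stopping at the first match.

```r
# Hypothetical toy rectangles: one row per polygon, columns = (min, max)
inv_xlims <- cbind(c(0, 1, 0.5), c(1, 2, 1.5))
inv_ylims <- cbind(c(0, 0, 0),   c(1, 1, 1))

# Hypothetical toy points
inv_pts <- data.frame(x     = c(0.25, 0.75, 1.75),
                      y     = c(0.5, 0.5, 0.5),
                      price = c(1, 3, 10))

# Running totals per polygon
price_sum <- numeric(nrow(inv_xlims))
n_hits    <- integer(nrow(inv_xlims))

# Loop over the (few) points; each step is one vectorized test
# against all the (many) rectangles
for (i in seq_len(nrow(inv_pts))) {
  hit <- which(inv_xlims[, 1] <= inv_pts$x[i] & inv_pts$x[i] <= inv_xlims[, 2] &
               inv_ylims[, 1] <= inv_pts$y[i] & inv_pts$y[i] <= inv_ylims[, 2])
  price_sum[hit] <- price_sum[hit] + inv_pts$price[i]
  n_hits[hit]    <- n_hits[hit] + 1L
}

# Mean price per polygon; NaN where a polygon contains no points
avg_price <- price_sum / n_hits
```

With far fewer points than polygons, this replaces millions of small per-polygon calls with a few thousand vectorized passes over the rectangle limits.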
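# Keep the quadtree result, then time the original sp::point.in.polygon
# approach on the same data for comparison
w1 <- unlist(w)
t1 <- Sys.time()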
w <- lapply(1:nrow(polygons),
            function(tt) {
              ind <- sp::point.in.polygon(points$latitude, points$longitude,
                                      polygons$pol_lat[[tt]], polygons$pol_lng[[tt]]) > 0
              med <- mean(points$price[ind])
              return(med)
            }
)
Sys.time() - t1
w2 <- unlist(w)
> all.equal(w1, w2)
[1] TRUE