R 如何在data.table中调用gDistance?

R 如何在data.table中调用gDistance?,r,data.table,gis,rgdal,R,Data.table,Gis,Rgdal,我使用ddply和gDistance(包:rgeos)链接了两组多边形。然而,对于我的大型数据集来说,这个过程非常缓慢,正如许多博客(例如)所解释的那样 这些博客建议使用data.table会更快,尽管我无法解决如何使其适用于我的案例。特别是,我如何在data.table…(数据表)中仅为我的栖息地清理数据的子集应用gDistance 下面我将更详细地解释我正在尝试做什么,并附加一小部分数据用于测试代码(这里:) 我有两个包含多边形的形状文件(其中一个是栖息地清理的补丁,另一个是因清理栖息地而受

我使用ddply和gDistance(包:rgeos)链接了两组多边形。然而,对于我的大型数据集来说,这个过程非常缓慢,正如许多博客(例如)所解释的那样

这些博客建议使用data.table会更快,尽管我无法解决如何使其适用于我的案例。特别是,我如何在data.table…(数据表)中仅为我的栖息地清理数据的子集应用gDistance

下面我将更详细地解释我正在尝试做什么,并附加一小部分数据用于测试代码(这里:)

我有两个包含多边形的形状文件(其中一个是栖息地清理的补丁,另一个是因清理栖息地而受到惩罚的属性),其中我想确定第一层(栖息地清理)中最靠近第二层(惩罚属性)中每个多边形的多边形,因为这两个多边形并不总是完全重叠。此外,还有一个额外的约束条件;匹配的栖息地清理多边形不能比环境犯罪早5年

我已经成功地使用了以下ddply代码-关于如何使用data.table执行此操作的任何建议,我认为这会更快

谢谢

# # # # # # # # # # # # # # # # # # # Load [R] GIS packages. # # # # # # # # # # # # # # # # # # 
library(rgeos)
library(raster)
library(rgdal)
library(plyr)

# # # # # # # # # # # # # # # # # # # ENTER shapefile information HERE # # # # # # # # # # # # # # # # # # # # # #


# What is the name of the punishments shapefile?
punishments <- "punishments_stack_overflow"  


# What is the name of the PUNISHMENTs directory?
myDrctry <- "E:/Esri_arcGIS_datasets/SM_data/IBAMA_embargo/final_embargo_list/near_chopped/stack_overflow" # !CHANGE ME - where the shapefiles are stored


# What is the name of the hab_cl shapefile?
hab_cl_shp <- "RO_SimU_deforestation_Amazonia_SIRGAS_near" 


# What is the name of the hab_cl data directory?
my_hab_cl_Drctry <- "E:/Esri_arcGIS_datasets/SM_data/PRODES/Deforestation_per_SimU/near_analysis" #! CHANGE ME




# # # # # # # # # # # # # # # # # # # # Load the shapefiles  # # # # # # # # # # # # # # # # # # # # # # 


# Read in the embargo shapefile
punishments_need_near <- readOGR(dsn=myDrctry, layer=punishments)


# Identify the attributes to keep
myattributes <- c("numero_tad", "data_tad", "CD_BIOMA")


# Subset the full dataset extracting only the desired attributes
punishments_need_near@data <- punishments_need_near@data[,myattributes]


# Load the deforestation data 
hab_cl_patches_near <- readOGR(dsn=my_hab_cl_Drctry, layer=hab_cl_shp) 
proj4string(hab_cl_patches_near)               # check the projection (which is SIRGAS 2000 UTM)
#hab_cl_patches_near@data <- hab_cl_patches_near@data[,c("year","LAPIG_ID")] # manipulate the columns to match oter dataframes
#names(hab_cl_patches_near@data) <- c("ano", "PRODES_ID")
head(hab_cl_patches_near)                                                   # check that it worked 


# # # # # # # # # # # # # # # # # # # # # # # Run the loop # # # # # # # # # # # # # # # # # # # # # # # 


# Use ddply to calculate nearest distance for each embargo ("numero_tad")
tmp <- ddply(punishments_need_near@data, .(numero_tad), function(x) {    # numero_tad is a unique identifier per punishment
  ID <- x$numero_tad[1]
  tmp.punishments <- punishments_need_near[punishments_need_near@data$numero_tad == ID,]                    
  tmp.patches <- hab_cl_patches_near[(hab_cl_patches_near@data$ano +5) >= tmp.punishments@data[,"ano_new"] &     # match the punishments with habitat clearance in last 5 years (ano_new = yr of punishment, ano = yr of habitat clearance)
                                     hab_cl_patches_near@data$ano <= tmp.punishments@data[,"ano_new"],]          # and not after the punishment itself
  obj <- gDistance(tmp.punishments, tmp.patches, byid=TRUE)                                                      # calculate the distance between each punishment and patch of habitat clearance
  df  <- data.frame(numero_tad = ID, PRODES_ID = tmp.patches$PRODES_ID[which.min(obj)], dist = min(obj))    # link punishment with the nearest suitable patch of habitat clearance
}, .progress='text') # progress bar

head(tmp)
在现场代码>周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五# 图书馆(rgeos) 图书馆(光栅) 图书馆(rgdal) 图书馆(plyr) #周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五周五#### #惩罚形状文件的名称是什么?
惩罚您的多边形存储在空间多边形数据框类对象中。data.table仅适用于data.table类型的对象。您需要将data.table创建的数据子集转换为空间多边形数据帧

gDistance需要空间对象,但它正在接收数据表对象

要解决此问题,请将与多边形对应的列转换回聚合函数中的多边形

Dropbox链接已断开,但下面是一个使用空间点(而不是多边形)的示例。我认为同样的方法也适用

library(sp)
library(rgeos)

punishments <- data.frame(numero_tad = sample(1:3, 10, replace=TRUE), ano_new = sample(1:10, 10, replace=TRUE), x = sample(1:100, 10), y = sample(1:100, 10) )
patches <- data.frame(PRODES_ID = 1:10, ano = sample(1:10, 10, replace=TRUE), x = sample(1:100, 10), y = sample(1:100, 10) )
head(punishments)
head(patches)

coordinates(punishments) <- ~x+y # convert to Spatial object
coordinates(patches) <- ~x+y # convert to Spatial object

class(punishments) # "SpatialPointsDataFrame"
head(punishments)

link <- function(SD) {
  coordinates(SD) <- ~x+y # convert from data.table to Spatial object
  # you'll need to do something like to above to convert back to a polygon
  tmp.patches <- patches[(patches$ano + 5) >= SD$ano_new & patches$ano < SD$ano_new, ]
  obj <- gDistance(SD, tmp.patches, byid=TRUE)
  df <- list(PRODES_ID = tmp.patches$PRODES_ID[which(obj == min(obj), arr.ind=TRUE)[1]], dist = min(obj) )
  return(df)
}

dt <- as.data.table(punishments)
class(dt)
head(dt)
setkey(dt, numero_tad)

tmp <- dt[,link(.SD), by=numero_tad]
库(sp)
图书馆(rgeos)

惩罚听起来像是一个使用
.BY
的机会。请参阅以获取此类示例