R: Trying to create a function that joins two datasets by nearest GPS coordinates


r for-loop gps


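I am trying to merge two datasets that contain GPS coordinates, so that I am left with a single dataset containing the variables from both. I am trying to write a function to do this. The problem is that the GPS coordinates in the two datasets do not match exactly, so the task is to match the variables of one dataset to those of the other by finding the closest pair of GPS coordinates.

I have had some success with the fuzzyjoin package, but it only gives a partial match (~75%). With the function below I hope to get a higher match rate. One dataset is shorter than the other, so the idea is to use two for loops, one iterating over each dataset.

An "anchor" is established (the distance between the first observations of the two datasets). Whenever the distance between two points is smaller than the anchor, the new (shorter) distance becomes the new anchor. The for loop continues until the shortest distance is found, and the variables from both datasets are then appended to the end of a new dataset (called pairedData here). I should end up with a dataset as long as the shorter dataset (6314 rows), containing data from both.

I think the function should work, but rbind() is very slow and I have had trouble implementing rbindlist(). Any ideas on how I can make this work?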

combineGPS <- function(harvest, planting) {
  library(sp)
  library(data.table)

  # Coordinates and row counts of both datasets
  longH <- harvest$long
  latH <- harvest$lat
  longP <- planting$long
  latP <- planting$lat
  rowsH <- nrow(harvest)
  rowsP <- nrow(planting)

  # Convert the coordinate pairs to SpatialPoints objects
  harvestCoords <- cbind(longH, latH)
  harvestPoints <- SpatialPoints(harvestCoords)
  plantingCoords <- cbind(longP, latP)
  plantingPoints <- SpatialPoints(plantingCoords)
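
On the rbind() slowness mentioned above: growing a data frame row by row copies it on every iteration. A minimal sketch of the usual workaround, assuming each iteration produces a one-row data frame, is to store the rows in a preallocated list and bind them once at the end. The data and column names here (`toy`, `id`, `value`, `doubled`) are made up for illustration and are not part of the original function.

```r
# Toy input standing in for the real data (hypothetical columns)
toy <- data.frame(id = 1:5, value = c(2.5, 3.1, 4.7, 1.2, 9.9))

# Preallocate a list with one slot per output row
rows <- vector("list", nrow(toy))

for (i in seq_len(nrow(toy))) {
  # Each iteration stores a one-row data frame instead of calling rbind()
  rows[[i]] <- data.frame(id = toy$id[i], doubled = toy$value[i] * 2)
}

# Bind all rows in a single call
pairedData <- do.call(rbind, rows)
```

If data.table is loaded, `rbindlist(rows)` should be usable in place of `do.call(rbind, rows)`, since it also takes a list of data frames.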

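If I understand your question correctly, I'm not sure why you need to loop over the harvest data at all. The function spDistsN1 returns the distances from a set of points to one specified point. I think you should pass the harvest data as the function's pts argument and the planting data as the pt input, then find the index of the shortest distance for each pt. Loop over the planting data only. This will save a lot of time. Also, don't specify longlat in spDistsN1: your data are SpatialPoints, and the function's documentation says not to specify it for these objects.
Example loop: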
for (p in seq_len(rowsP)) {
  # Get the distance from the p-th planting point to all of the harvest points
  Dists <- spDistsN1(pts = harvestPoints, pt = plantingPoints[p, ])

  # Find the index of the nearest harvest point to p. which.min() returns a single
  # index (the first one if several points are tied for the minimum)
  NearestHarvest <- which.min(Dists)

  # Add information to the paired data. pairedData is assumed to be a preallocated
  # data frame with these seven columns; building the row with data.frame() rather
  # than c() keeps the column types from being coerced to a single type
  pairedData[p, ] <- data.frame(
    planting[p, c("long", "lat", "variety", "seedling_rate", "seed_spacing", "speed")],
    yield = harvest$yield[NearestHarvest]
  )
}
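
As a self-contained variant of the loop above, here is a minimal sketch that wraps the nearest-point search in a function and builds the result in one step. The function name `nearestJoin`, the `dist_km` column, and the toy data are hypothetical. Unlike the loop above, this sketch passes plain coordinate matrices rather than SpatialPoints objects, and it assumes the coordinates are decimal degrees, so it sets `longlat = TRUE` explicitly to get great-circle distances in kilometres.

```r
library(sp)

# Hypothetical helper: attach the yield of the nearest harvest point to each planting row.
# Assumes both data frames have numeric `long` and `lat` columns in decimal degrees.
nearestJoin <- function(planting, harvest) {
  harvestCoords <- cbind(harvest$long, harvest$lat)
  plantingCoords <- cbind(planting$long, planting$lat)

  nearest <- integer(nrow(planting))
  distKm <- numeric(nrow(planting))
  for (p in seq_len(nrow(planting))) {
    # Great-circle distance (km) from planting point p to every harvest point
    d <- spDistsN1(pts = harvestCoords, pt = plantingCoords[p, ], longlat = TRUE)
    nearest[p] <- which.min(d)
    distKm[p] <- d[nearest[p]]
  }

  # Build the result once instead of assigning row by row
  cbind(planting, yield = harvest$yield[nearest], dist_km = distKm)
}

# Toy data for illustration only
planting <- data.frame(long = c(-93.000, -93.010), lat = c(42.000, 42.010),
                       variety = c("A", "B"))
harvest <- data.frame(long = c(-93.0101, -93.0001, -93.0050),
                      lat = c(42.0101, 42.0001, 42.0050),
                      yield = c(180, 200, 190))
paired <- nearestJoin(planting, harvest)
```

Keeping the matched distance in the output makes it easy to inspect or filter out pairs that are implausibly far apart.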

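You need to map each row in the harvest file (16626 rows) to a row in the planting file (6314 rows), not the other way around. The figure below plots the harvest and planting GPS coordinates on the xy-plane. The red dots are the harvester points.

Precision-agriculture machinery consists of multi-row planters and harvesters, with the GPS device mounted inside the machine, so each GPS point refers to multiple rows of crop. In this case the planter covers 2X as many rows per pass as the harvester, which explains why the harvest file has ~2X+ as many data points.

The basic approach is a brute-force search, because the GPS coordinates do not overlap between the files. I solved this in both R and Python by dividing the whole area into smaller uniform grids and restricting the search to the nearest neighboring grids. In terms of efficiency, it takes about 3-4 minutes to solve, and the average distance between matched planting and harvest points is 3 meters, which is reasonable.

You can find the code on my ...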
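
The grid idea described above can be sketched as follows. This is not the answerer's code, which is not reproduced here, only a minimal illustration under stated assumptions: both data frames have projected `x` and `y` columns in metres, and the function name `gridNearest` and the `cell` parameter are hypothetical. Reference points are bucketed into uniform cells, and each query point is compared only with points in the surrounding 3 x 3 block of cells, falling back to a full search when that block yields nothing close enough.

```r
# Hypothetical helper: for each row of `query`, return the row index of the
# nearest point in `ref`. Both are assumed to have x/y columns in metres.
gridNearest <- function(query, ref, cell) {
  # Bucket the reference points by integer grid-cell index
  refCx <- floor(ref$x / cell)
  refCy <- floor(ref$y / cell)
  buckets <- split(seq_len(nrow(ref)), paste(refCx, refCy))

  nearest <- integer(nrow(query))
  for (i in seq_len(nrow(query))) {
    cx <- floor(query$x[i] / cell)
    cy <- floor(query$y[i] / cell)

    # Candidate reference points from the 3 x 3 block of cells around the query
    keys <- as.vector(outer(cx + (-1:1), cy + (-1:1), paste))
    cand <- unlist(buckets[keys], use.names = FALSE)

    best <- Inf
    if (length(cand) > 0) {
      d <- sqrt((ref$x[cand] - query$x[i])^2 + (ref$y[cand] - query$y[i])^2)
      best <- min(d)
      nearest[i] <- cand[which.min(d)]
    }

    # Points outside the block are more than `cell` away, so a candidate within
    # `cell` is the true nearest; otherwise fall back to a full search
    if (best > cell) {
      d <- sqrt((ref$x - query$x[i])^2 + (ref$y - query$y[i])^2)
      nearest[i] <- which.min(d)
    }
  }
  nearest
}

# Toy data for illustration only: map each harvest point to a planting point
plantingXY <- data.frame(x = c(0, 10, 20, 100), y = c(0, 0, 0, 100))
harvestXY <- data.frame(x = c(1, 9, 19, 60), y = c(1, 1, -1, 60))
idx <- gridNearest(harvestXY, plantingXY, cell = 5)
```

The planting variables can then be attached to the harvest rows with something like `cbind(harvestXY, plantingXY[idx, ])`. The choice of `cell` is a trade-off: too small and most queries fall back to the full search, too large and each block contains many candidates.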
