R: Sampling from a dataset based on a specific distribution centered on points in a different dataset


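I am trying to sample rows from df_map, a set of points in X-Y-Z space, based on a distribution of the points in that space. The means and standard deviations of the distributions are in another dataset, df_pts.

My data look like this: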

> df_map
   X  Y   Z
A  6  0 103
B -4  2 102
C -2 15 112
D 13  6 105
E  1 -3 117
F  5 16 105
G 10  5 103
H 14 -7 119
I  8 14 107
J -8 -4 100

> df_pts
    x   y   accuracy
a   5  18 -0.8464018
b   3   2  0.5695678
c -18  14 -0.4711559
d  11  13 -0.7306417
e  -3 -10  2.1887011
f  -9 -11  2.1523923
g   5   1 -0.9612284
h  12 -19 -0.4750582
i -16  20 -1.4554292
j   0  -8  3.4028887
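
I want to loop over the rows of df_pts and, for each row i, select one row from df_map according to a Gaussian distribution on the distance from (df_pts[i, x], df_pts[i, y]), with 2D standard deviation df_pts[i, accuracy]. In other words, for each i = 1:10, I want to draw a sample from df_map according to a normal distribution with mean df_pts[i, x]^2 + df_pts[i, y]^2 and 2D sd df_pts[i, accuracy].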

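I would appreciate it if you could show me an efficient, more sophisticated way to do this. I am relatively new to R and come from a C background, so the way I code a task like this involves too many basic loops and step-by-step calculations with elementary operations, which makes the code very slow.

I apologize in advance if the question is too trivial or not well framed.

Data in an easy-to-use form: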

df_map <- data.frame(x = c(6,-4,-2,13,1,5,10,14,8,-8),
                     y= c(0,2,15,6,-3,16,5,-7,14,-4),
                     z= c(103,102,112,105,117,105,103,119,107,100))
df_pts <- data.frame(x = c(5,3,-18,11,-3,-9,5,12,-16,0),
              y= c(18,2,14,13,-10,-11,1,-19,20,-8),
              accuracy = c(-0.8464018, 0.5695678,-0.4711559,-0.7306417, 2.1887011, 2.1523923,-0.9612284,-0.4750582,-1.4554292,3.4028887))
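As you can see in this example, some map points are matched more than once, so depending on your goal you may want to discard those matches or run the search in both directions.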
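To make the point about repeated matches concrete, here is a minimal sketch that flags them. The `matched` data frame is copied by hand from the `x.1` and `y.1` columns of the answer's output table; the names `multi` and `first_only` are illustrative, not part of the original answer.

```r
# Matched df_map coordinates, copied from the x.1 / y.1 columns
# of the answer's output table (one row per df_pts row).
matched <- data.frame(x.1 = c(5, 6, -2, 8, -8, -8, 6, 14, -2, 1),
                      y.1 = c(16, 0, 15, 14, -4, -4, 0, -7, 15, -3))

# TRUE for every df_pts row whose matched map point is shared with another row.
multi <- duplicated(matched) | duplicated(matched, fromLast = TRUE)
which(multi)

# One possible way to "discard" repeats: keep only the first use of each map point.
first_only <- matched[!duplicated(matched), ]
```

Whether to drop the repeats, keep the first, or keep the closest is a modelling choice that depends on the goal; this only shows how to find them.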


I hope this is what you were looking for.

Thanks for your suggestion.

In the end I did the following:

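library(dplyr)  # provides sample_n()

df_map <- data.frame(X = c(6,-4,-2,13,1,5,10,14,8,-8),
                     Y = c(0,2,15,6,-3,16,5,-7,14,-4),
                     Z = c(103,102,112,105,117,105,103,119,107,100))
df_pts <- data.frame(x = c(5,3,-18,11,-3,-9,5,12,-16,0),
                     y = c(18,2,14,13,-10,-11,1,-19,20,-8),
                     accuracy = c(-0.8464018, 0.5695678,-0.4711559,-0.7306417, 2.1887011, 2.1523923,-0.9612284,-0.4750582,-1.4554292,3.4028887))

# Draw one row of map_in for a single point pt_in (a named numeric vector
# with elements 'x', 'y' and 'accuracy', as supplied by apply() below).
map.point2map <- function(map_in, pt_in) {
  # dist() stores the lower triangle column by column, so the first
  # nrow(map_in) entries are the distances from pt_in to every map point.
  dists <- dist(rbind(cbind(x = pt_in['x'],
                           y = pt_in['y']),
                     cbind(x = map_in$X,
                           y = map_in$Y)))[1:dim(map_in)[1]]

  mu <- mean(dists)
  # accuracy can be negative in the data, so use its magnitude as the sd
  stddev <- abs(as.numeric(pt_in['accuracy']))

  # Sample one map point, weighted by the normal density of its distance
  return(sample_n(tbl = map_in[, c('X', 'Y')],
                  size = 1,
                  replace = TRUE,
                  weight = dnorm(dists, mean = mu, sd = stddev)))
}

# One sampled map point per row of df_pts (returned as a list)
mapped <- apply(df_pts,
                1,
                function(x) map.point2map(map_in = df_map,
                                          pt_in = x))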
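
The function above runs once per point via `apply`. As a hedged alternative, the sketch below computes all weights in one matrix operation using base R only. It rests on assumptions that differ from the code above: the weight is an isotropic Gaussian centered on the df_pts point itself (so the density peaks at distance 0, not at `mean(dists)`), and `abs(accuracy)` is taken as the standard deviation. The names `d2`, `sigma`, `prob`, `picked` and `sampled` are illustrative.

```r
df_map <- data.frame(X = c(6, -4, -2, 13, 1, 5, 10, 14, 8, -8),
                     Y = c(0, 2, 15, 6, -3, 16, 5, -7, 14, -4),
                     Z = c(103, 102, 112, 105, 117, 105, 103, 119, 107, 100))
df_pts <- data.frame(x = c(5, 3, -18, 11, -3, -9, 5, 12, -16, 0),
                     y = c(18, 2, 14, 13, -10, -11, 1, -19, 20, -8),
                     accuracy = c(-0.8464018, 0.5695678, -0.4711559, -0.7306417,
                                  2.1887011, 2.1523923, -0.9612284, -0.4750582,
                                  -1.4554292, 3.4028887))

# Squared distances: one row per df_pts point, one column per df_map point.
d2 <- outer(df_pts$x, df_map$X, "-")^2 + outer(df_pts$y, df_map$Y, "-")^2

# Assumption: the magnitude of accuracy is the standard deviation.
sigma <- abs(df_pts$accuracy)

# Log of the unnormalised weight exp(-d^2 / (2 * sigma^2)).
# sigma is recycled down the columns, so row i is scaled by sigma[i].
log_w <- -d2 / (2 * sigma^2)

# Subtract each row's maximum before exponentiating, so the largest weight
# in every row is exactly 1 and a small sigma cannot underflow a whole row to 0.
log_w <- log_w - apply(log_w, 1, max)
w <- exp(log_w)
prob <- w / rowSums(w)   # each row is a probability vector over df_map rows

set.seed(1)              # only to make the draw repeatable
picked <- apply(prob, 1, function(p) sample.int(length(p), size = 1, prob = p))

# Each df_pts row next to the df_map row sampled for it.
sampled <- data.frame(df_pts, df_map[picked, ], row.names = NULL)
```

With standard deviations this small relative to the spacing of the map points, most rows of `prob` put nearly all their mass on the nearest map point, so the draws will often coincide with a nearest-neighbor match.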

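Thank you very much for the suggestion. I had actually written my own function that does what get.knnx does, but get.knnx is more efficient and the code looks nicer too. However, this selects the nearest neighbor. I want to change it so that df_map[i] is selected from df_map with probability p(i), where p(i) = f(distance of (df_map[i, X], df_map[i, Y]) from (df_pts[i, x], df_pts[i, y])), and f is the probability density function of a Gaussian with mean df_pts[i, x]^2 + df_pts[i, y]^2 and 2D standard deviation df_pts[i, accuracy]. What I am trying to do is basically sample(df_map, 1, prob = pdf_df_map, replace = T), where pdf_df_map is the probability of getting (df_map[i, X], df_map[i, Y]) if the points around (df_pts[i, x], df_pts[i, y]) are normally distributed with 2D standard deviation df_pts[i, accuracy].
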
     x   y   accuracy x.1 y.1   z
1    5  18 -0.8464018   5  16 105
2    3   2  0.5695678   6   0 103
3  -18  14 -0.4711559  -2  15 112
4   11  13 -0.7306417   8  14 107
5   -3 -10  2.1887011  -8  -4 100
6   -9 -11  2.1523923  -8  -4 100
7    5   1 -0.9612284   6   0 103
8   12 -19 -0.4750582  14  -7 119
9  -16  20 -1.4554292  -2  15 112
10   0  -8  3.4028887   1  -3 117
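
The table above pairs each df_pts row with its nearest df_map row. The answer's own code is not shown here (the asker's comment mentions get.knnx), so the following is only a base-R sketch of a nearest-neighbor match that needs no extra packages. The names `d2`, `nn_idx` and `matched` are illustrative, and the brute-force distance matrix is an assumption that suits small data like this, not large maps.

```r
df_map <- data.frame(X = c(6, -4, -2, 13, 1, 5, 10, 14, 8, -8),
                     Y = c(0, 2, 15, 6, -3, 16, 5, -7, 14, -4),
                     Z = c(103, 102, 112, 105, 117, 105, 103, 119, 107, 100))
df_pts <- data.frame(x = c(5, 3, -18, 11, -3, -9, 5, 12, -16, 0),
                     y = c(18, 2, 14, 13, -10, -11, 1, -19, 20, -8),
                     accuracy = c(-0.8464018, 0.5695678, -0.4711559, -0.7306417,
                                  2.1887011, 2.1523923, -0.9612284, -0.4750582,
                                  -1.4554292, 3.4028887))

# Squared X-Y distance from every df_pts point (rows) to every df_map point (columns).
d2 <- outer(df_pts$x, df_map$X, "-")^2 + outer(df_pts$y, df_map$Y, "-")^2

# Index of the closest df_map row for each df_pts row.
nn_idx <- apply(d2, 1, which.min)

# Same column layout as the table: the point, then its nearest map point.
matched <- data.frame(df_pts,
                      x.1 = df_map$X[nn_idx],
                      y.1 = df_map$Y[nn_idx],
                      z   = df_map$Z[nn_idx])
```

For this data the sketch gives the same pairs as the table. For large inputs a k-d tree search such as the one in get.knnx avoids building the full distance matrix.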