R除去某些值后矩阵的最大距离

R除去某些值后矩阵的最大距离,r,matrix,R,Matrix,假设我们有一个如下所示的矩阵 A <- matrix(c(1,7,13,19,9,5,8,14,20,10,3,4,15,21,1,2,4,16,22,2,8,3,17,23,1,6,3,18,24,2), nrow=5) dist函数可以使用dist(A,method=“maximum”)计算矩阵A每行与返回距离矩阵D之间的最大绝对距离D[i,j]=\max_{k}(|A[i,k]-A[j,k]|) D[1,2] = max( abs( A[1,] - A[2,] ) ) = max

假设我们有一个如下所示的矩阵

A <- matrix(c(1,7,13,19,9,5,8,14,20,10,3,4,15,21,1,2,4,16,22,2,8,3,17,23,1,6,3,18,24,2), nrow=5)
dist
函数可以使用
dist(A,method=“maximum”)
计算矩阵
A
每行与返回距离矩阵
D
之间的最大绝对距离<例如,code>D[i,j]=\max_{k}(|A[i,k]-A[j,k]|)

 D[1,2] = max( abs( A[1,] - A[2,] ) ) = max(6, 3, 1, 2, 5, 3) = 6 
然而,在我的例子中,我需要首先删除
I,j
元素,即
D[I,j]=\max_{k不等于I或j}(| A[I,k]-A[j,k]|)
,例如,在上面的例子中,答案是

 D[1,2] = max( abs( A[1,] - A[2,] ) ) = max( 1, 2, 5, 3) = 5

我不知道如何有效地执行此操作,我知道我可以使用for循环,但数据集很大,for循环速度非常慢。

假设实际矩阵的列数也多于行数。以下是所需函数的基本R实现:

max_dist <- function(mat, i, j) {
  mat <- mat[c(i, j), -c(i, j)]
  max(abs(mat[1L, ] - mat[2L, ]))
}

dist1 <- function(mat) {
  n <- nrow(mat)
  ids <- do.call(rbind, lapply(2:n, function(i, e) cbind(i:e, rep.int(i - 1L, e - i + 1L)), n))
  out <- apply(ids, 1L, function(i) max_dist(mat, i[[1L]], i[[2L]]))
  attributes(out) <- list(
    Size = n, Labels = dimnames(mat)[[1L]], Diag = FALSE, 
    Upper = FALSE, method = "dist1", call = match.call(), 
    class = "dist"
  )
  out
}
使用以下矩阵进行测试(
small\m
是您文章中的示例):

输出如下

> dist1(small_m)
   1  2  3  4
2  5         
3 14 15      
4 18 21  6   
5  5  3 16 22
> dist2(small_m)
   1  2  3  4
2  5         
3 14 15      
4 18 21  6   
5  5  3 16 22

这里是一个基本R选项,使用
dist
+
combn
+
as.dist

r <- diag(0,nrow(m))
r[lower.tri(r)] <- combn(1:nrow(m),2,function(k) max(abs(do.call(`-`,asplit(m[k,],1)))[-k]))
out <- as.dist(r)
数据

> dput(m)
structure(c(1, 7, 13, 19, 9, 5, 8, 14, 20, 10, 3, 4, 15, 21, 
1, 2, 4, 16, 22, 2, 8, 3, 17, 23, 1, 6, 3, 18, 24, 2), .Dim = 5:6)

非常感谢,如果列数等于行数会有帮助。@kevin据我所知,这还是可以的。真正的问题是,例如,当您有6行5列时。由于没有第6列可供您删除,您将收到一些错误,如“下标超出范围”。非常感谢
# no real difference between these two implementations when the input matrix is small
> microbenchmark::microbenchmark(dist1(small_m), dist2(small_m))
Unit: microseconds
           expr   min     lq    mean median     uq   max neval cld
 dist1(small_m)  77.4  87.10 112.403  106.5 125.95 212.2   100  a 
 dist2(small_m) 145.5 160.25 177.786  170.2 183.80 286.7   100   b

# `dist2` is faster with large matrix input. However, the efficiency of `dist1` is also acceptable IMO.
> microbenchmark::microbenchmark(dist1(large_m), dist2(large_m))
Unit: milliseconds
           expr      min       lq      mean   median       uq      max neval cld
 dist1(large_m) 129.7531 139.3909 152.13154 143.0549 149.5870 322.0173   100   b
 dist2(large_m)  48.8025  52.5081  55.84333  55.5175  58.6095  67.6470   100  a 
> dist1(small_m)
   1  2  3  4
2  5         
3 14 15      
4 18 21  6   
5  5  3 16 22
> dist2(small_m)
   1  2  3  4
2  5         
3 14 15      
4 18 21  6   
5  5  3 16 22
r <- diag(0,nrow(m))
r[lower.tri(r)] <- combn(1:nrow(m),2,function(k) max(abs(do.call(`-`,asplit(m[k,],1)))[-k]))
out <- as.dist(r)
   1  2  3  4
2  5
3 14 15
4 18 21  6
5  5  3 16 22
> dput(m)
structure(c(1, 7, 13, 19, 9, 5, 8, 14, 20, 10, 3, 4, 15, 21, 
1, 2, 4, 16, 22, 2, 8, 3, 17, 23, 1, 6, 3, 18, 24, 2), .Dim = 5:6)