R除去某些值后矩阵的最大距离
假设我们有一个如下所示的矩阵R除去某些值后矩阵的最大距离,r,matrix,R,Matrix,假设我们有一个如下所示的矩阵 A <- matrix(c(1,7,13,19,9,5,8,14,20,10,3,4,15,21,1,2,4,16,22,2,8,3,17,23,1,6,3,18,24,2), nrow=5) dist函数可以使用dist(A,method=“maximum”)计算矩阵A每行与返回距离矩阵D之间的最大绝对距离D[i,j]=\max_{k}(|A[i,k]-A[j,k]|) D[1,2] = max( abs( A[1,] - A[2,] ) ) = max
A <- matrix(c(1,7,13,19,9,5,8,14,20,10,3,4,15,21,1,2,4,16,22,2,8,3,17,23,1,6,3,18,24,2), nrow=5)
dist
函数可以使用dist(A,method=“maximum”)
计算矩阵A
每行与返回距离矩阵D
之间的最大绝对距离<例如,code>D[i,j]=\max_{k}(|A[i,k]-A[j,k]|)
D[1,2] = max( abs( A[1,] - A[2,] ) ) = max(6, 3, 1, 2, 5, 3) = 6
然而,在我的例子中,我需要首先删除I,j
元素,即D[I,j]=\max_{k不等于I或j}(| A[I,k]-A[j,k]|)
,例如,在上面的例子中,答案是
D[1,2] = max( abs( A[1,] - A[2,] ) ) = max( 1, 2, 5, 3) = 5
我不知道如何有效地执行此操作,我知道我可以使用for循环,但数据集很大,for循环速度非常慢。假设实际矩阵的列数也多于行数。以下是所需函数的基本R实现:
max_dist <- function(mat, i, j) {
mat <- mat[c(i, j), -c(i, j)]
max(abs(mat[1L, ] - mat[2L, ]))
}
dist1 <- function(mat) {
n <- nrow(mat)
ids <- do.call(rbind, lapply(2:n, function(i, e) cbind(i:e, rep.int(i - 1L, e - i + 1L)), n))
out <- apply(ids, 1L, function(i) max_dist(mat, i[[1L]], i[[2L]]))
attributes(out) <- list(
Size = n, Labels = dimnames(mat)[[1L]], Diag = FALSE,
Upper = FALSE, method = "dist1", call = match.call(),
class = "dist"
)
out
}
使用以下矩阵进行测试(small\m
是您文章中的示例):
输出如下
> dist1(small_m)
1 2 3 4
2 5
3 14 15
4 18 21 6
5 5 3 16 22
> dist2(small_m)
1 2 3 4
2 5
3 14 15
4 18 21 6
5 5 3 16 22
这里是一个基本R选项,使用
dist
+combn
+as.dist
r <- diag(0,nrow(m))
r[lower.tri(r)] <- combn(1:nrow(m),2,function(k) max(abs(do.call(`-`,asplit(m[k,],1)))[-k]))
out <- as.dist(r)
数据
> dput(m)
structure(c(1, 7, 13, 19, 9, 5, 8, 14, 20, 10, 3, 4, 15, 21,
1, 2, 4, 16, 22, 2, 8, 3, 17, 23, 1, 6, 3, 18, 24, 2), .Dim = 5:6)
非常感谢,如果列数等于行数会有帮助。@kevin据我所知,这还是可以的。真正的问题是,例如,当您有6行5列时。由于没有第6列可供您删除,您将收到一些错误,如“下标超出范围”。非常感谢
# no real difference between these two implementations when the input matrix is small
> microbenchmark::microbenchmark(dist1(small_m), dist2(small_m))
Unit: microseconds
expr min lq mean median uq max neval cld
dist1(small_m) 77.4 87.10 112.403 106.5 125.95 212.2 100 a
dist2(small_m) 145.5 160.25 177.786 170.2 183.80 286.7 100 b
# `dist2` is faster with large matrix input. However, the efficiency of `dist1` is also acceptable IMO.
> microbenchmark::microbenchmark(dist1(large_m), dist2(large_m))
Unit: milliseconds
expr min lq mean median uq max neval cld
dist1(large_m) 129.7531 139.3909 152.13154 143.0549 149.5870 322.0173 100 b
dist2(large_m) 48.8025 52.5081 55.84333 55.5175 58.6095 67.6470 100 a
> dist1(small_m)
1 2 3 4
2 5
3 14 15
4 18 21 6
5 5 3 16 22
> dist2(small_m)
1 2 3 4
2 5
3 14 15
4 18 21 6
5 5 3 16 22
r <- diag(0,nrow(m))
r[lower.tri(r)] <- combn(1:nrow(m),2,function(k) max(abs(do.call(`-`,asplit(m[k,],1)))[-k]))
out <- as.dist(r)
1 2 3 4
2 5
3 14 15
4 18 21 6
5 5 3 16 22
> dput(m)
structure(c(1, 7, 13, 19, 9, 5, 8, 14, 20, 10, 3, 4, 15, 21,
1, 2, 4, 16, 22, 2, 8, 3, 17, 23, 1, 6, 3, 18, 24, 2), .Dim = 5:6)