Mahalanobis distance between two groups in R
I have two groups, each with 3 variables, as shown below:
Group1:
cost time quality
[1,] 90 4 70
[2,] 4 27 37
[3,] 82 4 17
[4,] 18 41 4
Group2:
cost time quality
[1,] 4 27 4
The code I use to compute the Mahalanobis distance between the two groups is as follows:
benchmark<-rbind(c(90,4,70),c(4,27,37),c(82,4,17),c(18,41,4))
colnames(benchmark)=c('cost','time','quality')
current=rbind(c(4,27,4))
colnames(current)=c('cost','time','quality')
bdm<-as.matrix(benchmark)
cdm<-as.matrix(current)
mat1<-matrix(bdm,ncol=ncol(bdm),dimnames=NULL)
mat2<-matrix(cdm,ncol=ncol(cdm),dimnames=NULL)
#center Data
mat1.1<-scale(mat1,center = T,scale = F)
mat2.1<-scale(mat2,center=T,scale=F)
#cov Matrix
mat1.2<-cov(mat1.1,method="pearson")
mat2.2<-cov(mat2.1,method="pearson")
#the pooled covariance is calculated using weighted average
n1<-nrow(mat1)
n2<-nrow(mat2)
n3<-n1+n2
#pooled matrix
mat3<-((n1/n3)*mat1.2) + ((n2/n3)*mat2.2)
mat4<-solve(mat3)
#Mean diff
mat5<-as.matrix((colMeans(mat1)-colMeans(mat2)))
#multiply
mat6<-t(mat5)%*%mat4
#Mahalanobis distance
sqrt(mat6 %*% mat5)
Running it gives the following warning message:
In colMeans(mat1) - colMeans(mat2) :
longer object length is not a multiple of shorter object length
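I figured that what you want to do must already exist in some R package. After a fairly thorough search, I found the function D.sq in the package asbio, which looks very close. That function requires 2 matrices as input, so it does not work for your example. I have also included a modified version that accepts a vector in place of the second matrix.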
# Original Function
D.sq <- function (g1, g2) {
  dbar <- as.vector(colMeans(g1) - colMeans(g2))
  S1 <- cov(g1)
  S2 <- cov(g2)
  n1 <- nrow(g1)
  n2 <- nrow(g2)
  V <- as.matrix((1/(n1 + n2 - 2)) * (((n1 - 1) * S1) + ((n2 - 1) * S2)))
  D.sq <- t(dbar) %*% solve(V) %*% dbar
  res <- list()
  res$D.sq <- D.sq
  res$V <- V
  res
}
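A short check of why the two-matrix version cannot be applied to the example as is: when the second group has a single row, `cov()` has no spread to estimate and returns `NA` for every entry, and those `NA`s carry over into the pooled matrix. The sketch below repeats the pooling step inline using the data from the question; the variable names are only illustrative.

```r
# Data from the question: 4 benchmark rows and a single current row.
g1 <- matrix(c(90, 4, 70,
               4, 27, 37,
               82, 4, 17,
               18, 41, 4), ncol = 3, byrow = TRUE)
g2 <- matrix(c(4, 27, 4), nrow = 1)

S1 <- cov(g1)
S2 <- cov(g2)   # one observation only: every entry is NA
n1 <- nrow(g1)
n2 <- nrow(g2)

# Same pooling formula as in D.sq; (n2 - 1) is 0, but 0 * NA is still NA in R,
# so the pooled matrix is NA throughout and cannot be inverted meaningfully.
V <- (1 / (n1 + n2 - 2)) * ((n1 - 1) * S1 + (n2 - 1) * S2)

all(is.na(S2))
all(is.na(V))
```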
# Data
g1 <- matrix(c(90, 4, 70, 4, 27, 37, 82, 4, 17, 18, 41, 4), ncol = 3, byrow = TRUE)
g2 <- c(4, 27, 4)
# Function modified to accept a vector for g2 rather than a matrix
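D.sq2 <- function (g1, g2) {
  # Difference between the column means of g1 and the single point g2
  dbar <- as.vector(colMeans(g1) - g2)
  S1 <- cov(g1)
  # g2 is a plain vector, so var(g2) is a single number:
  # the variance across its elements
  S2 <- var(g2)
  n1 <- nrow(g1)
  # length(g2) is the number of variables in g2
  n2 <- length(g2)
  # Pooled matrix; the scalar (n2 - 1) * S2 is added to every entry of S1's term
  V <- as.matrix((1/(n1 + n2 - 2)) * (((n1 - 1) * S1) + ((n2 - 1) * S2)))
  # Squared distance
  D.sq <- t(dbar) %*% solve(V) %*% dbar
  res <- list()
  res$D.sq <- D.sq
  res$V <- V
  res
}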
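One possible alternative, offered only as a sketch: if the second group is treated as a single observation (so n2 = 1 and it contributes no covariance of its own), the pooling formula reduces to `cov(g1)`, and the result matches what `mahalanobis` from stats returns for that point. The helper name `D.sq.point` is hypothetical and is not part of asbio; the data are the values from the question.

```r
# Benchmark group (4 observations) and the single current observation.
g1 <- matrix(c(90, 4, 70,
               4, 27, 37,
               82, 4, 17,
               18, 41, 4), ncol = 3, byrow = TRUE)
g2 <- c(4, 27, 4)

# Hypothetical helper: squared Mahalanobis distance between the mean of g1
# and a single point x, using cov(g1) as the covariance estimate.
D.sq.point <- function(g1, x) {
  dbar <- colMeans(g1) - x
  V <- cov(g1)
  as.numeric(t(dbar) %*% solve(V) %*% dbar)
}

d2 <- D.sq.point(g1, g2)

# stats::mahalanobis returns the squared distance for the same inputs.
d2_stats <- mahalanobis(g2, center = colMeans(g1), cov = cov(g1))

sqrt(d2)   # distance on the original (unsquared) scale
```

Whether a single-observation group should be handled this way is a modelling choice; the sketch only shows that the two computations agree under that assumption.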
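Would it be correct to replace the NAs in mat2.2 with zero, that is, to assign 0 to mat2.2[is.na(mat2.2)]?
Thank you very much. Yes, the difference is in the denominator of V. Suppose g2 were a 3*3 matrix rather than a vector, and I then used the apply function to turn the matrix into vectors.
I don't quite follow your new situation. I noticed that the mahalanobis function in stats takes a matrix as one input and a vector with the same number of columns as the other, and gives the (squared) distance for each row. Is that what you want? Consider mahalanobis(g1, g2, cov(g1)), which gives a vector of length 4 because g1 has 4 rows.
Yes, I want the distance of each row from the benchmark matrix.
Then I think you can just use mahalanobis directly. You can type mahalanobis at the prompt to see the code it uses.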
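As a minimal sketch of the per-row approach suggested in the comments, using the data from the question: `mahalanobis` from stats returns squared distances, so the square root is taken at the end. Using `cov(g1)` as the covariance follows the comment above and is an assumption about what is wanted.

```r
g1 <- matrix(c(90, 4, 70,
               4, 27, 37,
               82, 4, 17,
               18, 41, 4), ncol = 3, byrow = TRUE)
g2 <- c(4, 27, 4)

# Squared Mahalanobis distance of every row of g1 from the point g2.
d2_rows <- mahalanobis(g1, center = g2, cov = cov(g1))

# Distances on the original scale, one per row of g1.
d_rows <- sqrt(d2_rows)
length(d_rows)

# Cross-check the first row against the explicit quadratic form.
diff1 <- g1[1, ] - g2
d2_first <- as.numeric(t(diff1) %*% solve(cov(g1)) %*% diff1)
```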