Computing normalized Euclidean distance in R


My data frame looks like this:

binning_data[1:4,]
  person_id  V1  V2  V3  V4    V5  V6  V7  V8    V9 V10 V11 V12 V13 V14 V15 V16
1       312  74  80  NA  87  90.0  85  88  98  96.5  99  94  95  90  90  93 106
2       316  NA  NA 116 106 105.0 110 102 105 105.0 102  98 101  98  92  89  91
3       318  71  61  61  61  60.5  68  62  67  64.0  60  59  60  62  59  63  63
4       319  64  NA  80  80  83.0  84  87  83  85.0  88  87  95  74  70  63  83

I want to compute the Euclidean distance between a given index_person_id (say 312) and every other person_id, ignoring all NAs.

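For example, the normalized Euclidean distance between 312 and 316 should skip the first three bins (V1, V2, V3), because at least one of the two rows has an NA in each of them. It should be computed over bins 4 through 16 only and then divided by 13, the number of bins that are non-empty in both rows.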
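The 312 vs. 316 example can be checked by hand. This is a minimal sketch, assuming the two rows are typed in as plain numeric vectors (the names `p312`, `p316`, `both`, `n_bins` and `d` are made up for illustration). Like the code in this thread, it averages the squared differences and does not take a square root.

```r
# Bins V1..V16 for person 312 and person 316, copied from the sample data
p312 <- c(74, 80, NA, 87, 90.0, 85, 88, 98, 96.5, 99, 94, 95, 90, 90, 93, 106)
p316 <- c(NA, NA, 116, 106, 105.0, 110, 102, 105, 105.0, 102, 98, 101, 98, 92, 89, 91)

# Keep only the bins where both rows have a value
both <- !is.na(p312) & !is.na(p316)
n_bins <- sum(both)   # number of bins that contribute (13 here)

# Sum of squared differences over the shared bins, divided by the bin count
d <- sum((p312[both] - p316[both])^2) / n_bins
```

Here `d` comes out at about 146.0192, which is the value reported for this pair further down the thread.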

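binning_data has dimensions 10000 x 17.

The output should be 10000 x 2: the first column is person_id and the second is the normalized Euclidean distance.

I am currently using sapply for this: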

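# Row for the index person, and the bins (V columns) where it has data
index_person <- binning_data[binning_data$person_id == index_person_id, ]
non_empty_index_person <- which(!is.na(index_person[-1]))

# Output: person_id in column 1, distance in column 2
distance <- data.frame(person_id = binning_data$person_id, distance = NA_real_)

distance[, 2] <- sapply(seq_along(binning_data$person_id), function(j) {
  compare_person <- binning_data[j, ]
  non_empty_compare_person <- which(!is.na(compare_person[-1]))
  # Bins that are non-NA in both rows
  non_empty <- intersect(non_empty_index_person, non_empty_compare_person)
  distance_temp <- (index_person[non_empty + 1] - compare_person[non_empty + 1])^2
  # unlist() turns the one-row data frame into a numeric vector for mean()
  mean(unlist(distance_temp))
})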

If I run your code, I get:
 0.0000 146.0192 890.9000 200.8750

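If you convert the data frame to a matrix and transpose it, you can subtract columns and then use na.rm=TRUE on mean to get the distance you want. colMeans does this over all columns at once. Here it is for row II of your sample data: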
> II = 1
> m = t(as.matrix(binning_data[,-1]))
> colMeans((m - m[,II])^2, na.rm=TRUE)
       1        2        3        4 
  0.0000 146.0192 890.9000 200.8750 
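The transpose matters because of R's column-major recycling: when a vector is subtracted from a matrix, it is lined up with each column in turn. A small sketch on a made-up 2 x 3 matrix (`x`, `v` and `res` are illustrative names, not part of the original data):

```r
x <- matrix(1:6, nrow = 2)   # columns are (1,2), (3,4), (5,6)
v <- c(1, 2)                 # same length as one column of x

# v is recycled down each column, so every column has v subtracted from it
res <- x - v
```

With people in columns (after `t()`), `m - m[,II]` therefore subtracts person II's bins from every person at once. Without the transpose, the length-16 vector would be recycled down columns of length 4 and the bins would no longer line up.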

Your 10000x2 result then pairs person_id with these distances (in this sample, 10000 == 4).
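One way to assemble that two-column result is sketched here; it is not necessarily the exact code from the original answer. It assumes the column for a given `index_person_id` is looked up with `match`, retypes the four sample rows with `read.table` so the block runs on its own, and borrows the output name `distance` from the question's code.

```r
# Sample data from the question (first four rows only)
binning_data <- read.table(header = TRUE, text = "
person_id  V1  V2  V3  V4    V5  V6  V7  V8    V9 V10 V11 V12 V13 V14 V15 V16
312  74  80  NA  87  90.0  85  88  98  96.5  99  94  95  90  90  93 106
316  NA  NA 116 106 105.0 110 102 105 105.0 102  98 101  98  92  89  91
318  71  61  61  61  60.5  68  62  67  64.0  60  59  60  62  59  63  63
319  64  NA  80  80  83.0  84  87  83  85.0  88  87  95  74  70  63  83
")

index_person_id <- 312
# Position of the index person, i.e. its column in the transposed matrix
II <- match(index_person_id, binning_data$person_id)

# Bins in rows, people in columns
m <- t(as.matrix(binning_data[, -1]))

# person_id in column 1, mean squared difference over shared bins in column 2
distance <- data.frame(
  person_id = binning_data$person_id,
  distance  = unname(colMeans((m - m[, II])^2, na.rm = TRUE))
)
```

On the full data the same code should give one row per person, so 10000 rows and 2 columns.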
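If you want to compute this for a given list of indices, loop over it, perhaps using lapply and rbind like this to put the pieces back together as a single data frame: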
II = c(1,2,1,4,4)
do.call(rbind,lapply(II, function(i){data.frame(i,d=colMeans((m-m[,i])^2,na.rm=TRUE))}))
   i         d
1  1    0.0000
2  1  146.0192
3  1  890.9000
4  1  200.8750
11 2  146.0192
21 2    0.0000
31 2 1595.0179
41 2  456.7143
12 1    0.0000
22 1  146.0192
32 1  890.9000
42 1  200.8750
13 4  200.8750
23 4  456.7143
33 4  420.8833
43 4    0.0000
14 4  200.8750
24 4  456.7143
34 4  420.8833
44 4    0.0000

The result has 4 x length(II) rows.
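If distances from several index people are needed, a wide layout with one column per index person can be easier to work with than the stacked data frame. This is a sketch using `sapply`; the toy matrix below is a made-up stand-in for the transposed matrix `m`, and `wide` is an illustrative name.

```r
# Toy stand-in for the transposed matrix m: 3 bins (rows) x 3 people (columns)
m <- cbind(c(1, 2, NA), c(2, 4, 6), c(NA, 0, 3))

II <- c(1, 3)   # columns (people) to use as the reference

# One column of distances per reference person; NA bins are dropped pair by pair
wide <- sapply(II, function(i) colMeans((m - m[, i])^2, na.rm = TRUE))
```

Each column of `wide` corresponds to one entry of `II`, so the result has one row per person and `length(II)` columns.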