as.matrix() and as.dist() give different results


I have a list "simil" that contains 7 vectors:

 > dput(simil)
structure(list(Monday = structure(c(0.889987253484581, 0.882957894295089, 
0.882232353177177, 0.874080268021168, 0.851760771472629, 0.811536071048775
), .Names = c("Sunday", "Tuesday", "Friday", "Wednesday", "Thursday", 
"Saturday")), Tuesday = structure(c(0.901682757072732, 0.882957894295089, 
0.874716806575548, 0.869202937572079, 0.855248496101086, 0.818659253763272
), .Names = c("Sunday", "Monday", "Wednesday", "Friday", "Thursday", 
"Saturday")), Wednesday = structure(c(0.88354911311872, 0.874716806575548, 
0.874080268021168, 0.853293126413937, 0.851921112754124, 0.841170795359615
), .Names = c("Sunday", "Tuesday", "Monday", "Friday", "Thursday", 
"Saturday")), Thursday = structure(c(0.86579834238668, 0.855248496101086, 
0.851921112754124, 0.851760771472629, 0.851384896045153, 0.836732564057725
), .Names = c("Sunday", "Tuesday", "Wednesday", "Monday", "Friday", 
"Saturday")), Friday = structure(c(0.882232353177177, 0.869202937572079, 
0.856441568566172, 0.853293126413937, 0.851384896045153, 0.80098779448239
), .Names = c("Monday", "Tuesday", "Sunday", "Wednesday", "Thursday", 
"Saturday")), Saturday = structure(c(0.866654844262859, 0.841170795359615, 
0.836732564057725, 0.818659253763272, 0.811536071048775, 0.80098779448239
), .Names = c("Sunday", "Wednesday", "Thursday", "Tuesday", "Monday", 
"Friday")), Sunday = structure(c(0.901682757072732, 0.889987253484581, 
0.88354911311872, 0.866654844262859, 0.86579834238668, 0.856441568566172
), .Names = c("Tuesday", "Monday", "Wednesday", "Saturday", "Thursday", 
"Friday"))), .Names = c("Monday", "Tuesday", "Wednesday", "Thursday", 
"Friday", "Saturday", "Sunday"), class = c("similMatrix", "list"
))
Now I want to convert it to a dist object and then pass it to hclust(). So I use as.dist() and get:

> as.dist(simil,diag = TRUE, upper = TRUE)
             Monday    Sunday   Tuesday    Friday Wednesday  Thursday  Saturday
Monday    0.0000000 0.8899873 0.8829579 0.8822324 0.8740803 0.8517608 0.8115361
Sunday    0.8899873 0.0000000 1.0000000 0.8692029 0.8747168 0.8552485 0.8186593
Tuesday   0.8829579 1.0000000 0.0000000 0.8532931 1.0000000 0.8519211 0.8411708
Friday    0.8822324 0.8692029 0.8532931 0.0000000 0.8519211 1.0000000 0.8367326
Wednesday 0.8740803 0.8747168 1.0000000 0.8519211 0.0000000 0.8513849 0.8009878
Thursday  0.8517608 0.8552485 0.8519211 1.0000000 0.8513849 0.0000000 1.0000000
Saturday  0.8115361 0.8186593 0.8411708 0.8367326 0.8009878 1.0000000 0.0000000
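The stray 1.0000000 entries look like diagonal values (a day's similarity with itself) that ended up off the diagonal. The sketch below reproduces that pattern in base R on a three-day subset of the data, assuming (this is not confirmed in the question) that the matrix passed to as.dist() had its rows in a different order from its columns. as.dist() keeps the lower triangle by position and labels it with the row names, so it cannot notice such a mismatch.

```r
# Toy 3x3 cosine-similarity matrix (values taken from the question's data);
# the diagonal is each day's similarity with itself, i.e. 1.
days <- c("Monday", "Tuesday", "Sunday")
m <- matrix(c(1,                 0.882957894295089, 0.889987253484581,
              0.882957894295089, 1,                 0.901682757072732,
              0.889987253484581, 0.901682757072732, 1),
            nrow = 3, dimnames = list(days, days))

# Hypothetical misalignment: rows reordered, columns left as they were.
bad <- m[c("Monday", "Sunday", "Tuesday"), ]

# as.dist() takes the lower triangle by *position* and uses the row names
# as labels, so the Tuesday/Tuesday diagonal 1 is reported as Sunday-Tuesday.
d_bad  <- as.dist(bad)
d_good <- as.dist(m)
print(d_bad)
print(d_good)
```

In this toy case the misaligned version reports Sunday-Tuesday as 1, the same kind of entry that appears in the output above.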
But this is slightly different from the result I get with as.matrix():


With as.dist() the matrix is not fully symmetric and some pairs come out wrong, which does not happen with as.matrix(). Why is that, and how can I fix it?

In the end I fixed it by first converting to a matrix, then reordering the rows, and finally converting 1 - similarity to a dist object:

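simil = as.matrix(simil)                             # similMatrix -> numeric matrix
simil = simil[ c(1,3,5,6,4,7,2),]                    # reorder rows so they follow the column order
simil = as.dist(1-simil,diag = TRUE, upper = TRUE)   # distance = 1 - cosine similarity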

> simil
              Monday    Tuesday  Wednesday   Thursday     Friday   Saturday     Sunday
Monday    0.00000000 0.11704211 0.12591973 0.14823923 0.11776765 0.18846393 0.11001275
Tuesday   0.11704211 0.00000000 0.12528319 0.14475150 0.13079706 0.18134075 0.09831724
Wednesday 0.12591973 0.12528319 0.00000000 0.14807889 0.14670687 0.15882920 0.11645089
Thursday  0.14823923 0.14475150 0.14807889 0.00000000 0.14861510 0.16326744 0.13420166
Friday    0.11776765 0.13079706 0.14670687 0.14861510 0.00000000 0.19901221 0.14355843
Saturday  0.18846393 0.18134075 0.15882920 0.16326744 0.19901221 0.00000000 0.13334516
Sunday    0.11001275 0.09831724 0.11645089 0.13420166 0.14355843 0.13334516 0.00000000
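The permutation c(1,3,5,6,4,7,2) only fits this one particular row order. A slightly more general variant, sketched here under the assumption that the matrix carries both row and column names, reorders the rows by the column names and checks symmetry with isSymmetric() before converting. The three-day matrix and its row shuffle are illustrative only.

```r
# Toy similarity matrix (values from the question's data) whose rows are
# shuffled relative to its columns; the shuffle itself is hypothetical.
days <- c("Monday", "Tuesday", "Sunday")
m <- matrix(c(1,                 0.882957894295089, 0.889987253484581,
              0.882957894295089, 1,                 0.901682757072732,
              0.889987253484581, 0.901682757072732, 1),
            nrow = 3, dimnames = list(days, days))
shuffled <- m[c("Monday", "Sunday", "Tuesday"), ]

# Reorder rows by name so that row i and column i refer to the same day.
fixed <- shuffled[colnames(shuffled), ]
stopifnot(isSymmetric(fixed))

# Convert cosine similarity to a distance, as in the fix above.
d <- as.dist(1 - fixed, diag = TRUE, upper = TRUE)
print(d)
```

Indexing by name means the same line keeps working if the row order produced by as.matrix() changes.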

This is probably because "simil" was created with the similarity() function from the quanteda package.
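If the as.matrix() conversion of the similMatrix object is what scrambles the order, another option is to skip it and fill a matrix from the list by name. A minimal base-R sketch on a three-day subset of the question's data, with the similMatrix class dropped and assuming each element is a named vector of similarities to every other day:

```r
# Three-day subset of the question's `simil`, as plain named vectors.
# Each vector is sorted by decreasing similarity, so name order differs.
simil <- list(
  Monday  = c(Sunday = 0.889987253484581, Tuesday = 0.882957894295089),
  Tuesday = c(Sunday = 0.901682757072732, Monday  = 0.882957894295089),
  Sunday  = c(Tuesday = 0.901682757072732, Monday = 0.889987253484581)
)

days <- names(simil)
# Start from 1 everywhere so the diagonal (self-similarity) is 1.
sim_mat <- matrix(1, nrow = length(days), ncol = length(days),
                  dimnames = list(days, days))
# Fill each row by *name*, so the order inside each vector does not matter.
for (day in days) {
  sim_mat[day, names(simil[[day]])] <- simil[[day]]
}

# Cosine distance = 1 - cosine similarity, then hierarchical clustering.
d  <- as.dist(1 - sim_mat)
hc <- hclust(d)
print(hc$merge)
print(hc$height)
```

Because every value is placed by name, the per-vector sort order no longer matters, and the dist object can be passed straight to hclust().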

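If it is a list, as you mentioned, sapply/lapply are the way to loop over a list. It would be better if you posted a dput of the example.

I have updated the question with dput(). But I don't understand how I should use sapply/lapply to convert my list into a dist object. Shouldn't as.dist() already do that?

Based on your dput, the code you used does not give the output you show, but simplify2array(simil) gives a matrix.

Sorry, I don't understand what you mean. I regenerated the variable from the dput() output and got exactly the same result again. I also tried simplify2array(simil); it gives a different matrix with wrong values, so it is not what I need.

There are 8439 packages on CRAN; nobody can possibly know all of them. Please mention any extra packages you are using when you ask a question :)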
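The wrong values from simplify2array(simil) plausibly have the same root cause: it binds the vectors by position and takes the row names from the first vector only, while each vector in simil is sorted by decreasing similarity and therefore lists the days in a different order. A small illustration on a three-day subset of the data (plain named vectors, not the similMatrix object):

```r
# Same three-day subset as plain named vectors.
simil <- list(
  Monday  = c(Sunday = 0.889987253484581, Tuesday = 0.882957894295089),
  Tuesday = c(Sunday = 0.901682757072732, Monday  = 0.882957894295089),
  Sunday  = c(Tuesday = 0.901682757072732, Monday = 0.889987253484581)
)

# simplify2array() binds by position; the row names come from the first
# element (Sunday, Tuesday), so later columns are mislabelled.
m <- simplify2array(simil)
print(m)

# Labelled Tuesday/Sunday, but this value is really the Sunday-Monday similarity.
m["Tuesday", "Sunday"]
```

The result is also not square (one row per *other* day), so it cannot be passed to as.dist() directly.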