R: Creating a co-occurrence matrix from dummy-coded observations

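Is there a simple way to convert a data frame of dummies (binary-coded indicators of whether an aspect is present) into a co-occurrence matrix that holds, for each pair of aspects, the number of observations in which both occur?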

Starting from this:

X <- data.frame(rbind(c(1,0,1,0), c(0,1,1,0), c(0,1,1,1), c(0,0,1,0)))
X
  X1 X2 X3 X4
1  1  0  1  0
2  0  1  1  0
3  0  1  1  1
4  0  0  1  0

This will do it:

X <- as.matrix(X)
out <- crossprod(X)  # Same as: t(X) %*% X
diag(out) <- 0       # (b/c you don't count co-occurrences of an aspect with itself)
out
#      [,1] [,2] [,3] [,4]
# [1,]    0    0    1    0
# [2,]    0    0    2    1
# [3,]    1    2    0    1
# [4,]    0    1    1    0
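
Why this works: entry (i, j) of t(X) %*% X is the sum over observations of X[, i] * X[, j], and for 0/1 data that product is 1 only when both aspects are present in the same row. A minimal sketch that checks this against an explicit pairwise count follows; `count_pairs` is a hypothetical helper written for illustration, not part of the answer.

```r
# Same toy data as in the question
X <- as.matrix(data.frame(rbind(c(1,0,1,0), c(0,1,1,0), c(0,1,1,1), c(0,0,1,0))))

# Hypothetical helper: count co-occurrences one pair of columns at a time
count_pairs <- function(m) {
  k <- ncol(m)
  res <- matrix(0, k, k)
  for (i in seq_len(k)) {
    for (j in seq_len(k)) {
      # Number of rows in which aspects i and j are both present
      if (i != j) res[i, j] <- sum(m[, i] == 1 & m[, j] == 1)
    }
  }
  res
}

slow <- count_pairs(X)

fast <- crossprod(X)   # t(X) %*% X
diag(fast) <- 0        # drop self co-occurrences

# Compare values only; row/column labels are ignored
all.equal(slow, fast, check.attributes = FALSE)
```

The explicit loop is only there to make the counting visible; `crossprod()` does the same work in a single matrix product.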

Nothing beats the simple answer above, but here is a tidyverse approach for future reference:

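library(dplyr)   # mutate, filter, group_by, summarise
library(tidyr)   # pivot_longer, pivot_wider
library(tibble)  # column_to_rownames

# Long format: one row per (observation id, aspect) where the aspect is present.
# as.data.frame() guards against X having been converted to a matrix earlier.
Y <- as.data.frame(X) %>% mutate(id = row_number()) %>%
  pivot_longer(-id) %>% filter(value != 0)

# Self-join on id to pair up aspects that occur in the same observation
merge(Y, Y, by = "id", all = TRUE) %>%
  filter(name.x != name.y) %>%
  group_by(name.x, name.y) %>%
  summarise(val = n()) %>%
  pivot_wider(names_from = name.y, values_from = val, values_fill = 0, names_sort = TRUE) %>%
  column_to_rownames("name.x")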

   X1 X2 X3 X4
X1  0  0  1  0
X2  0  0  2  1
X3  1  2  0  1
X4  0  1  1  0

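Comments:

Interesting, the diagonal (before it is zeroed) is just the column sums of X.

Very nice; simple and easy. +1

@bdemarest: It is interesting. Consider also the similarity with variance-covariance matrices, which differ only in that the columns are centered before computing t(X) %*% X.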
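
Both remarks can be checked on the toy data. The sketch below assumes the same X as in the question. Note that `cov()` in R also divides by n - 1, a scaling the comment does not mention, so the comparison applies it explicitly.

```r
# Same toy data as in the question
X <- as.matrix(data.frame(rbind(c(1,0,1,0), c(0,1,1,0), c(0,1,1,1), c(0,0,1,0))))

cp <- crossprod(X)  # t(X) %*% X, diagonal left untouched

# For 0/1 data x^2 == x, so the diagonal equals the column sums
diag_is_colsums <- all.equal(unname(diag(cp)), unname(colSums(X)))

# Center each column (no rescaling), then form the cross-product
Xc <- scale(X, center = TRUE, scale = FALSE)
cov_by_hand <- crossprod(Xc) / (nrow(X) - 1)

# Matches the built-in sample covariance (values only, labels ignored)
cov_matches <- all.equal(cov_by_hand, cov(X), check.attributes = FALSE)

diag_is_colsums
cov_matches
```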
# Optionally label the rows and columns and convert the result to a data frame
nms <- paste("X", 1:4, sep="")
dimnames(out) <- list(nms, nms)
out <- as.data.frame(out)