How do I create a co-occurrence matrix of elements that appear together in a row in R?


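I'm new to R and want to create a co-occurrence matrix based on which elements co-occur in the same row.

A basic example of the desired result

Suppose you have this table: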

df <- data.frame(ID = c(1,2,3), 
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece")
)

My co-occurrence matrix should look like this:

Country   China    England  Greece
China           0       0      1 #China & Greece co-occur in row 3
England         0       1      1 #England & England co-occur in row 2, and England and Greece in row 1
Greece          1       1      0

However, if I follow this approach, I get:

library(tidyverse)

# Cross-tabulate ID by country, then multiply the table by itself
df1 <- df %>%
  pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
  xtabs(~ID + Country, data = ., sparse = FALSE) %>%
  crossprod(., .)

# Count how often each country appears overall
df_diag <- df %>%
  pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
  mutate(Country2 = Country) %>%
  xtabs(~Country + Country2, data = ., sparse = FALSE) %>%
  diag()

# Overwrite the diagonal with those counts
diag(df1) <- df_diag

df1

Country   China England Greece
  China       1       0      1
  England     0       3      1
  Greece      1       1      2

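[...] 0 & complete.cases(value)] -> foo
# Get the distinct values (countries) within each ID group (each row)
unique(foo, by = c("ID", "value")) -> foo2
# https://stackoverflow.com/questions/13281303/creating-co-occurrence-matrix
# Following that question, build the matrix with crossprod().
crossprod(table(foo2[, c(1, 3)])) -> mymat
# Finally, adjust the diagonal values: if a value equals one,
# change it to zero; otherwise keep the original value.
diag(mymat) [...]

What you need can be done in base R as follows: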
a <- table(lapply(df[-1], factor, levels = sort(unique(unlist(df[-1])))))
a[lower.tri(a)] <- t(a)[lower.tri(a)]
 a
         V2
V1        China England Greece
  China       0       0      1
  England     0       1      1
  Greece      1       1      0
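
The `lower.tri` assignment above copies the upper triangle over the lower one, so a pair that only appears in the other column order (for example V1 = "Greece", V2 = "China") would be overwritten. A minimal sketch of an order-independent variant, assuming the same `df` as in the question; the names `lev`, `a` and `sym` are illustrative only, and the result holds pair counts rather than 0/1 flags:

```r
# Example data from the question
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))

# Shared factor levels so the table is square
lev <- sort(unique(unlist(df[-1])))

# Cross-tabulate V1 against V2 and store it as a plain matrix
tab <- table(factor(df$V1, levels = lev), factor(df$V2, levels = lev))
a <- matrix(as.vector(tab), nrow = length(lev), dimnames = list(lev, lev))

# Add the transpose so (x, y) and (y, x) are counted together,
# then restore the diagonal, which the sum would otherwise double
sym <- a + t(a)
diag(sym) <- diag(a)
sym
```

On the example data this gives the same matrix as the desired result in the question.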


You can use outer in base R:
unique_vals <- sort(union(df$V1, df$V2))
co_mat <- function(x, y) +(any(df$V1 == x & df$V2 == y | 
                                df$V2 == x & df$V1 == y))
mat <- outer(unique_vals, unique_vals, Vectorize(co_mat))
dimnames(mat) <- list(unique_vals, unique_vals)
mat

#        China England Greece
#China       0       0      1
#England     0       1      1
#Greece      1       1      0
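
The `outer` version above is written for exactly two columns, V1 and V2. The following is a rough sketch for any number of value columns, under the assumption (not stated in the question) that every pair of values in the same row counts as co-occurring. The names `vals`, `lev` and `mat` are illustrative, and `combn` is used to enumerate the pairs within each row:

```r
# Example data from the question
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))

# All value columns (everything except ID) as a character matrix
vals <- as.matrix(df[-1])
lev <- sort(unique(as.vector(vals)))

# Start from an all-zero square matrix labelled by country
mat <- matrix(0L, nrow = length(lev), ncol = length(lev),
              dimnames = list(lev, lev))

for (i in seq_len(nrow(vals))) {
  # Each column of `pairs` is one pair of values taken from row i
  pairs <- combn(vals[i, ], 2)
  for (j in seq_len(ncol(pairs))) {
    x <- pairs[1, j]
    y <- pairs[2, j]
    # Flag the pair in both directions so the matrix stays symmetric
    mat[x, y] <- 1L
    mat[y, x] <- 1L
  }
}
mat
```

With the two-column example this reproduces the 0/1 matrix shown above; the explicit loops are chosen for readability rather than speed.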


Possible duplicate. I think the tricky part is just making sure the factor levels are the same for the V1 and V2 columns, but basically
df %>% mutate(across(c(V1, V2), ~factor(.x, levels = sort(unique(c(V1, V2)))))) %>% xtabs(~V1 + V2, .)
should work.