How to create a co-occurrence matrix in R of elements that appear together in the same row?


I'm new to R, and I want to create a co-occurrence matrix based on which elements co-occur in the same row.

A basic example of the desired result

Suppose you have this table:

df <- data.frame(ID = c(1,2,3), 
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece")
)
My co-occurrence matrix should look like this:

Country  China England Greece
China        0       0      1  # China & Greece co-occur in row 3
England      0       1      1  # England & England co-occur in row 2, and England & Greece in row 1
Greece       1       1      0
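As a hedged illustration of the definition above, here is a minimal base-R sketch. It assumes each row holds exactly one pair (V1, V2) and that the matrix should only record presence (0/1), not counts. It tabulates every pair in both orders over a shared, sorted set of levels, so a same-country pair such as (England, England) lands on the diagonal, and then binarizes. The names levels_all, pair_counts and co_mat_01 are hypothetical.

```r
# Example data from the question
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))

# Shared, sorted levels so both margins of the table list the same countries
levels_all <- sort(unique(c(df$V1, df$V2)))

# Count each (V1, V2) pair in both orders so the result is symmetric;
# a row like (England, England) contributes to the England diagonal cell
pair_counts <- table(factor(c(df$V1, df$V2), levels = levels_all),
                     factor(c(df$V2, df$V1), levels = levels_all))

# Keep only presence/absence: 1 if the pair co-occurs in at least one row
co_mat_01 <- unclass(pair_counts > 0) * 1L
co_mat_01
```

For this df the result matches the desired matrix shown above; dropping the "> 0" step would instead give pair counts (with England's diagonal at 2, since the self-pair is counted in both orders).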
However, if I follow this approach, I get:

library(tidyverse)

# Country x Country cross-product of the ID x Country incidence table
df1 <- df %>%
  pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
  xtabs(~ID + Country, data = ., sparse = FALSE) %>%
  crossprod(., .)

# Total number of occurrences of each country, used as the new diagonal
df_diag <- df %>%
  pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
  mutate(Country2 = Country) %>%
  xtabs(~Country + Country2, data = ., sparse = FALSE) %>%
  diag()

diag(df1) <- df_diag

df1

Country   China England Greece
  China       1       0      1
  England     0       3      1
  Greece      1       1      2
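The diagonal is where this attempt diverges from the desired matrix. A short base-R sketch shows the incidence table that crossprod() works on; it builds the long format by hand (hypothetical name long_df) as a stand-in for pivot_longer(), so it runs without tidyverse. Row 2 contains England twice, so that cell is 2, and crossprod() puts 1*1 + 2*2 = 5 on the England diagonal before the attempt overwrites it with the plain occurrence count of 3. Neither number encodes "England pairs with itself in some row".

```r
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))

# Long format: one row per (ID, Country) occurrence, like pivot_longer(-ID)
long_df <- data.frame(ID = rep(df$ID, times = 2),
                      Country = c(df$V1, df$V2))

# ID x Country incidence counts; row 2 counts England twice
incidence <- xtabs(~ ID + Country, data = long_df)
incidence

# Country x Country cross-product; each diagonal entry sums squared counts per ID
cp <- crossprod(incidence)
diag(cp)
```

The off-diagonal cells of cp already agree with the desired matrix; only the diagonal needs a different rule.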
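... 0 & complete.cases(value)] -> foo
# Get the distinct values (countries) within each ID group (i.e., each row)
unique(foo, by = c("ID", "value")) -> foo2
# https://stackoverflow.com/questions/13281303/creating-co-occurrence-matrix
# Based on that question, you want to use crossprod() to create a matrix.
crossprod(table(foo2[, c(1, 3)])) -> mymat
# Finally, you need to change the diagonal values. If a value equals one,
# change it to zero. Otherwise, keep the original value.
diag(mymat) ...

What you need is: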

In base R, you can do the following:

a <- table(lapply(df[-1], factor, levels = sort(unique(unlist(df[-1])))))
a[lower.tri(a)] <- t(a)[lower.tri(a)]
 a
         V2
V1        China England Greece
  China       0       0      1
  England     0       1      1
  Greece      1       1      0
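The lower.tri() step copies the upper triangle over the lower one. That works for this df because every off-diagonal pair is stored with the alphabetically earlier country in V1 (England/Greece, China/Greece). If a pair could also appear reversed, for example Greece in V1 and England in V2, its count would sit in the lower triangle and be overwritten. A hedged variant, assuming only presence/absence matters, adds the table to its transpose instead; sym_a is a hypothetical name.

```r
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))

# Same shared-level table as in the answer above
a <- table(lapply(df[-1], factor, levels = sort(unique(unlist(df[-1])))))

# Add the transpose so a pair counts whichever column order it appears in,
# then reduce to 0/1 (the doubled diagonal also becomes 1)
sym_a <- unclass(a + t(a) > 0) * 1L
sym_a
```

Summing rather than copying means no cell is discarded; the final comparison with 0 keeps the output in the same 0/1 form as the desired matrix.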

You can use outer in base R:

unique_vals <- sort(union(df$V1, df$V2))
co_mat <- function(x, y) +(any(df$V1 == x & df$V2 == y | 
                                df$V2 == x & df$V1 == y))
mat <- outer(unique_vals, unique_vals, Vectorize(co_mat))
dimnames(mat) <- list(unique_vals, unique_vals)
mat

#        China England Greece
#China       0       0      1
#England     0       1      1
#Greece      1       1      0
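The outer() answer, like the others, is written for exactly two columns. If a row can hold more values (V1, V2, V3, ...), one possible generalization walks each row and marks every pair of column positions produced by combn(). This is a sketch under the assumptions that the value columns are character, contain no NA, and that only presence (0/1) is recorded; co_occurrence() and its arguments are hypothetical names, not part of any package.

```r
# Hypothetical helper: 0/1 co-occurrence matrix over the columns in `cols`
co_occurrence <- function(data, cols) {
  vals <- as.matrix(data[cols])              # character matrix, one row per record
  lv <- sort(unique(as.vector(vals)))        # shared, sorted set of values
  mat <- matrix(0L, nrow = length(lv), ncol = length(lv),
                dimnames = list(lv, lv))
  if (length(cols) < 2) return(mat)          # a single column has no pairs
  pos_pairs <- combn(length(cols), 2)        # every pair of column positions
  for (r in seq_len(nrow(vals))) {
    for (k in seq_len(ncol(pos_pairs))) {
      x <- vals[r, pos_pairs[1, k]]
      y <- vals[r, pos_pairs[2, k]]
      mat[x, y] <- 1L                        # mark the pair in both directions;
      mat[y, x] <- 1L                        # when x == y this sets the diagonal
    }
  }
  mat
}

df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))
co_occurrence(df, c("V1", "V2"))
```

With three columns, a row such as (A, B, C) marks A-B, A-C and B-C. The loop visits each row once per column pair, whereas the outer() version rescans the whole data frame for every cell of the matrix.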

Possible duplicate: I think the tricky part is just making sure the V1 and V2 columns have the same factor levels for counting, but basically just

df %>% mutate(across(c(V1, V2), ~ factor(.x, levels = sort(unique(c(V1, V2)))))) %>% xtabs(~ V1 + V2, data = .)

should work.
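To see why shared factor levels are the tricky part, here is a small base-R sketch (lv is a hypothetical name). Tabulating the raw character columns gives each margin only its own values, so the table is not square, while a common set of levels yields the full 3 x 3 layout.

```r
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))

# Raw columns: rows are {China, England}, columns are {England, Greece}
raw_tab <- table(V1 = df$V1, V2 = df$V2)
dim(raw_tab)     # 2 2

# Shared levels: both margins list China, England and Greece
lv <- sort(unique(c(df$V1, df$V2)))
shared_tab <- table(V1 = factor(df$V1, levels = lv),
                    V2 = factor(df$V2, levels = lv))
dim(shared_tab)  # 3 3
```

Like the xtabs(~ V1 + V2) call in the comment, shared_tab is not yet symmetric: the China/Greece pair appears only in the China row, so it still needs to be mirrored or added to its transpose.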