How do I create a co-occurrence matrix in R of elements that appear together in a row?
I'm new to R, and I'd like to create a co-occurrence matrix based on which elements co-occur within a row.
Basic example of the desired result
Suppose you have this table:
df <- data.frame(ID = c(1,2,3),
V1 = c("England", "England", "China"),
V2 = c("Greece", "England", "Greece")
)
My co-occurrence matrix should look like this:
Country China England Greece
China 0 0 1 #China & Greece co-occur in row 3
England 0 1 1 #England & England co-occur in row 2, and England and Greece in row 1
Greece 1 1 0
However, if I follow this approach, I get:
library(tidyverse)
# Cross-product of the ID-by-Country table
df1 <- df %>%
  pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
  xtabs(~ID + Country, data = ., sparse = FALSE) %>%
  crossprod(., .)
# Per-country counts, used to overwrite the diagonal
df_diag <- df %>%
  pivot_longer(-ID, names_to = "Category", values_to = "Country") %>%
  mutate(Country2 = Country) %>%
  xtabs(~Country + Country2, data = ., sparse = FALSE) %>%
  diag()
diag(df1) <- df_diag
df1
Country China England Greece
China 1 0 1
England 0 3 1
Greece 1 1 2
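… 0 & complete.cases(value)] -> foo
# Get the distinct values (countries) within each ID group (each row)
unique(foo, by = c("ID", "value")) -> foo2
# https://stackoverflow.com/questions/13281303/creating-co-occurrence-matrix
# As that question shows, you can create the matrix with crossprod().
crossprod(table(foo2[, c(1, 3)])) -> mymat
# Finally, change the diagonal values: if a value equals one,
# set it to zero; otherwise, keep the original value.
diag(mymat) …
What you need can be done in base R as follows: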
a <- table(lapply(df[-1], factor, levels = sort(unique(unlist(df[-1])))))
a[lower.tri(a)] <- t(a)[lower.tri(a)]
a
V2
V1 China England Greece
China 0 0 1
England 0 1 1
Greece 1 1 0
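Note that overwriting the lower triangle with the transposed upper triangle relies on every pair being recorded with V1 sorting no later than V2, which happens to hold for the sample data. A minimal sketch that does not depend on that ordering, assuming the same df (the names lv, tab, and sym are illustrative only):

```r
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))

# Shared factor levels so the table is square with matching row/column order
lv  <- sort(unique(unlist(df[-1])))
tab <- unclass(table(factor(df$V1, levels = lv),
                     factor(df$V2, levels = lv)))

# Add the transpose so (x, y) and (y, x) pairs are counted together,
# then restore the diagonal, which the addition would otherwise double
sym <- tab + t(tab)
diag(sym) <- diag(tab)
sym
```

For the sample data this reproduces the desired matrix; with data where a pair appears in both column orders, the two counts are summed rather than one overwriting the other.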
You can use outer in base R:
unique_vals <- sort(union(df$V1, df$V2))
co_mat <- function(x, y) +(any(df$V1 == x & df$V2 == y |
df$V2 == x & df$V1 == y))
mat <- outer(unique_vals, unique_vals, Vectorize(co_mat))
dimnames(mat) <- list(unique_vals, unique_vals)
mat
# China England Greece
#China 0 0 1
#England 0 1 1
#Greece 1 1 0
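The function above returns a 0/1 indicator of whether a pair ever co-occurs. If the number of rows in which a pair co-occurs is wanted instead, one possible variant replaces any() with sum(). This is only a sketch under the same df; count_pair and counts are hypothetical names:

```r
df <- data.frame(ID = c(1, 2, 3),
                 V1 = c("England", "England", "China"),
                 V2 = c("Greece", "England", "Greece"))

unique_vals <- sort(union(df$V1, df$V2))

# Count the rows in which x and y appear together, in either column order
count_pair <- function(x, y) sum((df$V1 == x & df$V2 == y) |
                                 (df$V2 == x & df$V1 == y))

counts <- outer(unique_vals, unique_vals, Vectorize(count_pair))
dimnames(counts) <- list(unique_vals, unique_vals)
counts
```

On the sample data every pair occurs in at most one row, so counts matches the indicator matrix; the two differ only when a pair is repeated across rows.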
Possible duplicate. I think the tricky part is just making sure the factor levels are the same for the V1 and V2 columns, but basically df %>% mutate(across(c(V1, V2), ~factor(.x, levels = sort(unique(c(V1, V2)))))) %>% xtabs(~V1 + V2, .) should work.