R: Count the strings shared between every pair of columns in a data frame to find overlaps


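I have a data frame "dat" with several columns, one per "year", and the values in each column are capital cities.

When I do this without a loop, I see that blank cells get counted as matches too (see the console output further down).

Can someone help me put this in a loop and make sure the blanks are not counted?

Thanks in advance for your help.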

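We replicate the column names of the dataset so that they have the same length as the unlisted values, drop the blank entries, and get the table of column names against values. Then we use tcrossprod (from the crossprod family) and change the matrix output to a 'long' format data.frame with as.data.frame.table.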

v1 <- unlist(dat)
i1 <- v1 != ""
out <- as.data.frame.table(tcrossprod(table(colnames(dat)[col(dat)][i1], 
                  v1[i1])))[c(2, 1, 3)]
names(out)[1:2] <- paste0("Var", 1:2)
head(out, 5)
#   Var1  Var2 Freq
#1 Year1 Year1   10
#2 Year1 Year2    1
#3 Year1 Year3    2
#4 Year1 Year4    0
#5 Year1 Year5    0
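To see why this works, it may help to look at the intermediate objects on a smaller input. The sketch below uses a hypothetical three-column data frame `toy` (not part of the original question) and follows the same steps as above; it is an illustration under that assumption, not a replacement for the answer.

```r
# Hypothetical toy data; "" marks a blank cell, as in `dat`
toy <- data.frame(
  A = c("Oslo", "Rome", "Lima"),
  B = c("Rome", "Oslo", ""),
  C = c("Lima", "", ""),
  stringsAsFactors = FALSE
)

v1 <- unlist(toy)                 # all cells as one character vector (column by column)
i1 <- v1 != ""                    # TRUE for non-blank cells
cols <- colnames(toy)[col(toy)]   # column name of each cell, in the same order as v1

# Incidence table: one row per column of `toy`, one column per distinct value
inc <- table(cols[i1], v1[i1])

# inc %*% t(inc): entry [i, j] sums, over all values,
# (count of the value in column i) * (count of the value in column j)
overlap <- tcrossprod(inc)
overlap
```

Because the entries are products of counts, the result agrees with `sum(x %in% y)` on non-blank values only when no value is repeated within a column, which holds for `dat` in the question.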

Are those missing values NA or blank ("")?

They are just blank. If I included NAs, the counting function could also count the NAs shared between any two variables.
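If a data set did contain NA as well as blanks, one option is to build the filter so that it drops both. This is only a sketch under that assumption, since `dat` in the question has blanks only; the data frame `toy` and the name `keep` are hypothetical. Note that `v1 != ""` on its own returns NA for NA cells, and that `NA %in% NA` is TRUE, which is why NAs would otherwise be counted as matches.

```r
# Hypothetical data with both NA and "" used as missing markers
toy <- data.frame(
  A = c("Oslo", "Rome", NA),
  B = c("Rome", "", NA),
  stringsAsFactors = FALSE
)

v1 <- unlist(toy)
# Combine both checks so the mask is a clean TRUE/FALSE vector with no NA
keep <- !is.na(v1) & v1 != ""

cols <- colnames(toy)[col(toy)]   # column name of each cell, same order as v1
overlap <- tcrossprod(table(cols[keep], v1[keep]))
overlap
```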
Desired output:

Year1   Year1   10
Year1   Year2   1
Year1   Year3   2
Year1   Year4   0
Year1   Year5   0
Year1   Year6   1
Year2   Year1   1
Year2   Year2   8
Year2   Year3   2
etc.

Current attempt without a loop:

> sum (dat$Year1 %in% dat$Year1)
[1] 10
> sum (dat$Year1 %in% dat$Year2)
[1] 1
> sum (dat$Year1 %in% dat$Year3)
[1] 2
> sum (dat$Year1 %in% dat$Year4)
[1] 0
> sum (dat$Year1 %in% dat$Year5)
[1] 0
> sum (dat$Year1 %in% dat$Year6)
[1] 1
> sum (dat$Year2 %in% dat$Year1)
[1] 1
> sum (dat$Year2 %in% dat$Year2)
[1] 10
> sum (dat$Year2 %in% dat$Year3) ## counts spaces
[1] 4 
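
The pairwise checks above can also be written as an explicit loop that drops blanks before comparing. This is a minimal sketch rather than the method from the answer: the helper name `count_overlap` is hypothetical, and only the first three columns of `dat` are reproduced so that the block runs on its own.

```r
# First three columns of `dat` from the question
dat <- data.frame(
  Year1 = c("Berlin", "Beijing", "Paris", "Tokyo", "Oslo",
            "Bern", "London", "Taipei", "Dhaka", "Kabul"),
  Year2 = c("Victoria", "Lima", "Oslo", "Rome", "Dublin",
            "Asmara", "Cairo", "Brasilia", "", ""),
  Year3 = c("Athens", "Berlin", "Dublin", "Victoria", "London",
            "Malabo", "", "", "", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of non-blank values of x that also occur in y
count_overlap <- function(x, y) {
  x <- x[x != ""]
  sum(x %in% y)
}

# Every ordered pair of columns; Var2 varies fastest, as in the desired output
pairs <- expand.grid(Var2 = names(dat), Var1 = names(dat),
                     stringsAsFactors = FALSE)[c("Var1", "Var2")]
pairs$Freq <- NA_integer_

# Fill in the count for each pair
for (k in seq_len(nrow(pairs))) {
  pairs$Freq[k] <- count_overlap(dat[[pairs$Var1[k]]], dat[[pairs$Var2[k]]])
}
pairs
```

With blanks removed from `x`, the Year2 and Year3 comparison no longer picks up the empty strings that inflated the count in the console output above.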

Data:

dat <- structure(list(Year1 = c("Berlin", "Beijing", "Paris", "Tokyo", 
"Oslo", "Bern", "London", "Taipei", "Dhaka", "Kabul"), Year2 = c("Victoria", 
"Lima", "Oslo", "Rome", "Dublin", "Asmara", "Cairo", "Brasilia", 
"", ""), Year3 = c("Athens", "Berlin", "Dublin", "Victoria", 
"London", "Malabo", "", "", "", ""), Year4 = c("Manama", "Cairo", 
"Dublin", "Belmopan", "Moroni", "Algiers", "", "", "", ""), Year5 = c("Brussels", 
"Vienna", "Algiers", "Luanda", "Rome", "", "", "", "", ""), Year6 = c("Vienna", 
"Asmara", "Athens", "Paris", "", "", "", "", "", "")), .Names = c("Year1", 
"Year2", "Year3", "Year4", "Year5", "Year6"), 
 class = "data.frame", row.names = c("1", 
"2", "3", "4", "5", "6", "7", "8", "9", "10"))