R: count strings between every two columns of a data frame to find overlap
I have a data frame `dat` with several columns, one per "Year", and the values in each column are capital cities. I want to count how many strings every pair of columns has in common.

When I do this without a loop, I see that wherever there are blanks, those get counted as well (as shown below). Can someone help me loop over this and make sure the blanks are not counted?

Thanks in advance for your help.

Take the `table` of the dataset's column names (replicated to the same length as the values) against the dataset's values. Then apply `tcrossprod` and convert the matrix output to a 'long' format data.frame with `as.data.frame.table`.
v1 <- unlist(dat)
i1 <- v1 != ""
out <- as.data.frame.table(tcrossprod(table(colnames(dat)[col(dat)][i1],
v1[i1])))[c(2, 1, 3)]
names(out)[1:2] <- paste0("Var", 1:2)
head(out, 5)
# Var1 Var2 Freq
#1 Year1 Year1 10
#2 Year1 Year2 1
#3 Year1 Year3 2
#4 Year1 Year4 0
#5 Year1 Year5 0
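To see why this works, here is a minimal sketch on a hypothetical three-column data frame `toy` (not part of the original data). `table()` builds a column-by-value incidence matrix, and `tcrossprod()` multiplies it by its own transpose, so entry (i, j) counts the values shared by columns i and j. This assumes each value appears at most once per column; duplicates within a column would inflate the counts.

```r
# Hypothetical toy data; blanks ("") pad the shorter columns
toy <- data.frame(A = c("x", "y", "z"),
                  B = c("y", "z", ""),
                  C = c("x", "", ""),
                  stringsAsFactors = FALSE)

v    <- unlist(toy)                    # all values, column by column
keep <- v != ""                        # drop the blank padding
cols <- colnames(toy)[col(toy)][keep]  # owning column of each kept value

inc <- table(cols, v[keep])            # rows = columns, cols = distinct values
ovl <- tcrossprod(inc)                 # inc %*% t(inc): shared-value counts
ovl
```

The diagonal of `ovl` holds the number of non-blank values in each column, and the off-diagonal entries hold the pairwise overlaps.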
Are those missing values `NA` or blank (`""`)?
They are just blanks. If I included NAs, the counting function would also count the NAs between any two variables.
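The distinction matters because `%in%` is built on `match()`, which treats both `""` and `NA` as ordinary matchable values. A small sketch with hypothetical vectors (`a`, `b`, `a_na`, `b_na` are made up for illustration):

```r
# Blank padding: "" in a matches "" in b
a <- c("Oslo", "", "")
b <- c("Rome", "Oslo", "")
n_blank_counted  <- sum(a %in% b)            # 3: "Oslo" plus both blanks
n_blank_excluded <- sum(a[a != ""] %in% b)   # 1: only "Oslo"

# NA padding behaves the same way: NA in a matches NA in b
a_na <- c("Oslo", NA, NA)
b_na <- c("Rome", "Oslo", NA)
n_na_counted  <- sum(a_na %in% b_na)                  # 3
n_na_excluded <- sum(a_na[!is.na(a_na)] %in% b_na)    # 1
```

Either way, the padding has to be filtered out of the left-hand vector before counting.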
Desired output:
Year1 Year1 10
Year1 Year2 1
Year1 Year3 2
Year1 Year4 0
Year1 Year5 0
Year1 Year6 1
Year2 Year1 1
Year2 Year2 8
Year2 Year3 2
etc.
> sum (dat$Year1 %in% dat$Year1)
[1] 10
> sum (dat$Year1 %in% dat$Year2)
[1] 1
> sum (dat$Year1 %in% dat$Year3)
[1] 2
> sum (dat$Year1 %in% dat$Year4)
[1] 0
> sum (dat$Year1 %in% dat$Year5)
[1] 0
> sum (dat$Year1 %in% dat$Year6)
[1] 1
> sum (dat$Year2 %in% dat$Year1)
[1] 1
> sum (dat$Year2 %in% dat$Year2)
[1] 10
> sum (dat$Year2 %in% dat$Year3) ## counts spaces
[1] 4
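One way to loop over every pair of columns while ignoring blanks is sketched below. The helper `count_overlap` and the data frame `toy` are hypothetical names used for illustration; with the real data, `toy` would be replaced by `dat`.

```r
# Hypothetical toy data standing in for dat
toy <- data.frame(Year1 = c("Berlin", "Oslo", "Paris"),
                  Year2 = c("Oslo", "Rome", ""),
                  Year3 = c("Berlin", "", ""),
                  stringsAsFactors = FALSE)

# Count the non-blank, non-NA values of a that also occur in b
count_overlap <- function(a, b) {
  a <- a[!is.na(a) & a != ""]
  sum(a %in% b)
}

# All ordered pairs of column names, with Var1 varying slowest
pairs <- expand.grid(Var2 = names(toy), Var1 = names(toy),
                     stringsAsFactors = FALSE)[c("Var1", "Var2")]

# Apply the helper to each pair of columns
pairs$Freq <- mapply(function(i, j) count_overlap(toy[[i]], toy[[j]]),
                     pairs$Var1, pairs$Var2, USE.NAMES = FALSE)
pairs
```

Only the left-hand vector needs filtering: once the blanks are removed from `a`, any blanks in `b` have nothing to match. If a value can repeat within a column, each occurrence in `a` is counted.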
dat <- structure(list(Year1 = c("Berlin", "Beijing", "Paris", "Tokyo",
"Oslo", "Bern", "London", "Taipei", "Dhaka", "Kabul"), Year2 = c("Victoria",
"Lima", "Oslo", "Rome", "Dublin", "Asmara", "Cairo", "Brasilia",
"", ""), Year3 = c("Athens", "Berlin", "Dublin", "Victoria",
"London", "Malabo", "", "", "", ""), Year4 = c("Manama", "Cairo",
"Dublin", "Belmopan", "Moroni", "Algiers", "", "", "", ""), Year5 = c("Brussels",
"Vienna", "Algiers", "Luanda", "Rome", "", "", "", "", ""), Year6 = c("Vienna",
"Asmara", "Athens", "Paris", "", "", "", "", "", "")), .Names = c("Year1",
"Year2", "Year3", "Year4", "Year5", "Year6"),
class = "data.frame", row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))