R: How can I filter out rows that contain two specific strings in different (unspecified) columns?
Given a set of 8 names, I want to generate all unique combinations of 5 names. However, some names must not appear together. For example, given the sample data below, how can I filter out the rows that combine "linda" and "susy"?
# 8 names
names <- c("joe", "mark", "mary", "john", "linda", "susy", "peter", "annie")
# All unique combinations of 5 names (one combination per row)
cbn <- t(combn(names, 5))
Here is a small function with two arguments, data and x, that does what you are looking for:
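f <- function(data, x) {
  # TRUE where a cell matches any name in x, reshaped to the dimensions of data
  hits <- `dim<-`(data %in% x, dim(data))
  # keep only rows that do not contain every name in x (drop = FALSE keeps a matrix)
  data[rowSums(hits) < length(x), , drop = FALSE]
}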
Result for f(cbn, c("linda", "susy")):
# [,1] [,2] [,3] [,4] [,5]
# [1,] "joe" "mark" "mary" "john" "linda"
# [2,] "joe" "mark" "mary" "john" "susy"
# [3,] "joe" "mark" "mary" "john" "peter"
# [4,] "joe" "mark" "mary" "john" "annie"
# [5,] "joe" "mark" "mary" "linda" "peter"
# [6,] "joe" "mark" "mary" "linda" "annie"
# [7,] "joe" "mark" "mary" "susy" "peter"
# [8,] "joe" "mark" "mary" "susy" "annie"
# [9,] "joe" "mark" "mary" "peter" "annie"
#[10,] "joe" "mark" "john" "linda" "peter"
#[11,] "joe" "mark" "john" "linda" "annie"
#[12,] "joe" "mark" "john" "susy" "peter"
#[13,] "joe" "mark" "john" "susy" "annie"
#[14,] "joe" "mark" "john" "peter" "annie"
#[15,] "joe" "mark" "linda" "peter" "annie"
#[16,] "joe" "mark" "susy" "peter" "annie"
#[17,] "joe" "mary" "john" "linda" "peter"
#[18,] "joe" "mary" "john" "linda" "annie"
#[19,] "joe" "mary" "john" "susy" "peter"
#[20,] "joe" "mary" "john" "susy" "annie"
#[21,] "joe" "mary" "john" "peter" "annie"
#[22,] "joe" "mary" "linda" "peter" "annie"
#[23,] "joe" "mary" "susy" "peter" "annie"
#[24,] "joe" "john" "linda" "peter" "annie"
#[25,] "joe" "john" "susy" "peter" "annie"
#[26,] "mark" "mary" "john" "linda" "peter"
#[27,] "mark" "mary" "john" "linda" "annie"
#[28,] "mark" "mary" "john" "susy" "peter"
#[29,] "mark" "mary" "john" "susy" "annie"
#[30,] "mark" "mary" "john" "peter" "annie"
#[31,] "mark" "mary" "linda" "peter" "annie"
#[32,] "mark" "mary" "susy" "peter" "annie"
#[33,] "mark" "john" "linda" "peter" "annie"
#[34,] "mark" "john" "susy" "peter" "annie"
#[35,] "mary" "john" "linda" "peter" "annie"
#[36,] "mary" "john" "susy" "peter" "annie"
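To see why f works, its steps can be unpacked on a tiny matrix. This is only an illustrative sketch: the matrix m and the variable names hits and kept are made up for the example and do not come from the answer.

```r
# Hypothetical 2 x 3 example matrix, one combination per row
m <- matrix(c("joe", "linda", "susy",
              "joe", "mark",  "susy"), nrow = 2, byrow = TRUE)
x <- c("linda", "susy")

# %in% returns a plain logical vector, so the matrix shape is lost
hits <- m %in% x
# Restore the shape; this is what `dim<-`(data %in% x, dim(data)) does in f
dim(hits) <- dim(m)

# Number of names from x found in each row
rowSums(hits)

# Keep rows that contain fewer than length(x) of the names
kept <- m[rowSums(hits) < length(x), , drop = FALSE]
kept
```

The first row contains both names and is dropped; the second contains only "susy" and is kept. The count is only reliable when a name cannot occur twice in the same row, which holds for the output of combn on distinct names.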
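When several combinations have to be checked one after another, a for loop can be used:
x <- c("linda", "susy")
y <- c("joe", "john")
# collect the pairs to exclude in a list
combs <- list(x, y)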
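Such a loop might look like the following. This is a minimal sketch rather than the answer's exact code: it assumes the pairs are held in a list (called combs here) and repeats the definition of f so that the block runs on its own.

```r
names <- c("joe", "mark", "mary", "john", "linda", "susy", "peter", "annie")
cbn <- t(combn(names, 5))

# Same filter as above: drop rows that contain every name in x
f <- function(data, x) {
  data[rowSums(`dim<-`(data %in% x, dim(data))) < length(x), , drop = FALSE]
}

# Pairs that must not appear together
combs <- list(c("linda", "susy"), c("joe", "john"))

# Apply the filter once per pair, narrowing cbn each time
for (pair in combs) {
  cbn <- f(cbn, pair)
}
nrow(cbn)  # 20
```

Because each pass only removes rows, the order of the pairs in combs does not affect the final matrix.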
cbn
# [,1] [,2] [,3] [,4] [,5]
# [1,] "joe" "mark" "mary" "linda" "peter"
# [2,] "joe" "mark" "mary" "linda" "annie"
# [3,] "joe" "mark" "mary" "susy" "peter"
# [4,] "joe" "mark" "mary" "susy" "annie"
# [5,] "joe" "mark" "mary" "peter" "annie"
# [6,] "joe" "mark" "linda" "peter" "annie"
# [7,] "joe" "mark" "susy" "peter" "annie"
# [8,] "joe" "mary" "linda" "peter" "annie"
# [9,] "joe" "mary" "susy" "peter" "annie"
#[10,] "mark" "mary" "john" "linda" "peter"
#[11,] "mark" "mary" "john" "linda" "annie"
#[12,] "mark" "mary" "john" "susy" "peter"
#[13,] "mark" "mary" "john" "susy" "annie"
#[14,] "mark" "mary" "john" "peter" "annie"
#[15,] "mark" "mary" "linda" "peter" "annie"
#[16,] "mark" "mary" "susy" "peter" "annie"
#[17,] "mark" "john" "linda" "peter" "annie"
#[18,] "mark" "john" "susy" "peter" "annie"
#[19,] "mary" "john" "linda" "peter" "annie"
#[20,] "mary" "john" "susy" "peter" "annie"
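That works really well! If I want to exclude several combinations (e.g. "joe" and "john" as well), how would I go about it? The only thing I can think of is writing consecutive function calls, but that seems clumsy.
So you want a single return value, i.e. one matrix or data.frame covering every combination, right?
I want a matrix or df that excludes all rows containing any of some defined combinations (e.g. exclude the linda/susy pair and the joe/john pair).
@NukeDude I've updated my answer; hope this helps.
Another solution:
Create the names and the combinations (one combination per column):
names <- c("joe", "mark", "mary", "john", "linda", "susy", "peter", "annie")
cbn <- combn(names, 5)
# per column, count how many names of each pair are present
csums <- colSums((cbn == "linda") + (cbn == "susy"))
csums_2 <- colSums((cbn == "joe") + (cbn == "john"))
# keep columns holding fewer than 2 names of either pair, then transpose to rows
cbn <- t(cbn[, csums < 2 & csums_2 < 2])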
> cbn
[,1] [,2] [,3] [,4] [,5]
[1,] "joe" "mark" "mary" "linda" "peter"
[2,] "joe" "mark" "mary" "linda" "annie"
[3,] "joe" "mark" "mary" "susy" "peter"
[4,] "joe" "mark" "mary" "susy" "annie"
[5,] "joe" "mark" "mary" "peter" "annie"
[6,] "joe" "mark" "linda" "peter" "annie"
[7,] "joe" "mark" "susy" "peter" "annie"
[8,] "joe" "mary" "linda" "peter" "annie"
[9,] "joe" "mary" "susy" "peter" "annie"
[10,] "mark" "mary" "john" "linda" "peter"
[11,] "mark" "mary" "john" "linda" "annie"
[12,] "mark" "mary" "john" "susy" "peter"
[13,] "mark" "mary" "john" "susy" "annie"
[14,] "mark" "mary" "john" "peter" "annie"
[15,] "mark" "mary" "linda" "peter" "annie"
[16,] "mark" "mary" "susy" "peter" "annie"
[17,] "mark" "john" "linda" "peter" "annie"
[18,] "mark" "john" "susy" "peter" "annie"
[19,] "mary" "john" "linda" "peter" "annie"
[20,] "mary" "john" "susy" "peter" "annie"
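The two hard-coded colSums checks could be generalised to any number of pairs. The following is a sketch under assumptions, not part of the answer: pairs, has_pair and res are names introduced here for illustration.

```r
names <- c("joe", "mark", "mary", "john", "linda", "susy", "peter", "annie")
cbn <- combn(names, 5)  # one combination per column

# Pairs that must not appear together
pairs <- list(c("linda", "susy"), c("joe", "john"))

# For each pair, flag the columns that contain every name of that pair.
# The result is a logical matrix with one row per combination and one column per pair.
has_pair <- vapply(
  pairs,
  function(p) colSums(matrix(cbn %in% p, nrow = nrow(cbn))) == length(p),
  logical(ncol(cbn))
)

# Keep combinations in which no pair is complete, then transpose to rows
res <- t(cbn[, rowSums(has_pair) == 0, drop = FALSE])
nrow(res)  # 20
```

Adding another exclusion then only means appending a vector to pairs; the filtering code stays the same.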