R: subsetting a data frame where the number of matching variables is k

For example, I want to subset a data frame on the condition that the number of matching variables equals a given number.

example <- rbind(sample(letters[1:5]),
             sample(letters[1:5]),
             sample(letters[1:5]),
             sample(letters[1:5]),
             sample(letters[1:5]))


example

     [,1] [,2] [,3] [,4] [,5]
[1,] "b"  "a"  "d"  "e"  "c" 
[2,] "e"  "c"  "a"  "d"  "b" 
[3,] "c"  "a"  "d"  "b"  "e" 
[4,] "b"  "d"  "e"  "c"  "a" 
[5,] "b"  "c"  "e"  "d"  "a"

Creating reproducible example data:

set.seed(47)
example <- rbind(sample(letters[1:5]),
                 sample(letters[1:5]),
                 sample(letters[1:5]),
                 sample(letters[1:5]),
                 sample(letters[1:5]))

example
#    [,1] [,2] [,3] [,4] [,5]
#[1,] "e"  "b"  "c"  "d"  "a" 
#[2,] "d"  "b"  "e"  "c"  "a" 
#[3,] "a"  "c"  "e"  "b"  "d" 
#[4,] "e"  "b"  "a"  "c"  "d" 
#[5,] "a"  "c"  "b"  "e"  "d" 

# Rows kept with a threshold of 3:
#    [,1] [,2] [,3] [,4] [,5]
#[1,] "a"  "c"  "e"  "b"  "d" 
#[2,] "a"  "c"  "b"  "e"  "d" 

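Here, each row is compared element-wise with every row of the matrix. For each comparison we count the positions that are equal and check whether that count is greater than or equal to the threshold (n). A second loop filters out rows that only match themselves.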
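
The approach described above can be sketched roughly as follows. This is only an illustration under assumptions, not necessarily the code from the original answer: the matrix is typed in from the printed output so the result does not depend on the random number generator, and the names hits and keep are made up for this sketch.

```r
# Data copied from the printed matrix above (assumption: typed in by hand
# so the example does not rely on set.seed() giving the same draw).
example <- rbind(c("e", "b", "c", "d", "a"),
                 c("d", "b", "e", "c", "a"),
                 c("a", "c", "e", "b", "d"),
                 c("e", "b", "a", "c", "d"),
                 c("a", "c", "b", "e", "d"))
n <- 3  # threshold: minimum number of equal positions

# Outer loop: take each row x.
# Inner loop: count the rows y (x itself included) that agree with x
# in at least n positions.
hits <- apply(example, 1, function(x) {
  sum(apply(example, 1, function(y) sum(x == y) >= n))
})

# Every row matches itself in all positions, so as long as n <= ncol(example)
# a count above 1 means the row also matches some other row.
keep <- hits > 1
result <- example[keep, , drop = FALSE]
result
```

With the matrix above, only rows 3 and 5 share at least three positions, so those are the two rows kept. Note that an exact duplicate row would also count as a match under this scheme.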

Another approach is to use combn twice: once to enumerate the pairs of rows, and a second time to perform the pairwise comparisons.

Using Ronak Shah's example data:

combn(seq_len(nrow(example)), 2)[, combn(seq_len(nrow(example)), 2,
                                 FUN=function(x) sum(example[x[1],] == example[x[2],]) >= 3)]
[1] 3 5
which indicates the rows to keep.

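In general this returns a matrix with one column per matching pair, and the same row index can appear more than once. For example, setting the threshold to 2, we get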

     [,1] [,2] [,3] [,4]
[1,]    1    1    2    3
[2,]    2    4    4    5
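To turn this into something useful, use c() to flatten the result into a vector, then unique() to drop the repeated row indices. While we are at it, we may as well wrap the whole thing in a function that lets you choose the threshold.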

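rowKeeper <- function(myMat, thresh) {
   # enumerate all row pairs, keep the pairs that agree in at least
   # thresh positions, then subset myMat by the unique row indices
   myMat[unique(c(combn(seq_len(nrow(myMat)), 2)[,
         combn(seq_len(nrow(myMat)), 2,
               FUN=function(x) sum(myMat[x[1],] == myMat[x[2],]) >= thresh)])),]
}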

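Should every row of the matrix be compared with every other row, so that if it matches any row at the threshold (here 3), both rows are selected? To be precise, the rows that have 3 elements in common should be selected.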
rowKeeper(example, 3)
     [,1] [,2] [,3] [,4] [,5]
[1,] "a"  "c"  "e"  "b"  "d" 
[2,] "a"  "c"  "b"  "e"  "d"
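
As a possible variation, the pair enumeration can be computed once and reused, and drop = FALSE keeps the result a matrix even when zero or one rows survive. The name rowKeeper2 and the hand-typed matrix are assumptions made for this sketch; it is not part of the original answer.

```r
# Hypothetical variant of rowKeeper: enumerates the row pairs only once.
# Assumes myMat has at least two rows (combn() fails otherwise).
rowKeeper2 <- function(myMat, thresh) {
  pairs <- combn(seq_len(nrow(myMat)), 2)   # 2 x choose(nrow, 2) index matrix
  # TRUE for each pair of rows that agree in at least thresh positions
  hit <- apply(pairs, 2, function(p) sum(myMat[p[1], ] == myMat[p[2], ]) >= thresh)
  rows <- sort(unique(c(pairs[, hit])))     # flatten, de-duplicate, order
  myMat[rows, , drop = FALSE]               # stays a matrix for 0 or 1 rows
}

# Data copied from the printed matrix above (typed in by hand).
example <- rbind(c("e", "b", "c", "d", "a"),
                 c("d", "b", "e", "c", "a"),
                 c("a", "c", "e", "b", "d"),
                 c("e", "b", "a", "c", "d"),
                 c("a", "c", "b", "e", "d"))

rowKeeper2(example, 3)   # rows 3 and 5
rowKeeper2(example, 2)   # all five rows take part in some matching pair
rowKeeper2(example, 5)   # no pair is identical, so a 0-row matrix
```

Sorting the indices returns the rows in their original order, which differs from the order of first appearance that unique() alone would give.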