R: grouping pairs of rows in a data frame based on a TRUE/FALSE function


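I need to create groups of rows from a data frame, using a custom function as the grouping condition. The function compares a pair of rows and returns TRUE/FALSE depending on whether those rows should be grouped together.

In the following sample dataset: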

id   field        code1  code2
1    textField1   055    066
2    textField2   100    120
3    textField3   300    350
4    textField4   800    450
5    textField5   460    900
6    textField6   490    700

                         ...

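The function checks some rules between the fields of two rows (function(row1, row2)) and returns TRUE/FALSE depending on whether those rows belong together.

I need to apply that function to every possible pair of rows in the data frame and produce a list (or some other structure) containing all the IDs that were matched together.

One way to apply the function to every pair is as follows: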
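The snippet that belonged here is not preserved. As a stand-in, here is a minimal sketch of one possible way to do it, using outer() with Vectorize() over row indices. The predicate same_group and its rule (code1 values closer than 50) are purely hypothetical, since the real rules are not given, so the resulting matrix differs from the one in the example further down.

```r
# Sample data from the question; codes are kept as character so that
# "055" keeps its leading zero.
df <- data.frame(
  id    = 1:6,
  field = paste0("textField", 1:6),
  code1 = c("055", "100", "300", "800", "460", "490"),
  code2 = c("066", "120", "350", "450", "900", "700"),
  stringsAsFactors = FALSE
)

# Hypothetical rule (an assumption, not from the question):
# two rows match when their code1 values differ by less than 50.
same_group <- function(row1, row2) {
  abs(as.numeric(row1$code1) - as.numeric(row2$code1)) < 50
}

# Evaluate the predicate for every ordered pair of row indices.
# outer() passes whole index vectors, so the scalar function is
# wrapped in Vectorize().
pair_matrix <- function(df, pred) {
  idx <- seq_len(nrow(df))
  outer(idx, idx, Vectorize(function(i, j) pred(df[i, ], df[j, ])))
}

m <- pair_matrix(df, same_group)
```

This calls the predicate n^2 times, which is fine for small data frames but wasteful when the rule is symmetric.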

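But I can't figure out a way to group the rows for which the result is TRUE.

Edit: rereading my question, it seems an example is needed:

If we build a matrix with the result for every possible combination, it would be: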

      [,1]   [,2]   [,3]   [,4]   [,5]   [,6]
[1,]  TRUE   FALSE  FALSE  FALSE  FALSE  FALSE
[2,]  FALSE  TRUE   TRUE   TRUE   FALSE  FALSE
[3,]  FALSE  TRUE   TRUE   FALSE  FALSE  FALSE
[4,]  FALSE  TRUE   FALSE  TRUE   FALSE  FALSE
[5,]  FALSE  FALSE  FALSE  FALSE  TRUE   TRUE
[6,]  FALSE  FALSE  FALSE  FALSE  TRUE   TRUE

The resulting groups would be:

1
2,3,4
5,6

Here is a function that does what you specified:

mx <- matrix(c( TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,
FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,
FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,
FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE),6)


groupings <- function(mx){

    # assumes mx is symmetric with TRUE on the diagonal
    ids <- seq_len(nrow(mx))   # original row numbers not yet assigned
    out <- list()
    while(nrow(mx) > 0){
        # rows that match the first remaining row
        g <- which(mx[,1])

        # expand the selection to every row that matches
        # any member of the current group
        expansion <- which(apply(mx[,g,drop=FALSE],1,any))
        while(length(expansion) > length(g)){
            g <- expansion
            expansion <- which(apply(mx[,g,drop=FALSE],1,any))
        }

        # record the group by its original row numbers, then drop it;
        # drop=FALSE keeps mx a matrix even when one row is left
        out <- c(out,list(ids[g]))
        ids <- ids[-g]
        mx <- mx[-g,-g,drop=FALSE]
    }
    return(out)

}

groupings(mx)
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2 3 4
#> 
#> [[3]]
#> [1] 5 6
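As a cross-check on the grouping logic, here is a sketch in base R that labels rows through the transitive closure of the match matrix (rows 3 and 4 end up together only because both match row 2). The name group_labels is made up for this illustration, and the sketch assumes a symmetric matrix with TRUE on the diagonal.

```r
mx <- matrix(c( TRUE,FALSE,FALSE,FALSE,FALSE,FALSE,
FALSE,TRUE,TRUE,TRUE,FALSE,FALSE,
FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,
FALSE,TRUE,FALSE,TRUE,FALSE,FALSE,
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE),6)

# Hypothetical helper: returns one group label per row.
group_labels <- function(mx) {
  reach <- mx
  repeat {
    # row i reaches j if it already reaches some k that matches j
    nxt <- ((reach * 1L) %*% (mx * 1L)) > 0
    if (identical(nxt, reach)) break   # nothing new: closure is complete
    reach <- nxt
  }
  # rows of one group share the same reachability pattern;
  # use the smallest reachable row number as the label
  apply(reach, 1, function(r) which(r)[1])
}

labels <- group_labels(mx)
groups <- split(seq_len(nrow(mx)), labels)
```

The per-row labels can be attached to the data frame as a new column, which is often handier than a list of ID vectors.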


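It's not clear whether you want to aggregate all the TRUEs together and all the FALSEs together, or to aggregate consecutive TRUEs and FALSEs. For example, if you get T, T, T, F, F, T, T, would the output be 1,2,3, then 4,5, then 6,7? Or would it just be 1,2,3,6,7 and 4,5? Or some third option?

Changed the example to use a different approach. Does this one explain better what I am trying to accomplish?

How would you efficiently create the matrix from the initial dataset?

The accepted answer to a linked question discusses a similar problem (that answer covers only the upper triangle) and is very thorough.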
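On the question of building the matrix efficiently: if the rule is symmetric, each unordered pair only needs to be evaluated once. The sketch below fills the upper triangle from combn() and mirrors it. The helper pair_matrix_upper and the predicate same_group (code1 values closer than 50) are hypothetical names and rules introduced only for illustration.

```r
df <- data.frame(
  id    = 1:6,
  code1 = c("055", "100", "300", "800", "460", "490"),
  code2 = c("066", "120", "350", "450", "900", "700"),
  stringsAsFactors = FALSE
)

# Hypothetical symmetric rule, not taken from the question.
same_group <- function(row1, row2) {
  abs(as.numeric(row1$code1) - as.numeric(row2$code1)) < 50
}

pair_matrix_upper <- function(df, pred) {
  n <- nrow(df)
  m <- matrix(FALSE, n, n)
  diag(m) <- TRUE                      # every row matches itself
  if (n > 1) {
    pairs <- combn(n, 2)               # 2 x choose(n, 2), each column has i < j
    hit <- apply(pairs, 2, function(p) pred(df[p[1], ], df[p[2], ]))
    ij <- t(pairs)                     # one (i, j) index pair per row
    m[ij] <- hit                       # upper triangle
    m[ij[, 2:1, drop = FALSE]] <- hit  # mirror into the lower triangle
  }
  m
}

m <- pair_matrix_upper(df, same_group)
```

For six rows this makes 15 predicate calls instead of 36. When the rule can be written on whole columns, a fully vectorized comparison would be faster still.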
