R: Compute all possible intersections



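We have a data frame with one column holding a category and one column holding a discrete value. We want all possible intersections (the number of common values) for every combination of categories.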

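I came up with the code below. However, is there something shorter out there? I am sure there is a better way to do this, perhaps a dedicated function. Of course, the code below could be shortened, e.g. with purrr::map, but that is not my question.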

## prepare an example data set
df <- data.frame(category=rep(LETTERS[1:5], each=20),
                 value=sample(letters[1:10], 100, replace=TRUE))

cats <- unique(df$category)
n <- length(cats)

## all combinations of 1...n unique elements from category
combinations <- lapply(1:n, function(i) combn(cats, i, simplify=FALSE))
combinations <- unlist(combinations, recursive=FALSE)
names(combinations) <- sapply(combinations, paste0, collapse="")

## for each combination of categories, collect the values of each
## category in it, then intersect them and count the common values
intersections <- lapply(combinations, 
          function(co) 
             lapply(co, function(.x) df$value[ df$category == .x ]))
intersections <- lapply(intersections, 
    function(.x) Reduce(intersect, .x))
intersections <- sapply(intersections, length)
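As a point of comparison, the same computation can be written more compactly in base R. This is a sketch, not the dedicated function the question asks for: it assumes the example df above, regenerated with set.seed(1) (an assumption added only for reproducibility), and the name intersections2 is hypothetical. It splits the values by category once and lets combn() apply Reduce(intersect, ...) to every combination, so the counts follow the question's definition (for a single category the count is simply its number of rows).

```r
## Compact base-R variant of the question's loop (illustrative sketch).
set.seed(1)  # assumption: only for a reproducible example
df <- data.frame(category = rep(LETTERS[1:5], each = 20),
                 value = sample(letters[1:10], 100, replace = TRUE))

vals <- split(df$value, df$category)  # list: one vector of values per category
cats <- names(vals)

intersections2 <- unlist(lapply(seq_along(cats), function(k) {
  # length of Reduce(intersect, ...) over every k-subset of categories,
  # exactly as in the question's code
  counts <- combn(cats, k, function(co) length(Reduce(intersect, vals[co])))
  # label each count with its categories pasted together, e.g. "ABD"
  names(counts) <- combn(cats, k, paste0, collapse = "")
  counts
}))
intersections2
```

combn() accepts a FUN argument and forwards extra arguments (here collapse) to it, which removes the separate unlist/names/sapply passes of the original.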

Question: is there a less convoluted way to get the same result?

Here is one possible approach, using data.table to reshape the data.frame and model.matrix to compute the higher-order interactions:

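1. Convert to wide format, grouping matching values across categories into the same rows (credit to @chinsoon12 for the dcast syntax).
2. Compute all higher-order interactions with model.matrix and sum over the columns.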
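The two steps can be sketched in base R as follows. This is a substitute, not the answer's code: table() on a (value, occurrence) key stands in for data.table::dcast, and ave() plays the role of data.table's rowid(). The names occ, wide and counts are illustrative, and set.seed(1) is assumed as in the data section.

```r
## Base-R sketch of the two steps (table() in place of dcast; an assumption).
set.seed(1)
df <- data.frame(category = rep(LETTERS[1:5], each = 20),
                 value = sample(letters[1:10], 100, replace = TRUE))

## Step 1: number repeated occurrences of a value within each category,
## so the k-th "a" of category A lines up with the k-th "a" of category B.
df$occ <- ave(seq_along(df$value), df$category, df$value, FUN = seq_along)
wide <- as.data.frame.matrix(table(paste(df$value, df$occ), df$category))
# wide: one row per (value, occurrence), one 0/1 column per category A..E

## Step 2: interaction columns of model.matrix are products of the 0/1
## columns, so their sums count rows in which all categories are present.
counts <- colSums(model.matrix(~ (A + B + C + D + E)^5, data = wide))[-1]
counts
```

The [-1] drops the intercept column, leaving the 5 main effects and 26 interaction terms.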

library(data.table)
## df_wide: the wide-format table from step 1 (one column per category, A to E)
colSums(model.matrix(~(A+B+C+D+E)^5, data=df_wide))[-1]
#>         A         B         C         D         E       A:B       A:C
#>        20        20        20        20        20        13        11
#>       A:D       A:E       B:C       B:D       B:E       C:D       C:E
#>        12        12        11        13        13        11        13
#>       D:E     A:B:C     A:B:D     A:B:E     A:C:D     A:C:E     A:D:E
#>        10         8         9         9         7         9         7
#>     B:C:D     B:C:E     B:D:E     C:D:E   A:B:C:D   A:B:C:E   A:B:D:E
#>         8         9         7         8         5         7         5
#>   A:C:D:E   B:C:D:E A:B:C:D:E
#>         5         6         4
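Why the column sums of model.matrix yield these counts can be seen on a tiny hypothetical 0/1 table (toy is an illustrative name): with numeric 0/1 columns, each interaction column is the element-wise product of its factor columns, so it is 1 only in rows where every category involved is present.

```r
## Toy illustration: interaction columns are products of 0/1 columns.
toy <- data.frame(A = c(1, 1, 0, 1),
                  B = c(1, 0, 1, 1),
                  C = c(1, 1, 1, 0))
mm <- model.matrix(~ (A + B + C)^3, data = toy)
mm
colSums(mm)[-1]
# A = 3, B = 3, C = 3, A:B = 2, A:C = 2, B:C = 2, A:B:C = 1
```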

Data

set.seed(1)
df <- data.frame(category=rep(LETTERS[1:5], each=20),
                 value=sample(letters[1:10], 100, replace=TRUE))
> intersections
    A     B     C     D     E    AB    AC    AD    AE    BC 
   20    20    20    20    20    10     8     8     9     8 
   BD    BE    CD    CE    DE   ABC   ABD   ABE   ACD   ACE 
    8     9     7     8     8     8     8     9     7     8 
  ADE   BCD   BCE   BDE   CDE  ABCD  ABCE  ABDE  ACDE  BCDE 
    8     7     8     8     7     7     8     8     7     7 
ABCDE 
    7
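The two outputs count different things. Reduce(intersect, ...) counts distinct shared values, while the dcast/model.matrix route pairs repeated values occurrence by occurrence (a multiset intersection). Assuming the same df as in the question, where value takes only the 10 letters a to j, a count such as A:B = 13 can only arise from repeated values. A toy illustration with hypothetical vectors a_vals and b_vals:

```r
## Distinct vs. multiset intersection on two small vectors.
a_vals <- c("a", "a", "b")
b_vals <- c("a", "a", "c")

length(intersect(a_vals, b_vals))   # distinct shared values: 1 ("a")

# multiset version: for each shared value, the smaller of the two counts
lev <- union(a_vals, b_vals)
sum(pmin(table(factor(a_vals, levels = lev)),
         table(factor(b_vals, levels = lev))))   # 2
```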