R: How can combn() be parallelized?

r parallel-processing

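The function combn() generates all combinations of the elements of x taken m at a time. It is fast and efficient for small nCm (where n is the number of elements of x), but it quickly runs out of memory. For example: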

> combn(c(1:50), 12, simplify = TRUE)
Error in matrix(r, nrow = len.r, ncol = count) : 
invalid 'ncol' value (too large or NA)
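
To put a number on the failure above, here is a small sketch using only base R. It assumes the error arises because the column count no longer fits into an R integer, which is what the "too large or NA" message suggests; the variable names are illustrative.

```r
# Number of columns combn(1:50, 12) would have to allocate
n_cols <- choose(50, 12)
n_cols                          # 121399651100

# A matrix dimension must fit into an R integer
n_cols > .Machine$integer.max   # TRUE

# Rough size of the full integer result in bytes (12 rows, 4 bytes per value)
approx_bytes <- 12 * n_cols * 4
```

Even if the dimension limit did not apply, the full result would occupy terabytes, so generating only selected columns is the practical route.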

I wonder whether combn() could be modified to generate only k selected combinations. Let's call this new function chosencombn(). Then we would have:

> combn(c("a", "b", "c", "d"), m=2)
     [,1] [,2] [,3] [,4] [,5] [,6]
 [1,] "a"  "a"  "a"  "b"  "b"  "c" 
 [2,] "b"  "c"  "d"  "c"  "d"  "d" 

> chosencombn(c("a", "b", "c", "d"), m=2, i=c(1,4,6))
     [,1] [,2] [,3]
 [1,] "a"  "b"  "c" 
 [2,] "b"  "c"  "d"

> chosencombn(c("a", "b", "c", "d"), m=2, i=c(4,5))
     [,1] [,2]
 [1,] "b"  "b" 
 [2,] "c"  "d"

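I understand that such a function needs an ordering of the combinations, so that the position of a given combination can be found immediately.

Does such an ordering exist? Can it be coded so that the function is as efficient as combn()?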

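To understand how combn orders its output, let's look at the output of combn(1:5, 3):

combn(1:5, 3)
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# [1,]    1    1    1    1    1    1    2    2    2     3
# [2,]    2    2    2    3    3    4    3    3    4     4
# [3,]    3    4    5    4    5    5    4    5    5     5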

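There is a lot of structure here. First, every column is in increasing order going down, and the first row is non-decreasing. The columns that start with 1 have combn(2:5, 2) below them; the columns that start with 2 have combn(3:5, 2) below them; and so on.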
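
The nesting described above can be checked directly. This is a small sketch using only base R; the variable names are illustrative and not part of the original answer.

```r
full <- combn(1:5, 3)

# Columns whose first element is 1, with that first row dropped
below_1 <- full[-1, full[1, ] == 1, drop = FALSE]
# Columns whose first element is 2, with that first row dropped
below_2 <- full[-1, full[1, ] == 2, drop = FALSE]

# They should coincide with the smaller combn() outputs
same_1 <- all(dim(below_1) == c(2, 6)) && all(below_1 == combn(2:5, 2))
same_2 <- all(dim(below_2) == c(2, 3)) && all(below_2 == combn(3:5, 2))
```

Both checks come out TRUE, which is the recursive structure the reconstruction relies on.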

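Now let's consider how to construct column 8. The approach I'll take is to first determine the first element of that column. Because of the relationship above, there are choose(4, 2) = 6 columns starting with 1, choose(3, 2) = 3 columns starting with 2, and choose(2, 2) = 1 column starting with 3. In this case we determine that the column starts with 2, because columns 7-9 must start with 2.

To determine the second and subsequent elements of the column, we repeat the process with fewer elements (since 2 is our first element, we now choose from elements 3-5), a new position (we are selecting column number 8-6=2 among those that start with 2), and a new number of remaining elements to select (we need 3-1=2 elements).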
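
The arithmetic for column 8 can be traced step by step. This is only an illustration of the reasoning above for x = 1:5 (so element values equal their indices); the variable names are made up for the example.

```r
n <- 5; m <- 3; pos <- 8

# How many columns start with 1, 2, 3, and where each block ends
block_sizes <- choose((n - 1):(m - 1), m - 1)   # 6 3 1
block_ends  <- cumsum(block_sizes)              # 6 9 10

# The first block whose end reaches position 8 gives the first element
first <- which.max(block_ends >= pos)           # 2

# Position within the block of columns that start with 2
new_pos <- pos - c(0, block_ends)[first]        # 8 - 6 = 2

# The rest of the column is column new_pos of the smaller problem
rest <- combn((first + 1):n, m - 1)[, new_pos]  # 3 5
column8 <- c(first, rest)                       # 2 3 5
```

Here the smaller problem is still solved with combn() for clarity; an iterative version avoids that by repeating the same counting step for each remaining element.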
Below is an iterative function, getcombn, that does this:
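getcombn <- function(x, m, pos) {
  combo <- rep(NA, m)
  start <- 1  # index in x of the smallest element still available
  for (i in seq_len(m-1)) {
    # end.pos[j]: last position, among the remaining columns, of the block
    # whose i-th element is x[start + j - 1]
    end.pos <- cumsum(choose((length(x)-start):(m-i), m-i))
    selection <- which.max(end.pos >= pos)  # first block that contains pos
    start <- start + selection
    combo[i] <- x[start - 1]
    pos <- pos - c(0, end.pos)[selection]   # position within the chosen block
  }
  combo[m] <- x[start + pos - 1]  # the last element is a plain offset
  combo
}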

chosencombn <- function(x, m, all.pos) {
  sapply(all.pos, function(pos) getcombn(x, m, pos))
}
chosencombn(c("a", "b", "c", "d"), 2, c(1,4,6))
#     [,1] [,2] [,3]
# [1,] "a"  "b"  "c" 
# [2,] "b"  "c"  "d" 
chosencombn(c("a", "b", "c", "d"), 2, c(4,5))
#     [,1] [,2]
# [1,] "b"  "b" 
# [2,] "c"  "d" 

The trotter package is useful for this, because it does not keep the combinations in memory:
library(trotter)

combs = cpv(2, c("a", "b", "c", "d"))
sapply(c(1, 4, 6), function(i) combs[i])
#     [,1] [,2] [,3]
#[1,] "a"  "b"  "c" 
#[2,] "b"  "c"  "d"

chosencombn(1:50, 25, c(1, 1000000L, 1e14))
#       [,1] [,2] [,3]
#  [1,]    1    1    3
#  [2,]    2    2    4
#  [3,]    3    3    6
#  [4,]    4    4    7
#  [5,]    5    5    8
#  [6,]    6    6   11
#  [7,]    7    7   14
#  [8,]    8    8   15
#  [9,]    9    9   17
# [10,]   10   10   20
# [11,]   11   11   22
# [12,]   12   12   25
# [13,]   13   13   27
# [14,]   14   14   30
# [15,]   15   15   31
# [16,]   16   16   32
# [17,]   17   17   33
# [18,]   18   18   36
# [19,]   19   20   37
# [20,]   20   23   39
# [21,]   21   27   40
# [22,]   22   39   42
# [23,]   23   42   47
# [24,]   24   45   48
# [25,]   25   49   50
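
Because every column can be computed from its position alone, ranges of positions can be handed to separate workers, which is one way to approach the parallelization in the title. The following is a minimal sketch, not part of the original answers. It assumes the base `parallel` package with `mclapply` (forking, so it falls back to one core on Windows), and `n_chunks` and `cores` are illustrative values. getcombn and chosencombn are repeated only so the block runs on its own.

```r
library(parallel)

# Copied from the answer above so that this block is self-contained
getcombn <- function(x, m, pos) {
  combo <- rep(NA, m)
  start <- 1
  for (i in seq_len(m-1)) {
    end.pos <- cumsum(choose((length(x)-start):(m-i), m-i))
    selection <- which.max(end.pos >= pos)
    start <- start + selection
    combo[i] <- x[start - 1]
    pos <- pos - c(0, end.pos)[selection]
  }
  combo[m] <- x[start + pos - 1]
  combo
}
chosencombn <- function(x, m, all.pos) {
  sapply(all.pos, function(pos) getcombn(x, m, pos))
}

x <- 1:6
m <- 3
positions <- seq_len(choose(length(x), m))   # 1..20

# Split the positions into contiguous chunks (n_chunks is an arbitrary choice)
n_chunks <- 4
chunks <- split(positions, cut(seq_along(positions), n_chunks, labels = FALSE))

# Forking is not available on Windows, so use a single core there
cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Each worker builds only the columns for its own chunk
pieces <- mclapply(chunks, function(idx) chosencombn(x, m, idx),
                   mc.cores = cores)

# For this small case the pieces can be reassembled and compared with combn()
result <- do.call(cbind, unname(pieces))
```

For a space as large as choose(50, 25) the pieces would not be bound back together. Each worker would instead apply its computation to its own columns and return only a summary.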
