R从不允许相邻元素的向量进行采样_R_Combinations_Permutation_Montecarlo

R从不允许相邻元素的向量进行采样

R从不允许相邻元素的向量进行采样,r,combinations,permutation,montecarlo,R,Combinations,Permutation,Montecarlo,假设我被允许沿着一个5长度的向量分布100%的重量。但是，我不能将权重放入两个相邻的值中，并且任何值都不能超过50% 比如说, [0, .5, 0, 0, .5] is good [.5, .5, 0, 0,0] is not good [.2, 0, .2, 0, .6] is good [.2, 0, .2, .2, .2] is not good 我想生成10000个这样的向量，从中运行蒙特卡罗模拟我想我可以用expand.grid来实现这一点，但我不太确定如何实现我可以随机生成一个

假设我被允许沿着一个5长度的向量分布100%的重量。但是，我不能将权重放入两个相邻的值中，并且任何值都不能超过50%

比如说,

[0, .5, 0, 0, .5] is good
[.5, .5, 0, 0,0] is not good
[.2, 0, .2, 0, .6] is good
[.2, 0, .2, .2, .2] is not good

我想生成10000个这样的向量，从中运行蒙特卡罗模拟

我想我可以用

expand.grid

来实现这一点，但我不太确定如何实现

我可以随机生成一个，然后：

nonzero_weights = which(starting_weights>0)
grid_positions = expand.grid(startingPos = nonzero_weights, endingPos = nonzero_weights)

然后做一些过滤和删除，但这看起来很混乱。如果我不需要它们，为什么要生成呢。有没有更干净的方法可以做到这一点？

首先，您可以通过删除上一个样本中的采样索引来生成二进制样本。然后生成要分配给这些二元样本的权重：

idx <- 1:11

system.time(
    binsampl <- t(replicate(10000L, {
        x <- rep(0L, length(idx))
        while(length(idx) > 0L) {
            chosen <- if (length(idx) > 1L) sample(idx, 1L) else idx
            idx <- setdiff(idx, chosen + -1L:1L)
            x[chosen] <- 1L
        }
        x
    }))
)

system.time(
    weights <- t(apply(binsampl, 1, function(s) {
        y <- runif(sum(s))
        s[s==1L] <- y/sum(y) 
        s
    }))
)
head(weights)

使用R-3.5.1 Windows x64 8GB RAM 2.8GHz处理器在我的机器上生成10k样本需要不到1s的时间。

如果我们没有邻接限制，使用

中当前可用的工具，这个问题就不会那么困难了（请参阅更多信息）。有了邻接限制，我们必须做更多的工作才能得到我们想要的结果

我们首先注意到，由于我们不能在一个有n列的向量行中有2个连续的数字（OP在注释中澄清了它们需要n=11，因此我们将使用这个作为我们的测试用例），因此具有值的最大列数等于

11-楼层（11/2）=6

。当值出现在列

1 3 5 7 9 11

中时，会发生这种情况。我们还应该注意，由于最大值的上限为0.5，并且我们需要行的总和为1，因此具有值的最小列数等于2，因为

上限（1/0.5）=2

。有了这些信息，我们就可以开始攻击了

我们首先生成11的每个组合，选择2到6。然后我们筛选出违反邻接限制的组合。后一部分可以通过获取每行的

diff

并检查结果差异是否等于1来轻松实现。注意（注意：我们使用

RcppAlgos

（我是作者）进行所有计算）：

现在，我们用所需的列数创建一个矩阵，并使用上面的

myCombs

填充所有可能的组合，以确保满足邻接要求

myCombMat <- matrix(0L, nrow = sum(groupLen * combLen), ncol = vecLen)
s <- g <- 1L
e <- combRow <- nrow(combSumOne[[1L]])

for (a in myCombs[-numCombs]) {
    for (i in 1:nrow(a)) {
        myCombMat[s:e, a[i, ]] <- combSumOne[[g]]
        s <- e + 1L
        e <- e + combRow
    }
    e <- e - combRow
    g <- g + 1L
    combRow <- nrow(combSumOne[[g]])
    e <- e + combRow
}

## the last element in myCombs is simply a
## vector, thus nrow would return NULL
myCombMat[s:e, myCombs[[numCombs]]] <- combSumOne[[g]]

并使用这些来创建所有可能排列的矩阵，这些排列总和为1，并满足邻接要求：

myPermMat <- matrix(0L, nrow = sum(groupLenPerm * combLen), ncol = vecLen)
s <- g <- 1L
e <- permRow <- nrow(permSumOne[[1L]])

for (a in myCombs[-numCombs]) {
    for (i in 1:nrow(a)) {
        myPermMat[s:e, a[i, ]] <- permSumOne[[g]]
        s <- e + 1L
        e <- e + permRow
    }
    e <- e - permRow
    g <- g + 1L
    permRow <- nrow(permSumOne[[g]])
    e <- e + permRow
}

## the last element in myCombs is simply a
## vector, thus nrow would return NULL
myPermMat[s:e, myCombs[[numCombs]]] <- permSumOne[[g]]

并且，正如OP所述，如果我们想随机挑选其中10000个，我们可以使用

sample

来实现这一点：

set.seed(101)
mySamp10000 <- sample(nrow(myPermMat), 10000)
myMat10000 <- myPermMat[mySamp10000, ]
rownames(myMat10000) <- mySamp10000

head(myMat10000)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
47897 0.00  0.0 0.00 0.50  0.0 0.25  0.0 0.00 0.05   0.0  0.20
5640  0.25  0.0 0.15 0.00  0.1 0.00  0.5 0.00 0.00   0.0  0.00
91325 0.10  0.0 0.00 0.15  0.0 0.40  0.0 0.00 0.20   0.0  0.15
84633 0.15  0.0 0.00 0.35  0.0 0.30  0.0 0.10 0.00   0.1  0.00
32152 0.00  0.4 0.00 0.05  0.0 0.00  0.0 0.25 0.00   0.3  0.00
38612 0.00  0.4 0.00 0.00  0.0 0.35  0.0 0.10 0.00   0.0  0.15

set.seed（101）
mysamp1000如果控制相邻值可以是什么，那么这难道不意味着这些值不是真正随机的吗？你为什么有邻接要求？@Tim，没错。例如，如果该向量表示重叠月份，（1-2月）与（2-3月）。。。。我不希望有任何重叠。在这个特殊问题中，非零权重的可能性不多：（1,3,5），（1,3），（1,4），（1,5），（2,4），（2,5），（3,5）。因此，您可以从这7个选项中随机选择一个。如果不是第一个，则两个权重必须为50%。只有第一个场景（1,3,5）需要过滤和删除。@papgeo——这将如何推广到11长度向量？这就是我要面对的。我想知道我是否能写出一个递归函数来找到所有这些位置。@JosephWood:是的，这就是我需要的。谢谢不错的套餐顺便说一句：-）。
myCombMat <- matrix(0L, nrow = sum(groupLen * combLen), ncol = vecLen)
s <- g <- 1L
e <- combRow <- nrow(combSumOne[[1L]])

for (a in myCombs[-numCombs]) {
    for (i in 1:nrow(a)) {
        myCombMat[s:e, a[i, ]] <- combSumOne[[g]]
        s <- e + 1L
        e <- e + combRow
    }
    e <- e - combRow
    g <- g + 1L
    combRow <- nrow(combSumOne[[g]])
    e <- e + combRow
}

## the last element in myCombs is simply a
## vector, thus nrow would return NULL
myCombMat[s:e, myCombs[[numCombs]]] <- combSumOne[[g]]

head(myCombMat)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]  0.5    0  0.5  0.0  0.0  0.0  0.0  0.0    0     0     0
[2,]  0.5    0  0.0  0.5  0.0  0.0  0.0  0.0    0     0     0
[3,]  0.5    0  0.0  0.0  0.5  0.0  0.0  0.0    0     0     0
[4,]  0.5    0  0.0  0.0  0.0  0.5  0.0  0.0    0     0     0
[5,]  0.5    0  0.0  0.0  0.0  0.0  0.5  0.0    0     0     0
[6,]  0.5    0  0.0  0.0  0.0  0.0  0.0  0.5    0     0     0

tail(myCombMat)
        [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[5466,] 0.10    0 0.10    0 0.20    0 0.20    0 0.20     0  0.20
[5467,] 0.10    0 0.15    0 0.15    0 0.15    0 0.15     0  0.30
[5468,] 0.10    0 0.15    0 0.15    0 0.15    0 0.20     0  0.25
[5469,] 0.10    0 0.15    0 0.15    0 0.20    0 0.20     0  0.20
[5470,] 0.15    0 0.15    0 0.15    0 0.15    0 0.15     0  0.25
[5471,] 0.15    0 0.15    0 0.15    0 0.15    0 0.20     0  0.20

set.seed(42)
mySamp <- sample(nrow(myCombMat), 10)
sampMat <- myCombMat[mySamp, ]
rownames(sampMat) <- mySamp

sampMat
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
5005 0.00 0.05 0.00 0.05 0.00 0.15 0.00 0.35 0.00   0.4  0.00
5126 0.00 0.15 0.00 0.15 0.00 0.20 0.00 0.20 0.00   0.0  0.30
1565 0.10 0.00 0.15 0.00 0.00 0.00 0.25 0.00 0.00   0.5  0.00
4541 0.05 0.00 0.05 0.00 0.00 0.15 0.00 0.00 0.25   0.0  0.50
3509 0.00 0.00 0.15 0.00 0.25 0.00 0.25 0.00 0.00   0.0  0.35
2838 0.00 0.10 0.00 0.15 0.00 0.00 0.35 0.00 0.00   0.0  0.40
4026 0.05 0.00 0.10 0.00 0.15 0.00 0.20 0.00 0.50   0.0  0.00
736  0.00 0.00 0.10 0.00 0.40 0.00 0.00 0.00 0.00   0.0  0.50
3590 0.00 0.00 0.15 0.00 0.20 0.00 0.00 0.30 0.00   0.0  0.35
3852 0.00 0.00 0.00 0.05 0.00 0.20 0.00 0.30 0.00   0.0  0.45

all(rowSums(myCombMat) == 1)
[1] TRUE

permSumOne <- lapply(lowComb:highComb, function(x) {
    permuteGeneral(seq(5L,50L,5L), x, TRUE, 
                   constraintFun = "sum", 
                   comparisonFun = "==", 
                   limitConstraints = 100L) / 100
})

groupLenPerm <- sapply(permSumOne, nrow)
groupLenPerm
[1]     1    63   633  3246 10872

myPermMat <- matrix(0L, nrow = sum(groupLenPerm * combLen), ncol = vecLen)
s <- g <- 1L
e <- permRow <- nrow(permSumOne[[1L]])

for (a in myCombs[-numCombs]) {
    for (i in 1:nrow(a)) {
        myPermMat[s:e, a[i, ]] <- permSumOne[[g]]
        s <- e + 1L
        e <- e + permRow
    }
    e <- e - permRow
    g <- g + 1L
    permRow <- nrow(permSumOne[[g]])
    e <- e + permRow
}

## the last element in myCombs is simply a
## vector, thus nrow would return NULL
myPermMat[s:e, myCombs[[numCombs]]] <- permSumOne[[g]]

head(myPermMat)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]  0.5    0  0.5  0.0  0.0  0.0  0.0  0.0    0     0     0
[2,]  0.5    0  0.0  0.5  0.0  0.0  0.0  0.0    0     0     0
[3,]  0.5    0  0.0  0.0  0.5  0.0  0.0  0.0    0     0     0
[4,]  0.5    0  0.0  0.0  0.0  0.5  0.0  0.0    0     0     0
[5,]  0.5    0  0.0  0.0  0.0  0.0  0.5  0.0    0     0     0
[6,]  0.5    0  0.0  0.0  0.0  0.0  0.0  0.5    0     0     0

tail(myPermMat)
          [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[128680,] 0.15    0 0.20    0 0.20    0 0.15    0 0.15     0  0.15
[128681,] 0.20    0 0.15    0 0.15    0 0.15    0 0.15     0  0.20
[128682,] 0.20    0 0.15    0 0.15    0 0.15    0 0.20     0  0.15
[128683,] 0.20    0 0.15    0 0.15    0 0.20    0 0.15     0  0.15
[128684,] 0.20    0 0.15    0 0.20    0 0.15    0 0.15     0  0.15
[128685,] 0.20    0 0.20    0 0.15    0 0.15    0 0.15     0  0.15

all(rowSums(myPermMat) == 1)
[1] TRUE

set.seed(101)
mySamp10000 <- sample(nrow(myPermMat), 10000)
myMat10000 <- myPermMat[mySamp10000, ]
rownames(myMat10000) <- mySamp10000

head(myMat10000)
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
47897 0.00  0.0 0.00 0.50  0.0 0.25  0.0 0.00 0.05   0.0  0.20
5640  0.25  0.0 0.15 0.00  0.1 0.00  0.5 0.00 0.00   0.0  0.00
91325 0.10  0.0 0.00 0.15  0.0 0.40  0.0 0.00 0.20   0.0  0.15
84633 0.15  0.0 0.00 0.35  0.0 0.30  0.0 0.10 0.00   0.1  0.00
32152 0.00  0.4 0.00 0.05  0.0 0.00  0.0 0.25 0.00   0.3  0.00
38612 0.00  0.4 0.00 0.00  0.0 0.35  0.0 0.10 0.00   0.0  0.15