R: How to run permutations with mclapply in a reproducible way, regardless of the number of threads and the OS?

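Is it possible to run a permutation-based function with mclapply in a reproducible way, regardless of the number of threads and the OS? Below is a toy example. The resulting list of permuted vectors is hashed only to make the results easy to compare. I have tried different RNGkind settings ("L'Ecuyer-CMRG") as well as different settings of mc.preschedule and mc.set.seed. So far, no luck getting the results to be identical.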

library("parallel")
library("digest")

set.seed(1)
m <- mclapply(1:10, function(x) sample(1:10),
              mc.cores=2, mc.set.seed = F)
digest(m, 'crc32')

set.seed(1)
m <- mclapply(1:10, function(x) sample(1:10),
              mc.cores=4, mc.set.seed = F)
digest(m, 'crc32')

set.seed(1)
m <- mclapply(1:10, function(x) sample(1:10),
              mc.cores=2, mc.set.seed = F)
digest(m, 'crc32')

set.seed(1)
m <- mclapply(1:10, function(x) sample(1:10),
              mc.cores=1, mc.set.seed = F)
digest(m, 'crc32')

set.seed(1)
m <- lapply(1:10, function(x) sample(1:10))
digest(m, 'crc32') # this is equivalent to what I get on Windows.
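
For reference, the settings mentioned in the question can be combined as in the sketch below. As far as the parallel package documentation describes it, the "L'Ecuyer-CMRG" generator with mc.set.seed = TRUE gives each forked child its own stream, so a run should be repeatable when the same tasks go to the same forked processes. Under prescheduling that assignment depends on mc.cores, which would explain why results differ across core counts. The names cores, old_kind and run_once are illustrative, and the single-core fallback on Windows is an assumption added for portability.

```r
library(parallel)

# mclapply() relies on forking, so fall back to one core on Windows
cores <- if (.Platform$OS.type == "windows") 1L else 2L

old_kind <- RNGkind()[1]      # remember the current generator
RNGkind("L'Ecuyer-CMRG")      # generator with multiple independent streams

run_once <- function() {
  set.seed(1)                 # same starting point for every run
  mclapply(1:10, function(x) sample(1:10),
           mc.cores = cores, mc.set.seed = TRUE)
}

run1 <- run_once()
run2 <- run_once()
identical(run1, run2)         # same core count in both runs

RNGkind(old_kind)             # restore the previous generator
```

This only addresses repeatability for a fixed core count. It does not make the output match lapply or a run with a different mc.cores.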

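One solution I came up with is to generate a complementary vector of seeds and have mclapply or lapply iterate over an index that points to both the argument and its corresponding seed. It is a bit of a hack, but it works.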

library("parallel")
library("digest")

input <- 1:10

# make random seed vector of length(input).
set.seed(1)
seeds <- sample.int(length(input), replace=TRUE)

f <- function(idx){ 
    # input[idx] # do whatever with the input
    set.seed(seeds[idx]) # set to proper seed
    sample(1:10)
}

digest(mclapply(seq_along(input), f, mc.cores=2), 'crc32')
digest(mclapply(seq_along(input), f, mc.cores=4), 'crc32')
digest(mclapply(seq_along(input), f, mc.cores=2), 'crc32')
digest(mclapply(seq_along(input), f, mc.cores=1), 'crc32')
digest(lapply(seq_along(input), f), 'crc32')
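
One detail of the seed vector above: sample.int(length(input), replace=TRUE) draws seeds from 1 to 10 with replacement, so duplicates are very likely, and two elements that share a seed get the same permutation. The sketch below is a possible variation, not part of the original answer. It draws the seeds without replacement from a much wider range. The names cores, serial and parallel_res are illustrative, and the single-core fallback on Windows is an assumption.

```r
library(parallel)

input <- 1:10
cores <- if (.Platform$OS.type == "windows") 1L else 2L  # no forking on Windows

# One seed per element, drawn without replacement from a wide range,
# so no two elements share a seed.
set.seed(1)
seeds <- sample.int(.Machine$integer.max, length(input))

f <- function(idx) {
  set.seed(seeds[idx])   # per-element seed, independent of the worker that runs it
  sample(input)          # permutation for element idx
}

serial       <- lapply(seq_along(input), f)
parallel_res <- mclapply(seq_along(input), f, mc.cores = cores)

identical(serial, parallel_res)  # same permutations either way
anyDuplicated(seeds) == 0        # every element has its own seed
```

Because each call to f sets its own seed before sampling, the result does not depend on how the indices are distributed over workers.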

Another approach is to generate the samples first, and then call mclapply on the samples:
    library("parallel")
    library("digest")

    input<-1:10
    set.seed(1)
    nsamp<-20
    ## Generate and store all the random samples
    samples<-lapply(1:nsamp, function(x){ sample(input) })

    ## apply the algorithm "diff" on every sample
    ncore0<-  lapply(samples, diff)
    ncore1<-mclapply(samples, diff, mc.cores=1)
    ncore2<-mclapply(samples, diff, mc.cores=2)
    ncore3<-mclapply(samples, diff, mc.cores=3)
    ncore4<-mclapply(samples, diff, mc.cores=4)

    ## all equal
    all.equal(ncore0,ncore1)
    all.equal(ncore0,ncore2)
    all.equal(ncore0,ncore3)
    all.equal(ncore0,ncore4)
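
The same pattern can be applied to a permutation test: do all the random shuffling serially in the main process, then compute the statistic in parallel, where nothing is random any more. The sketch below assumes hypothetical toy data (values, group) and an illustrative statistic (mean_diff). None of these come from the original answer.

```r
library(parallel)

cores <- if (.Platform$OS.type == "windows") 1L else 2L  # no forking on Windows

# Hypothetical toy data: two groups of five observations
values <- c(5.1, 4.9, 6.2, 5.8, 6.0, 4.2, 4.8, 5.0, 4.5, 4.7)
group  <- rep(c("a", "b"), each = 5)

# Test statistic: difference in group means for a given labelling
mean_diff <- function(labels) {
  mean(values[labels == "a"]) - mean(values[labels == "b"])
}

# All randomness happens here, serially, in the main process
set.seed(1)
nperm <- 200
perms <- lapply(seq_len(nperm), function(i) sample(group))

# The parallel step is deterministic given perms
null_serial   <- unlist(lapply(perms, mean_diff))
null_parallel <- unlist(mclapply(perms, mean_diff, mc.cores = cores))

observed <- mean_diff(group)
p_value  <- mean(abs(null_parallel) >= abs(observed))

identical(null_serial, null_parallel)
```

The trade-off is memory: every permutation is stored before any work starts, which may matter for large inputs or many permutations.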

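I ran into a similar problem at work. In the end, our solution was to set the seed inside the function based on the value of the data being referenced (so you could set the seed based on the current value of x). This is not truly random, but neither is a pseudo-random generator, and we are equally poorly placed to guess or manipulate the conditions of the random number stream for a given configuration.

@ElizabethAB Thanks for your enlightening comment. However, in your case the permutations would be rock-solid reproducible. Please see below for a solution/hack.
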
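# Uses f, input and seeds from the seed-vector example above.
# f calls set.seed(), so running it in the main process (lapply) changes the
# caller's RNG state, whereas forked mclapply workers leave it untouched.
set.seed(123)
outcome1a <- digest(mclapply(seq_along(input), f, mc.cores=4), 'crc32')
outcome1b <- digest(sample(1:10), 'crc32') # draw taken right after the parallel run

set.seed(123)
outcome2a <- digest(lapply(seq_along(input), f), 'crc32')
outcome2b <- digest(sample(1:10), 'crc32') # draw taken right after the serial run
identical(outcome1a, outcome2a) # results of f agree
identical(outcome1b, outcome2b) # the draws taken afterwards need not agree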

library("parallel")
library("digest")

wrapply <- function(input, cores){
    # draw a seed from the caller's RNG stream so the state can be reset afterwards
    recover.seed <- floor(runif(1)*1e6)
    seeds <- sample.int(length(input), replace=TRUE)
    f <- function(idx){ 
        # input[idx] # do whatever with the input
        set.seed(seeds[idx]) # set to proper seed
        sample(1:10)
    }
    if(is.null(cores)){
        # serial run in the main process
        out <- digest(lapply(seq_along(input), f), 'crc32')
    }else{
        out <- digest(mclapply(seq_along(input), f, mc.cores=cores), 'crc32')
    }
    # re-seed so the caller's RNG state is the same after serial and parallel runs
    set.seed(recover.seed)
    return(out)
}

input <- 1:10

set.seed(123)
outcome1a <- wrapply(input, cores=4)
outcome1b <- digest(sample(1:10), 'crc32')

set.seed(123)
outcome2a <- wrapply(input, cores=NULL)
outcome2b <- digest(sample(1:10), 'crc32')

identical(outcome1a, outcome2a)
identical(outcome1b, outcome2b)
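
wrapply keeps the caller's random number stream consistent by drawing recover.seed before the work and re-seeding afterwards. A possible alternative, sketched below under the assumption that restoring the exact previous state is what you want, is to save .Random.seed and put it back on exit. The helper with_preserved_rng is hypothetical and not part of the original answer.

```r
# Hypothetical helper: evaluate expr, then put the caller's RNG state back.
with_preserved_rng <- function(expr) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  old_seed <- if (had_seed) get(".Random.seed", envir = genv) else NULL
  on.exit({
    if (had_seed) {
      assign(".Random.seed", old_seed, envir = genv)
    } else if (exists(".Random.seed", envir = genv, inherits = FALSE)) {
      rm(".Random.seed", envir = genv)
    }
  })
  expr  # lazy evaluation: the expression runs here, after the state was saved
}

set.seed(123)
expected <- sample(1:10)   # what the caller's next draw looks like undisturbed

set.seed(123)
inner <- with_preserved_rng({
  set.seed(1)              # disturbs the RNG, as f does when run via lapply
  sample(1:10)
})
after <- sample(1:10)      # caller's next draw after the wrapped code

identical(expected, after)
```

Unlike wrapply, this leaves the stream exactly where it was before the call, so any seeds used inside have to come from somewhere other than draws made within the wrapped expression.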
