Generating random numbers in R without a loop

r performance loops random

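I'm trying to cut, as much as possible, the execution time of a function that sums the outcomes of a series of Bernoulli trials.

Here is my working approach, but it is slow: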

set.seed(28100)
sim <- data.frame(result = rep(NA, 10))
for (i in 1:nrow(sim)) {
  sim$result[i] <- sum(rbinom(1200, size = 1, prob = 0.2))
}
sim
# result
# 1     268
# 2     230
# 3     223
# 4     242
# 5     224
# 6     218
# 7     237
# 8     254
# 9     227
# 10    247

How about something like this:
set.seed(28100)
sims <- 10
n <- 1200
r <- rbinom(n*sims, size = 1, prob = 0.2)
r <- matrix(r, ncol=sims)
colSums(r)
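
The matrix approach works because R fills a matrix column by column, so each column ends up holding the trials of one simulation. Here is a minimal sketch of that layout; the tiny sizes (3 trials, 2 simulations) and the seed are made up for illustration and are not from the question:

```r
set.seed(1)
n <- 3      # trials per simulation (illustrative; the question uses 1200)
sims <- 2   # number of simulations (illustrative; the question uses 10)

r <- rbinom(n * sims, size = 1, prob = 0.2)   # all Bernoulli draws in one call
m <- matrix(r, ncol = sims)                   # column-major fill: draws 1..n go to column 1

# Summing each consecutive block of n draws by hand gives the same totals
by_block <- c(sum(r[1:3]), sum(r[4:6]))
same <- identical(colSums(m), as.numeric(by_block))
print(same)
```

Each column sum is therefore the number of successes in one simulated run, the same quantity the loop computes one iteration at a time.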

Do the following:
sim = rep(NA, 10)
sapply(sim,FUN = function(x) {sum(rbinom(1200, size = 1, prob = 0.2))})

Result:
[1] 216 231 234 249 249 236 255 251 231 244

Then convert the result to a data frame.
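
The conversion step is not spelled out above, so here is one possible sketch. It swaps in vapply for sapply (my substitution, not the answer's code) so that the return type is checked, and the names res and sim are chosen only for illustration:

```r
set.seed(28100)

# One sum of 1200 Bernoulli(0.2) draws per element; vapply enforces a length-1 numeric
res <- vapply(
  1:10,
  function(i) sum(rbinom(1200, size = 1, prob = 0.2)),
  numeric(1)
)

# Wrap the vector in a one-column data frame, matching the question's layout
sim <- data.frame(result = res)
str(sim)
```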
set.seed(28100)
nsim=10
sim = data.frame(result=replicate(nsim, sum(rbinom(1200, size=1, prob=0.2))))

sim

Here are some timings of the various methods for 10,000 simulations:
microbenchmark::microbenchmark(
  replicate = {nsim=10000
  data.frame(result=replicate(nsim, sum(rbinom(1200, size=1, prob=0.2))))},
  matrixColSums = {
    sims <- 10000
    n <- 1200
    r <- rbinom(n*sims, size = 1, prob = 0.2)
    r <- matrix(r, ncol=sims)
    data.frame(result=colSums(r)) },
  sapply = data.frame(result=sapply(1:10000, FUN = function(x) {sum(rbinom(1200, size = 1, prob = 0.2))})),
  times=10
)

The binomial distribution is defined as the sum of Bernoulli trials.
# this line from your question
sum(rbinom(1200, size = 1, prob = 0.2))
# is equivalent to this
rbinom(1, size = 1200, prob = 0.2)

# and replicating it
replicate(expr = sum(rbinom(1200, size = 1, prob = 0.2)), n = 10)
# is equivalent to setting n higher:

        ### This is the only line of code you need! ####
rbinom(10, size = 1200, prob = 0.2)
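
Note that "equivalent" here means equal in distribution: with the same seed, the two forms are not expected to match draw for draw. A rough way to check the claim is to compare sample moments against the theoretical mean n*p = 240 and variance n*p*(1-p) = 192. The seed and the number of simulations below are arbitrary choices for this sketch:

```r
set.seed(42)
n_sims <- 1e4   # number of simulated sums (arbitrary)
size <- 1200    # Bernoulli trials per sum, as in the question
p <- 0.2

# Method 1: sum 1200 Bernoulli draws, repeated n_sims times
summed <- replicate(n_sims, sum(rbinom(size, size = 1, prob = p)))
# Method 2: draw the binomial counts directly
direct <- rbinom(n_sims, size = size, prob = p)

# Both sample means should sit near 240 and both variances near 192
print(c(theory = size * p, summed = mean(summed), direct = mean(direct)))
print(c(theory = size * p * (1 - p), summed = var(summed), direct = var(direct)))
```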

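On my (rather slow) laptop, 100,000 simulations take about 0.01 seconds and 1M simulations take about 0.12 seconds.
Adapting @eipi's nice benchmark, this is 700-900 times faster than the other methods (now with the bug fixed!).
Benchmark code: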
nn = 10000
n_bern = 1200
library(microbenchmark)
print(
    microbenchmark::microbenchmark(
        replicate =
            replicate(nn, sum(rbinom(
                n_bern, size = 1, prob = 0.2
            )))
        ,
        matrixColSums =
            colSums(matrix(
                rbinom(n_bern * nn, size = 1, prob = 0.2), ncol = nn
            )),
        sapply = sapply(
            1:nn,
            FUN = function(x) {
                sum(rbinom(n_bern, size = 1, prob = 0.2))
            }
        ),
        binom = rbinom(nn, size = n_bern, prob = 0.2),
        times = 10
    ),
    order = "median",
    signif = 4
)
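
If the microbenchmark package is not installed, a cruder comparison is possible with base R's system.time. This is only a sketch: it times a single run of each approach, so the numbers will be noisier than the benchmark above and will vary by machine. The variable names t_summed and t_binom are made up here:

```r
nn <- 10000     # number of simulations, as in the benchmark
n_bern <- 1200  # Bernoulli trials per simulation

# Elapsed seconds for summing individual Bernoulli draws
t_summed <- system.time(
  replicate(nn, sum(rbinom(n_bern, size = 1, prob = 0.2)))
)[["elapsed"]]

# Elapsed seconds for drawing the binomial counts directly
t_binom <- system.time(
  rbinom(nn, size = n_bern, prob = 0.2)
)[["elapsed"]]

print(c(summed = t_summed, binom = t_binom))
```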

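Vectorization is key.
The main time saver (at least for large n) is to use sample.

For example, with
n <- 1e7
sample(0:1, n, replace=TRUE)

this takes about 24 seconds. Vectorized operations can often replace loops, but knowing when and where depends on being familiar with the available functions that fit your needs.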
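
To see the difference between one vectorized call and a loop of single draws, here is a small sketch. It uses n = 1e5 rather than 1e7 so the loop finishes quickly (my choice, not the answer's), and the timings it prints will depend on the machine:

```r
n <- 1e5  # smaller than the 1e7 above, chosen so the loop version stays fast

# Vectorized: a single call draws all n values
t_vec <- system.time(
  x_vec <- sample(0:1, n, replace = TRUE)
)[["elapsed"]]

# Loop: one call to sample per value, writing into a preallocated vector
x_loop <- integer(n)
t_loop <- system.time(
  for (i in seq_len(n)) x_loop[i] <- sample(0:1, 1)
)[["elapsed"]]

print(c(vectorized = t_vec, loop = t_loop))
```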
I'll point you to my answer for the explanation, but the one-line solution is rbinom(10, size = 1200, prob = 0.2).

If you look at my replicate code in your answer, you'll see that I didn't have the right parameter values (1 and 12 instead of 1200 and 1). I was working toward your answer, but I apparently ran the timings partway through rather than up front. In any case, replicate is no faster than the other two methods, and your approach is clearly the best choice. I just wanted to let you know so you can correct the code and timings for the replicate method (I've already corrected my answer).

Funny thing: when I was parameterizing nn for the benchmark, I also started to pull out an n_bernoulli = 1200, but when I saw that your code just had 12, I assumed you were doing something fancy and explaining it elsewhere, so I didn't spend any time thinking about it.