Speeding up probability-weighted sampling in R

How can I speed up probability-weighted sampling in R?

# Let's assume we are considering the following example:
w <- sample(1:4000, size = 2e6, replace = TRUE)

# "w" will be an integer vector, so we convert it to numeric.
w <- as.numeric(w)

# In practice, the sampling process has to be repeated many times.
M <- matrix(NA, 10, 2000)
system.time(
for (r in 1:10){
  ix <- sample(1:2e6, size = 2000, prob = w/sum(w))
  M[r,] <- ix
})
# It's worth mentioning that without "prob=w/sum(w)" sampling is considerably faster.
# The main goal is to speed up sampling with probability weights!
system.time(ix <- sample(1:2e6, size = 2000, prob = w/sum(w)))

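The speed problem is limited to weighted sampling without replacement. Here is your code, with the parts unrelated to sample() moved out of the loop: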

# Normalize the weights once, outside the timed loops
normalized_weights <- w/sum(w)

# No weights
system.time(
for (r in 1:10){
  ix <- sample(2e6, size = 2000)
})

# Weighted, no replacement (the slow case)
system.time(
for (r in 1:10){
  ix <- sample(2e6, size = 2000, prob = normalized_weights)
})

# Weighted, with replacement
system.time(
for (r in 1:10){
  ix <- sample(2e6, size = 2000, replace = TRUE, prob = normalized_weights)
})
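Since the slowdown is confined to the weighted, no-replacement case, one possible workaround is to avoid sample() for that case altogether. The following is a minimal sketch, not something proposed in the answer above: it uses the "exponential race" trick, in which each item gets a key equal to an Exp(1) draw divided by its weight and the k smallest keys are kept. The helper name weighted_sample_no_replace is hypothetical, and the sketch assumes non-negative weights with at least k of them positive.

```r
# Hypothetical helper (not part of the original answer): weighted sampling
# without replacement via an "exponential race".
# Item i gets the key E_i / w_i with E_i ~ Exp(1); the k smallest keys
# correspond to k successive draws without replacement, each made with
# probability proportional to the remaining weights.
weighted_sample_no_replace <- function(w, k) {
  stopifnot(is.numeric(w), all(w >= 0), k <= sum(w > 0))
  keys <- rexp(length(w)) / w   # zero weights give non-finite keys, sorted last
  order(keys)[seq_len(k)]       # indices of the k smallest keys
}

set.seed(1)
w <- as.numeric(sample(1:4000, size = 2e6, replace = TRUE))
system.time(ix <- weighted_sample_no_replace(w, 2000))
```

The sketch sorts all keys with order(), which is simple but does more work than strictly needed when k is much smaller than length(w). Whether it beats sample() on a given machine should be checked with system.time() rather than assumed.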

Normalizing the weights (normalized_weights) is not necessary for sample().
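
@Roland True, the weights do not need to be normalized; sample() will normalize them itself. (In fact, for weighted sampling without replacement, it renormalizes many times over.) To test the speed, I was trying to make sample() do as little work as possible.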
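
The point about normalization can be checked with a small sketch. The weights w_small below are made up for illustration and chosen as powers of two so that dividing by their sum is exact in floating point; with the same seed, raw and pre-normalized weights should then give the same draws.

```r
# Illustrative weights (an assumption, not from the question); powers of two
# keep the division exact, so both calls see identical probabilities.
w_small <- c(1, 2, 1, 4)

set.seed(42)
a <- sample(4, size = 10, replace = TRUE, prob = w_small)

set.seed(42)
b <- sample(4, size = 10, replace = TRUE, prob = w_small / sum(w_small))

identical(a, b)
```

With arbitrary weights the two calls can see probabilities that differ in the last floating-point bits, so the comparison is best read as a demonstration rather than a guarantee.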