How to efficiently bootstrap groups (multilevel) using R


I am analyzing a study with 40 individuals, each of whom rated 10 vignettes (short case descriptions).

indiv     vign      score    score2    gender    
  1         1         5         3        1
  1         2         2         4        1   
  1         3         8         1        1
  .         .         .         .        .
  .         .         .         .        .
  .         .         .         .        .
  39       10         9         1        1 
  40        8         1         5        0 
  40        9         3         8        0 
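I want to bootstrap, but I quickly realized that resampling individual vignettes makes no sense; we should resample persons instead (so for each sampled person we draw all of their 10 or so rows).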

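The function below works, but it is the bottleneck for a subsequent function that uses it. So the question is: how can this be done more efficiently?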

ResampleMultilevel <- function(data, groupvar) {
  n <- length(unique(data[,groupvar]))

  index <- sample(data[ , groupvar], n, replace = TRUE)

  resampled <- NULL      # one of the issues is that we do not know 
                         # the size of the matrix yet, since it may vary. 
  for (i in 1:n) {
   resampled <- rbind(resampled, data[data[, groupvar] == index[i], ])
  }
  return(resampled)
}
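
One likely reason the loop above is slow is that rbind() copies the growing result on every iteration. The following is a minimal sketch of one way around that, not a definitive implementation. It assumes data is a data frame or matrix, and the name ResampleMultilevel2 is made up for illustration. Unlike the original, it draws from the unique person IDs rather than from all rows, so persons with more rows are not drawn more often.

```r
# Hypothetical variant of ResampleMultilevel: same idea, but the pieces are
# collected in a list and bound together once instead of growing with rbind().
ResampleMultilevel2 <- function(data, groupvar) {
  ids <- unique(data[, groupvar])
  # Draw as many groups (persons) as there are, with replacement
  index <- ids[sample.int(length(ids), length(ids), replace = TRUE)]
  # One block of rows per drawn group; a group drawn twice appears twice
  pieces <- lapply(index, function(g) data[data[, groupvar] == g, , drop = FALSE])
  do.call(rbind, pieces)
}

# Toy data: 40 persons with 10 vignettes each (made-up scores)
set.seed(1)
d <- data.frame(indiv = rep(1:40, each = 10),
                vign  = rep(1:10, times = 40),
                score = sample(1:10, 400, replace = TRUE))
boot <- ResampleMultilevel2(d, "indiv")
nrow(boot)  # 400 here, because every person has exactly 10 rows
```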

Based on the comments, here is an answer:

a <- cbind(rep(1:40, each = 10), rep(1:10, 4), rnorm(40), rnorm(40))
index <- c(1, 1, 3, 4, 2)
a[a[, 1] %in% index, ]
##       [,1] [,2]        [,3]        [,4]
##  [1,]    1    1  0.28135473  0.47970116
##  [2,]    1    2 -0.12628982  0.34862899
##  [3,]    1    3 -0.41140740  1.30204100
##  [4,]    1    4 -0.61163593 -1.13354157
##  [5,]    1    5 -0.31538238  1.42701315
##  [6,]    1    6 -0.20403098  2.13989392
##  [7,]    1    7  0.37681973  0.65843232
##  [8,]    1    8 -0.94062165  0.97246212
##  [9,]    1    9  0.63377352 -0.48948273
## [10,]    1   10 -0.39817929 -1.03607028
## [11,]    2    1  0.54866153 -0.55127459
## [12,]    2    2  0.08410140  0.01457366
## [13,]    2    3 -1.19006851  1.33213116
## [14,]    2    4 -0.47210092  0.83369309
## [15,]    2    5  0.75968678 -0.48212390
## [16,]    2    6 -1.00205770  0.56376027
## [17,]    2    7  0.67251644  0.07234657
## [18,]    2    8  0.73165780 -0.51483172
## [19,]    2    9 -0.26022238  2.33181762
## [20,]    2   10  0.03370091 -0.71427295
## [21,]    3    1  0.60810461  0.15054307
## [22,]    3    2 -1.29363706  1.30510127
## [23,]    3    3 -0.20479713 -2.39797975
## [24,]    3    4 -0.86927664 -0.10845738
## [25,]    3    5  0.89040130 -0.08459249
## [26,]    3    6 -0.21511823  1.33960644
## [27,]    3    7 -0.32413278 -0.31691484
## [28,]    3    8 -0.61545941 -0.10457591
## [29,]    3    9 -1.85072358  0.93267270
## [30,]    3   10  0.38456423  0.76231047
## [31,]    4    1  0.76016236  1.63854054
## [32,]    4    2 -0.94463491  1.87271085
## [33,]    4    3  1.62451250  1.63298961
## [34,]    4    4 -1.96908559  0.89058201
## [35,]    4    5  1.66755533  0.10288947
## [36,]    4    6 -0.02182803 -0.91358891
## [37,]    4    7 -0.09382921 -0.54950093
## [38,]    4    8  0.74597002  2.31924468
## [39,]    4    9  0.64732694  0.29681494
## [40,]    4   10 -0.66535049  1.81285111
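
One caveat about the %in% approach: it returns each matching row once, so person 1, who appears twice in index, contributes only 10 rows, and the result has 40 rows rather than 50. In a bootstrap, a person drawn twice is usually meant to be included twice. Below is a sketch of one way to keep the repeats, by looking up the row numbers of each person once and then indexing with them. The name rows_by_person is an assumption for illustration.

```r
set.seed(42)
a <- cbind(rep(1:40, each = 10), rep(1:10, times = 40), rnorm(400), rnorm(400))
index <- c(1, 1, 3, 4, 2)

# %in% only tests membership, so the duplicated person 1 is included once
nrow(a[a[, 1] %in% index, ])   # 40

# Row numbers for each person, computed once (hypothetical helper name)
rows_by_person <- split(seq_len(nrow(a)), a[, 1])

# Look up the rows of every drawn person, in draw order, keeping repeats
rows <- unlist(rows_by_person[as.character(index)], use.names = FALSE)
resampled <- a[rows, ]
nrow(resampled)                # 50
resampled[c(1, 11), 1]         # 1 1: person 1 appears twice
```

Because the split() lookup is built only once, it could be reused across bootstrap replicates, which is presumably where most of the time would be saved.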

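An example data set a:
cbind(1:40, rep(1:10, 4), rnorm(40), rnorm(40))

What is currently used as the groupvar argument, indiv or vign?

I think the for loop could be replaced with data[index, ]. I think that would save some time.

@Marius I am currently using indiv.

@Seth, that will not work. You need to select the 10 or so vignettes for each number (person) in index. Note that the same person can also be drawn more than once, and those repeats would not be selected, whereas we want it to return more rows than that.

Remember that those 1, 1, 3, 4, 2 are supposed to be persons, each of whom has about 10 vignettes. I now see that my example was invalid; my mistake. Try this:
cbind(rep(1:40, each = 10), rep(1:10, 4), rnorm(40), rnorm(40))

a[which(a[, 1] == 2), ] sort of works; now I would like to use a vector in place of the "2", which might actually work!

So you do not want the same row repeated, but rather all rows whose first-column value is in index, right?

Wow, this is great. What exactly does %in% do?
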
a[which(a[,1] == 2),]       # works
a[which(a[,1] == index), ]  # does not work
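
As for why == index does not work and what %in% does: == compares element by element and recycles the shorter vector, whereas x %in% y asks, for each element of x, whether that value occurs anywhere in y (it is defined in terms of match()). A small illustration with made-up values:

```r
x <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
index <- c(1, 1, 3, 4, 2)

# ==: index is recycled to length 10, then compared position by position
eq <- x == index
eq
# TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

# %in%: TRUE wherever the value of x occurs anywhere in index
isin <- x %in% index
isin
# TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE
```

So == with a vector on the right-hand side only matches when values happen to line up by position, while %in% picks out every row belonging to any person listed in index.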