How to efficiently bootstrap groups (multilevel) in R
I'm analysing a study of 40 individuals, each of whom rates 10 vignettes:
indiv vign score score2 gender
1 1 5 3 1
1 2 2 4 1
1 3 8 1 1
. . . . .
. . . . .
. . . . .
39 10 9 1 1
40 8 1 5 0
40 9 3 8 0
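I'd like to bootstrap, but I quickly realised that resampling vignettes makes no sense; we should resample individuals instead (so for each sampled person we take all of their ~10 rows).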
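A minimal sketch of the person-level (cluster) bootstrap described above, assuming a data frame with columns `indiv`, `vign` and `score` as in the question. The data are simulated for illustration only, the statistic (mean score) and `B = 200` are arbitrary choices, and the names `dat`, `by_person` and `boot_means` are hypothetical.

```r
# Simulated data for illustration only: 40 persons x 10 vignettes, scores 1-9
set.seed(1)
dat <- data.frame(
  indiv = rep(1:40, each = 10),
  vign  = rep(1:10, times = 40),
  score = sample(1:9, 400, replace = TRUE)
)

# Group the scores by person once: a list with one score vector per person
by_person <- split(dat$score, dat$indiv)

B <- 200  # number of bootstrap replicates (arbitrary here)
boot_means <- replicate(B, {
  # draw 40 persons with replacement; a person drawn twice contributes all
  # of their scores twice, and a person's vignettes always stay together
  drawn <- sample.int(length(by_person), replace = TRUE)
  mean(unlist(by_person[drawn], use.names = FALSE))
})

# person-level bootstrap standard error of the mean score
se_mean <- sd(boot_means)
```

Because whole persons are the sampling unit, the within-person correlation between vignette ratings is preserved in every replicate.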
The function below works, but it is the bottleneck in the function that calls it.
So the question is: how can this be done more efficiently?
ResampleMultilevel <- function(data, groupvar) {
  ids <- unique(data[, groupvar])
  n <- length(ids)
  # sample persons (not rows) with replacement, so every person has the same chance
  index <- ids[sample.int(n, n, replace = TRUE)]
  resampled <- NULL # one of the issues is that we do not know
                    # the size of the matrix yet, since it may vary.
  for (i in 1:n) {
    resampled <- rbind(resampled, data[data[, groupvar] == index[i], ])
  }
  return(resampled)
}
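One way to remove the repeated `rbind()` is sketched below, assuming `data` is a data frame or matrix and `groupvar` names its grouping column. `ResampleMultilevel2` is a hypothetical name. It computes each group's row numbers once with `split()`, draws groups with replacement, and subsets `data` in a single call, so the result is never grown inside a loop.

```r
# Hypothetical faster variant of the loop-based ResampleMultilevel
ResampleMultilevel2 <- function(data, groupvar) {
  # row numbers of each group, computed once
  rows_by_group <- split(seq_len(nrow(data)), data[, groupvar])
  # draw as many groups as there are, with replacement
  drawn <- sample.int(length(rows_by_group), replace = TRUE)
  # a single subsetting call instead of growing the result with rbind()
  data[unlist(rows_by_group[drawn], use.names = FALSE), , drop = FALSE]
}

# Usage on simulated data shaped like the question's
set.seed(42)
dat <- data.frame(indiv = rep(1:40, each = 10),
                  vign  = rep(1:10, times = 40),
                  score = sample(1:9, 400, replace = TRUE))
res <- ResampleMultilevel2(dat, "indiv")
```

If the function is called thousands of times inside a bootstrap loop, the `split()` step could also be hoisted out of the loop, since the row numbers of each group never change between replicates.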
Based on the comments, I'm posting this as an answer:
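# 40 persons x 10 vignettes (the shorter vectors are recycled to 400 rows)
a <- cbind(rep(1:40, each = 10), rep(1:10, 4), rnorm(40), rnorm(40))
index <- c(1, 1, 3, 4, 2)
# keep every row whose person id (column 1) appears in index;
# note that %in% returns each matching person once, so the duplicated 1 is not repeated
a[a[, 1] %in% index, ]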
## [,1] [,2] [,3] [,4]
## [1,] 1 1 0.28135473 0.47970116
## [2,] 1 2 -0.12628982 0.34862899
## [3,] 1 3 -0.41140740 1.30204100
## [4,] 1 4 -0.61163593 -1.13354157
## [5,] 1 5 -0.31538238 1.42701315
## [6,] 1 6 -0.20403098 2.13989392
## [7,] 1 7 0.37681973 0.65843232
## [8,] 1 8 -0.94062165 0.97246212
## [9,] 1 9 0.63377352 -0.48948273
## [10,] 1 10 -0.39817929 -1.03607028
## [11,] 2 1 0.54866153 -0.55127459
## [12,] 2 2 0.08410140 0.01457366
## [13,] 2 3 -1.19006851 1.33213116
## [14,] 2 4 -0.47210092 0.83369309
## [15,] 2 5 0.75968678 -0.48212390
## [16,] 2 6 -1.00205770 0.56376027
## [17,] 2 7 0.67251644 0.07234657
## [18,] 2 8 0.73165780 -0.51483172
## [19,] 2 9 -0.26022238 2.33181762
## [20,] 2 10 0.03370091 -0.71427295
## [21,] 3 1 0.60810461 0.15054307
## [22,] 3 2 -1.29363706 1.30510127
## [23,] 3 3 -0.20479713 -2.39797975
## [24,] 3 4 -0.86927664 -0.10845738
## [25,] 3 5 0.89040130 -0.08459249
## [26,] 3 6 -0.21511823 1.33960644
## [27,] 3 7 -0.32413278 -0.31691484
## [28,] 3 8 -0.61545941 -0.10457591
## [29,] 3 9 -1.85072358 0.93267270
## [30,] 3 10 0.38456423 0.76231047
## [31,] 4 1 0.76016236 1.63854054
## [32,] 4 2 -0.94463491 1.87271085
## [33,] 4 3 1.62451250 1.63298961
## [34,] 4 4 -1.96908559 0.89058201
## [35,] 4 5 1.66755533 0.10288947
## [36,] 4 6 -0.02182803 -0.91358891
## [37,] 4 7 -0.09382921 -0.54950093
## [38,] 4 8 0.74597002 2.31924468
## [39,] 4 9 0.64732694 0.29681494
## [40,] 4 10 -0.66535049 1.81285111
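The output above has 40 rows, not 50: `%in%` only tests membership, so person 1 appears once even though `index` contains 1 twice. For a bootstrap the duplicate should be kept. A minimal sketch that preserves duplicates, using only base R, is to collect one block of row numbers per entry of `index` (the names `once`, `rows` and `twice` are illustrative):

```r
a <- cbind(rep(1:40, each = 10), rep(1:10, 4), rnorm(40), rnorm(40))
index <- c(1, 1, 3, 4, 2)

# %in% keeps each matching row once, in the original row order
once <- a[a[, 1] %in% index, ]

# one block of row numbers per entry of index, so the repeated 1 yields two copies
rows <- unlist(lapply(index, function(i) which(a[, 1] == i)))
twice <- a[rows, ]

nrow(once)   # 40: persons 1-4, ten rows each
nrow(twice)  # 50: persons 1, 1, 3, 4, 2, ten rows each
```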
An example dataset: cbind(1:40, rep(1:10, 4), rnorm(40), rnorm(40))
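What are you currently using as the groupvar argument, indiv or vign?
I think you could replace the for loop with data[index, ]; that should save some time.
@Marius I'm using indiv at the moment.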
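A small illustration of why `data[index, ]` is not a drop-in replacement, assuming the 400-row example matrix `a` from this thread: indexing with a vector of numbers selects rows by position, not by person id, so it returns only five rows, all of which here belong to person 1.

```r
a <- cbind(rep(1:40, each = 10), rep(1:10, 4), rnorm(40), rnorm(40))
index <- c(1, 1, 3, 4, 2)

# a[index, ] treats index as row positions, not person ids
by_position <- a[index, ]
nrow(by_position)   # 5
by_position[, 1]    # 1 1 1 1 1: rows 1-4 all belong to person 1
```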
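@Seth, that won't work. For each number (person) in the index you need to select that person's ~10 vignettes. Note that a person can also appear more than once in the index, and that would not be picked up, because we want such a person returned more than once. Remember, those 1, 1, 3, 4, 2 are supposed to be persons, each with about 10 vignettes. I now see that my example is invalid... my mistake. Try this: cbind(rep(1:40, each = 10), rep(1:10, 4), rnorm(40), rnorm(40))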
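a[which(a[, 1] == 2), ] sort of works; now I'd like to replace the 2 with a vector.
So you don't want the same row repeated; you want every row whose first-column value is in index, right?
Wow, that's great. What exactly does %in% do?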
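On the question of what `%in%` does: base R defines `x %in% table` as `match(x, table, nomatch = 0) > 0`, so it returns one `TRUE`/`FALSE` per element of `x` saying whether that element occurs anywhere in `table`. A short sketch (the vector `x` is made up for illustration) also shows that duplicates in `table` make no difference, which is why `c(1, 1, 3, 4, 2)` behaves like `1:4`:

```r
x <- c(1, 2, 5, 3, 2)

x %in% c(2, 3)                        # FALSE TRUE FALSE TRUE TRUE
match(x, c(2, 3), nomatch = 0) > 0    # same result: %in% is defined this way

# duplicates in the table change nothing, so c(1, 1, 3, 4, 2) acts like 1:4
identical(x %in% c(1, 1, 3, 4, 2), x %in% 1:4)   # TRUE
```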
a[which(a[, 1] == 2), ]      # works
a[which(a[, 1] == index), ]  # does not work: == compares element by element, recycling index
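The second line fails because `==` is vectorised element by element: the shorter vector `index` is recycled along `a[, 1]`, so each row is compared with just one element of `index` instead of with all of them. With 400 rows and a length-5 `index` R does not even warn, since 400 is a multiple of 5. A tiny made-up example shows the difference from `%in%`:

```r
ids   <- c(1, 1, 2, 2, 3, 3)
index <- c(1, 3)

ids == index    # TRUE FALSE FALSE FALSE FALSE TRUE (index recycled to 1, 3, 1, 3, 1, 3)
ids %in% index  # TRUE TRUE FALSE FALSE TRUE TRUE
```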