R: post-hoc computation on a large dataset
Tags: r, processing-efficiency, posthoc
I have a large dataset on which I want to run post-hoc computations:
dat = as.data.frame(matrix(runif(10000*300), ncol = 10000, nrow = 300))
dat$group = rep(letters[1:3], 100)
Here is my code:
start <- Sys.time()
vars <- names(dat)[-ncol(dat)]
aov.out <- lapply(vars, function(x) {
  lm(substitute(i ~ group, list(i = as.name(x))), data = dat)
})
TukeyHSD.out <- lapply(aov.out, function(x) TukeyHSD(aov(x)))
Sys.time() - start
Your example is too big. To illustrate the idea, I use a small one:
set.seed(0)
dat = as.data.frame(matrix(runif(2*300), ncol = 2, nrow = 300))
dat$group = rep(letters[1:3], 100)
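Why do you call aov on a fitted "lm" model? That basically refits the same model. Note first that lm is the workhorse of aov, so you can pass a formula with multiple LHS (several responses on the left-hand side) to aov. The resulting model has class c("maov", "aov", "mlm", "lm").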
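As a minimal sketch of the point above, using the small example data: a single aov call with a cbind() left-hand side fits every response at once. The comparison against a separate single-response fit is added here only as an illustration and is not part of the original answer.

```r
set.seed(0)
dat <- as.data.frame(matrix(runif(2 * 300), ncol = 2, nrow = 300))
dat$group <- rep(letters[1:3], 100)

## one call fits both responses at once
fit <- aov(cbind(V1, V2) ~ group, data = dat)
class(fit)        # includes "maov" and "mlm"
dim(coef(fit))    # one column of coefficients per response

## illustration only: column "V1" agrees with a separate single-response fit
fit_V1 <- aov(V1 ~ group, data = dat)
all.equal(unname(coef(fit)[, "V1"]), unname(coef(fit_V1)))
```

Because the design matrix is shared by all responses, the QR factorization is done only once, which is where the saving in model estimation comes from.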
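I used a "for" loop. You can replace it with lapply if you prefer.
@Dong The bug is now fixed. With my approach, model estimation can be several times faster, but the post-hoc step is not sped up compared with the original code, so the overall speedup is limited. As far as I tested, the problem is not the "for" loop but the slowness of the qtukey and ptukey functions inside TukeyHSD. These two functions account for 60% to 70% of the post-hoc execution time. My hack is not a very good "maov" method for TukeyHSD, because it still recomputes qtukey for every model. In fact, this quantile only needs to be computed once for all models.
@Dong Writing a proper TukeyHSD.maov is more involved, although the code in my answer provides a good start. Yes, in general, support for "mlm" and "maov" in R core is poor. Hopefully this will improve in the future.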
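The remark that the quantile only needs to be computed once can be sketched as follows. This is not a TukeyHSD.maov method and not code from the answer. It is a rough illustration that assumes a one-way layout, no missing values, and the same grouping factor for every response, so that all models share the same number of means and the same residual degrees of freedom. All variable names (Y, g, est_diff, q_crit and so on) are made up for this example.

```r
set.seed(0)
dat <- as.data.frame(matrix(runif(2 * 300), ncol = 2, nrow = 300))
dat$group <- rep(letters[1:3], 100)

Y <- as.matrix(dat[names(dat) != "group"])  # responses, one per column
g <- factor(dat$group)
k <- nlevels(g)                  # number of group means
n <- as.vector(table(g))         # group sizes
df_res <- nrow(Y) - k            # residual df, shared by all responses

## group means for every response (k x N matrix)
means <- rowsum(Y, g) / n

## residual mean square for every response
res <- Y - means[as.integer(g), , drop = FALSE]
MSE <- colSums(res^2) / df_res

## pairwise comparisons in the order b-a, c-a, c-b
pairs <- combn(k, 2)
lo <- pairs[1, ]
hi <- pairs[2, ]
est_diff <- means[hi, , drop = FALSE] - means[lo, , drop = FALSE]
rownames(est_diff) <- paste(levels(g)[hi], levels(g)[lo], sep = "-")
se <- sqrt(outer(1 / n[lo] + 1 / n[hi], MSE / 2))

## the Tukey quantile is computed once and reused for all responses
q_crit <- qtukey(0.95, k, df_res)
lwr <- est_diff - q_crit * se
upr <- est_diff + q_crit * se

## one vectorized ptukey call covers every comparison of every response
p_adj <- est_diff
p_adj[] <- ptukey(abs(est_diff) / se, k, df_res, lower.tail = FALSE)
```

Each column of est_diff, lwr, upr and p_adj holds the comparisons for one response. ptukey still has to be evaluated for every comparison, so this sketch removes only the repeated qtukey calls.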
response_names <- names(dat)[-ncol(dat)]
form <- as.formula(sprintf("cbind(%s) ~ group", toString(response_names)))
fit <- do.call("aov", list(formula = form, data = quote(dat)))
aov_hack <- fit
aov_hack[c("coefficients", "fitted.values")] <- NULL ## don't need them
aov_hack[c("contrasts", "xlevels")] <- NULL ## don't need them either
attr(aov_hack$model, "terms") <- NULL ## don't need it
class(aov_hack) <- c("aov", "lm") ## drop "maov" and "mlm"
## the following elements are mandatory for `TukeyHSD`
## names(aov_hack)
#[1] "residuals" "effects" "rank" "assign" "qr"
#[6] "df.residual" "call" "terms" "model"
N <- length(response_names) ## number of response variables
result <- vector("list", N)
for (i in seq_len(N)) {
  ## change response variable in the formula
  aov_hack$call[[2]][[2]] <- as.name(response_names[i])
  ## change residuals
  aov_hack$residuals <- fit$residuals[, i]
  ## change effects
  aov_hack$effects <- fit$effects[, i]
  ## change "terms" object and attribute
  old_tm <- terms(fit)  ## old "terms" object in the model
  old_tm[[2]] <- as.name(response_names[i])  ## change response name in terms
  new_tm <- terms.formula(formula(old_tm))  ## new "terms" object
  aov_hack$terms <- new_tm  ## replace `aov_hack$terms`
  ## replace data in the model frame
  aov_hack$model[1] <- data.frame(fit$model[[1]][, i])
  names(aov_hack$model)[1] <- response_names[i]
  ## run `TukeyHSD` on `aov_hack`
  result[[i]] <- TukeyHSD(aov_hack)
}
result[[1]] ## for example
#  Tukey multiple comparisons of means
#    95% family-wise confidence level
#
#Fit: aov(formula = V1 ~ group, data = dat)
#
#$group
#            diff        lwr        upr     p adj
#b-a -0.012743870 -0.1043869 0.07889915 0.9425847
#c-a -0.022470482 -0.1141135 0.06917254 0.8322109
#c-b -0.009726611 -0.1013696 0.08191641 0.9661356
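For readers who prefer lapply over the "for" loop, the same steps can be wrapped in a function. This is a sketch under the same assumptions as the code above. The helper name tukey_one is hypothetical, and its body is one loop iteration applied to a fresh copy of the stripped-down object.

```r
set.seed(0)
dat <- as.data.frame(matrix(runif(2 * 300), ncol = 2, nrow = 300))
dat$group <- rep(letters[1:3], 100)

response_names <- names(dat)[-ncol(dat)]
form <- as.formula(sprintf("cbind(%s) ~ group", toString(response_names)))
fit <- do.call("aov", list(formula = form, data = quote(dat)))

## strip the multi-response fit down once, as in the loop version
aov_hack <- fit
aov_hack[c("coefficients", "fitted.values", "contrasts", "xlevels")] <- NULL
attr(aov_hack$model, "terms") <- NULL
class(aov_hack) <- c("aov", "lm")

## tukey_one() is a hypothetical helper: one loop iteration for response i
tukey_one <- function(i) {
  hack <- aov_hack
  hack$call[[2]][[2]] <- as.name(response_names[i])
  hack$residuals <- fit$residuals[, i]
  hack$effects <- fit$effects[, i]
  old_tm <- terms(fit)
  old_tm[[2]] <- as.name(response_names[i])
  hack$terms <- terms.formula(formula(old_tm))
  hack$model[1] <- data.frame(fit$model[[1]][, i])
  names(hack$model)[1] <- response_names[i]
  TukeyHSD(hack)
}

result <- lapply(seq_along(response_names), tukey_one)
names(result) <- response_names
```

As noted in the comments, this changes only the style of iteration. The time spent in qtukey and ptukey is the same as with the loop.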