Automating regressions in R

Tags: r, data.table, lapply

I can run all of the models by hand, but I would like to automate this because I have nearly a hundred regressions to run. Here is my attempt:

data1=mtcars
CONTROL = c("mpg", "cyl")
X = c("hp","drat","wt","am")
Y = c("vs")

model1= glm(vs ~ hp + mpg + cyl, family = binomial, data = data1)
model2= glm(vs ~ drat + mpg + cyl, family = binomial, data = data1)
model3= glm(vs ~ wt + mpg + cyl, family = binomial, data = data1)
model4= glm(vs ~ am + mpg + cyl, family = binomial, data = data1)

Desired output:

VARNAME COEF LL UL
hp
drat
wt
am

This gets you most of the way there:
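
library(broom)
library(dplyr)

char_formula = sprintf("%s ~ %s + %s", Y, paste(CONTROL, collapse = "+"), X)

mods = list()
for(i in seq_along(char_formula)) {
  mods[[i]] = glm(as.formula(char_formula[i]), family = binomial, data = data1)
}

names(mods) = X
lapply(mods, tidy) %>%
  bind_rows(.id = "VARNAME") %>%
  filter(term %in% X)
# # A tibble: 4 x 6
#   VARNAME term  estimate  std.error statistic p.value
#   <chr>   <chr>    <dbl>      <dbl>     <dbl>   <dbl>
# 1 hp      hp     -0.0461     0.0362  -1.27      0.203
# 2 drat    drat   -2.53       1.83    -1.38      0.168
# 3 wt      wt      2.26       1.73     1.30      0.192
# 4 am      am    -61.0    18184.      -0.00335   0.997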
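
The broom answer stops short of the LL and UL columns from the question. Below is a minimal base-R sketch of one way to fill them in. It assumes that LL and UL mean the lower and upper 95% Wald confidence limits; the question does not define them, and one commenter reads LL as log-likelihood. The names `fit_one` and `result` are made up for this illustration.

```r
# Sketch only: assumes LL / UL are lower / upper 95% Wald limits.
data1   <- mtcars
CONTROL <- c("mpg", "cyl")
X       <- c("hp", "drat", "wt", "am")
Y       <- "vs"

# Fit Y ~ x + CONTROL for a single predictor x and return one output row.
fit_one <- function(x) {
  fo  <- reformulate(c(x, CONTROL), response = Y)   # e.g. vs ~ hp + mpg + cyl
  fm  <- glm(fo, family = binomial, data = data1)
  est <- coef(summary(fm))[x, ]                     # row of the coefficient table for x
  z   <- qnorm(0.975)                               # about 1.96
  data.frame(VARNAME = x,
             COEF = est[["Estimate"]],
             LL   = est[["Estimate"]] - z * est[["Std. Error"]],
             UL   = est[["Estimate"]] + z * est[["Std. Error"]],
             stringsAsFactors = FALSE)
}

# One row per variable in X, stacked into a single data frame.
result <- do.call(rbind, lapply(X, fit_one))
print(result)
```

The standard error reported for am in the broom output is about 18184, so the Wald limits for that row are extremely wide. glm() may also warn about fitted probabilities of 0 or 1 for some of these models.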

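We assume you want to regress vs on every set of 3 of the 4 variables in X, plus the CONTROL variables.

The desired output is also not clearly specified, so we assume you want the coefficients of each model plus some other, unspecified statistics; we use the log-likelihood as an example. Each output row holds the result of one model run, and an NA in a row marks the coefficient that model did not use.

No packages are used.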

How do you define "all models"? Do you want all models with 3 variables? Or all models with one, two, or three variables? With two-way interaction terms? Three-way interaction terms, or more?

@Gregor Thomas Thank you very much. "All models" refers to the examples I showed as model1 to model4. I have a constant list of control variables and then a list of additional predictors; there are no interaction terms.

OK, so each model contains all of the CONTROL variables and one of the variables in X. Then what is LL? Log-likelihood? Also, please fix the question so that it stands on its own without reference to the comments.

# test data is the builtin mtcars as well as CONTROL, X and Y
CONTROL <- c("mpg", "cyl")
X <- c("hp","drat","wt","am")
Y <- "vs"

stats <- function(nm) {
  fo <- reformulate(c(setdiff(X, nm), CONTROL), Y)
  fm <- glm(fo, mtcars, family = binomial)
  coefs <- c(coef(fm), setNames(NA, nm))[c("(Intercept)", X)]
  c(coefs, logLik = logLik(fm))  # add other statistics to this line
}

do.call("rbind", lapply(X, stats))
     (Intercept)          hp        drat        wt         am        logLik
[1,]   150.84593          NA   -4.381111  2.242363  -62.74444 -2.338630e+00
[2,]   137.24734 -0.01079679          NA  1.209804  -60.16064 -2.826391e+00
[3,]  3285.91281 -6.54621462 -344.866970        NA -311.77291 -1.315947e-08
[4,]    84.76241 -0.49550208  -25.801081 24.231253         NA -3.863991e+00
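
The rows of the matrix above are unlabelled, so it is not obvious which model produced which row. The sketch below repeats `stats()` from the answer and labels each row by the variable that was left out. The `without_` prefix and the name `res` are illustrative choices, not part of the original answer.

```r
# Same setup as the answer: built-in mtcars, CONTROL, X and Y.
CONTROL <- c("mpg", "cyl")
X <- c("hp", "drat", "wt", "am")
Y <- "vs"

# Fit the model that omits `nm` from X; the omitted coefficient is reported as NA.
stats <- function(nm) {
  fo <- reformulate(c(setdiff(X, nm), CONTROL), Y)
  fm <- glm(fo, mtcars, family = binomial)
  coefs <- c(coef(fm), setNames(NA, nm))[c("(Intercept)", X)]
  c(coefs, logLik = logLik(fm))
}

res <- do.call("rbind", lapply(X, stats))
rownames(res) <- paste0("without_", X)   # hypothetical labels, one per omitted variable
print(res)
```

With these labels the NA in each row sits in the column named after the omitted variable, which makes the layout easier to check by eye.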