Warning: file_get_contents(/data/phpspider/zhask/data//catemap/4/r/83.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
R 使用每个预测器列拟合模型,并将结果单独存储在数据帧中_R_Dplyr_Purrr_Broom - Fatal编程技术网

R 使用每个预测器列拟合模型,并将结果单独存储在数据帧中

R 使用每个预测器列拟合模型,并将结果单独存储在数据帧中,r,dplyr,purrr,broom,R,Dplyr,Purrr,Broom,我有一个数据框,其中一列是响应变量,几列是预测变量。我想分别使用每个预测变量为响应变量拟合模型,最后创建一个包含模型系数的数据帧。以前,我会这样做: data(iris) iris_vars <- c("Sepal.Width", "Petal.Length", "Petal.Width") fits.iris <- lapply(iris_vars, function(x) {lm(substitute(Sepal.Length ~ i, list(i = as.name(x))

我有一个数据框,其中一列是响应变量,几列是预测变量。我想分别使用每个预测变量为响应变量拟合模型,最后创建一个包含模型系数的数据帧。以前,我会这样做:

data(iris)

iris_vars <- c("Sepal.Width", "Petal.Length", "Petal.Width")
fits.iris <- lapply(iris_vars, function(x) {lm(substitute(Sepal.Length ~ i, list(i = as.name(x))), data = iris)})

# extract model coeffs, so forth and so on, eventually combining into a result dataframe
iris.p <- as.data.frame(lapply(fits.iris, function(f) summary(f)$coefficients[,4]))
iris.r <- as.data.frame(lapply(fits.iris, function(f) summary(f)$r.squared))
但是,我不确定如何将此列表转换为适当的形式,以便与
broom::tidy
一起使用。如果我使用分组行,而不是列,我会存储模型拟合,并使用
broom::tidy
执行以下操作:

iris.fits <- group_by(Species) %>% do(modfit1 = lm(Sepal.Length~Sepal.Width,data=.))
tidy(iris.fits, modfit1)
iris.fits%do(modfit1=lm(萼片长~萼片宽,数据=))
整洁(iris.fits,modfit1)
当然这不是我正在做的,但我希望在使用数据列时也有类似的过程。有没有办法,也许可以使用
purrr::nest
或类似的方法来创建所需的输出?

1)这可以为模型匹配提供
一瞥
整洁
输出:

library(broom)

make_model <- function(nm) lm(iris[c("Sepal.Length", nm)])
fits <- Map(make_model, iris_vars)

glance_tidy <- function(x) c(unlist(glance(x)), unlist(tidy(x)[, -1]))
out <- sapply(fits, glance_tidy)
或转置:

> t(out)
              r.squared adj.r.squared     sigma  statistic      p.value df
Sepal.Width  0.01382265   0.007159294 0.8250966   2.074427 1.518983e-01  2
Petal.Length 0.75995465   0.758332718 0.4070745 468.550154 1.038667e-47  2
Petal.Width  0.66902769   0.666791387 0.4779948 299.167312 2.325498e-37  2
                 logLik      AIC      BIC  deviance df.residual estimate1
Sepal.Width  -182.99584 371.9917 381.0236 100.75610         148  6.526223
Petal.Length  -77.02021 160.0404 169.0723  24.52503         148  4.306603
Petal.Width  -101.11073 208.2215 217.2534  33.81489         148  4.777629
              estimate2 std.error1 std.error2 statistic1 statistic2
Sepal.Width  -0.2233611 0.47889634 0.15508093   13.62763  -1.440287
Petal.Length  0.4089223 0.07838896 0.01889134   54.93890  21.646019
Petal.Width   0.8885803 0.07293476 0.05137355   65.50552  17.296454
                  p.value1     p.value2
Sepal.Width   6.469702e-28 1.518983e-01
Petal.Length 2.426713e-100 1.038667e-47
Petal.Width  3.340431e-111 2.325498e-37
2)如果我们从函数定义中删除第一个未列出的列表,那么我们将得到一个2d列表(而不是2d数字矩阵):

glance\u tidy\u l%Map(f=make\u model)%%>%sapply(glance\u tidy\u l)
萼片。宽花瓣。长花瓣。宽
r、 平方0.01382265 0.7599546 0.6690277
调整r平方0.007159294 0.7583327 0.6667914
西格玛0.8250966 0.4070745 0.4779948
统计数据2.074427 468.5502 299.1673
p、 值0.1518983 1.038667e-47 2.325498e-37
东风2
logLik-182.9958-77.02021-101.1107
AIC 371.9917 160.0404 208.2215
BIC 381.0236 169.0723 217.2534
偏差100.7561 24.52503 33.81489
df.剩余148 148 148
估计数1 6.526223 4.306603 4.777629
估计值2-0.2233611 0.40892230.8885803
标准错误1 0.4788963 0.07838896 0.07293476
标准错误2 0.1550809 0.01889134 0.05137355
统计数字113.62763 54.9389 65.50552
统计2-1.440287 21.64602 17.29645
p、 值1 6.469702e-28 2.426713e-100 3.340431e-111
p、 值2 0.1518983 1.038667e-47 2.325498e-37

如果您想开始设置一个带有列表列的准嵌套数据框,那么map/model/unnest/tidy步骤会非常顺利

首先,设置数据框:

> library(dplyr)
> 
> nested_df <- data_frame(data = list(iris %>% 
                                          select(response = Sepal.Length, 
                                                 predictor = Sepal.Width), 
                                      iris %>% 
                                          select(response = Sepal.Length, 
                                                 predictor = Petal.Length),
                                      iris %>% 
                                          select(response = Sepal.Length, 
                                                 predictor = Petal.Width)))
> 
> nested_df
# A tibble: 3 × 1
                    data
                  <list>
1 <data.frame [150 × 2]>
2 <data.frame [150 × 2]>
3 <data.frame [150 × 2]>
>库(dplyr)
> 
>嵌套_df%
选择(响应=萼片长度,
预测器=萼片宽度),
虹膜%>%
选择(响应=萼片长度,
预测器=花瓣长度),
虹膜%>%
选择(响应=萼片长度,
预测器=花瓣宽度)
> 
>嵌套的
#一个tibble:3×1
数据
1.
2.
3.
现在使用purrr、tidyr和broom获得建模结果

> library(tidyr)
> library(purrr)
> library(broom)
> 
> nested_df %>%
      mutate(models = map(data, ~ lm(response ~ predictor, .))) %>%
      unnest(map(models, tidy))
# A tibble: 6 × 5
         term   estimate  std.error statistic       p.value
        <chr>      <dbl>      <dbl>     <dbl>         <dbl>
1 (Intercept)  6.5262226 0.47889634 13.627631  6.469702e-28
2   predictor -0.2233611 0.15508093 -1.440287  1.518983e-01
3 (Intercept)  4.3066034 0.07838896 54.938900 2.426713e-100
4   predictor  0.4089223 0.01889134 21.646019  1.038667e-47
5 (Intercept)  4.7776294 0.07293476 65.505517 3.340431e-111
6   predictor  0.8885803 0.05137355 17.296454  2.325498e-37
>库(tidyr)
>图书馆(purrr)
>图书馆(扫帚)
> 
>嵌套的_df%>%
突变(模型=map(数据,~lm(反应~predictor,))%>%
unnest(地图(模型、整洁))
#一个tibble:6×5
术语估计标准误差统计p值
1(截距)6.5262226 0.47889634 13.627631 6.469702e-28
2预测值-0.2233611 0.15508093-1.440287 1.518983e-01
3(截距)4.3066034 0.07838896 54.938900 2.426713e-100
4预测值0.4089223 0.01889134 21.646019 1.038667e-47
5(截距)4.7776294 0.07293476 65.505517 3.340431e-111
6预测值0.8885803 0.05137355 17.296454 2.325498e-37

您可以使用
filter
仅提取斜率(
term==“predictor”
)或者,您可以使用
glance
而不是最后一行代码中的
tidy
来获得这些结果。

准嵌套数据帧的另一个选项是使用purrr::map_df,它可以快速处理大量预测变量。这还可以将预测器作为封闭数据框中的一列提供给您。然后按照Julia的例子来拟合模型

> library(dplyr)
> library(purrr)
>
> def_nested_df <- function(x) {
    data_frame("covariate" = x,
               "data" = list(iris %>% tbl_df %>%
                               select_("response" = "Sepal.Length", 
                                       "predictor" = x)))
  }    
> 
> nested_df <- 
    c("Sepal.Width", "Petal.Length", "Petal.Width") %>%
    map_df(def_nested_df)
>
> nested_df
# A tibble: 3 × 2
     covariate               data
         <chr>             <list>
1  Sepal.Width <tibble [150 × 2]>
2 Petal.Length <tibble [150 × 2]>
3  Petal.Width <tibble [150 × 2]>
>
> nested_df[[1, "data"]]
# A tibble: 150 × 2
   response predictor
      <dbl>     <dbl>
1       5.1       3.5
2       4.9       3.0
3       4.7       3.2
4       4.6       3.1
5       5.0       3.6
6       5.4       3.9
7       4.6       3.4
8       5.0       3.4
9       4.4       2.9
10      4.9       3.1
# ... with 140 more rows
>库(dplyr)
>图书馆(purrr)
>
>def_嵌套_df%tbl_df%>%
选择(“响应”=“萼片长度”,
“预测器”=x)))
}    
> 
>嵌套_df%
地图方向图(定义方向图)
>
>嵌套的
#一个tibble:3×2
协变量数据
1萼片宽
2花瓣长度
3花瓣宽度
>
>嵌套的_df[[1,“数据”]]
#一个tibble:150×2
反应预测器
1       5.1       3.5
2       4.9       3.0
3       4.7       3.2
4       4.6       3.1
5       5.0       3.6
6       5.4       3.9
7       4.6       3.4
8       5.0       3.4
9       4.4       2.9
10      4.9       3.1
# ... 还有140多行

我的答案在精神上与Julia Silge和wysiwyg的答案相似,但我希望避免手动键入变量名,并将响应和预测的名称保留在模型对象的公式中:

require(tibble)
require(dplyr)
require(tidyr)
require(purrr)
require(broom)

df <- iris
response_var <- "Sepal.Length"

vars <- tibble(response=response_var,
               predictor=setdiff(names(df), response_var))

compose_formula <- function(x, y)
  as.formula(paste0("~lm(", y, "~", x, ", data=.)"))

models <- tibble(data=list(df)) %>%
           crossing(vars) %>%
           mutate(fmla = map2(predictor, response, compose_formula),
                  model = map2(data, fmla, ~at_depth(.x, 0, .y)))

models %>% unnest(map(model, tidy))
models %>% unnest(map(model, glance), .drop=T)
require(TIBLE)
需要(dplyr)
需要(三年)
需要(purrr)
需要(扫帚)

df最终输出应该是什么样子?添加到管道中的
map\u df(tidy)
的输出是否接近您想要的?是的,
map\u df(tidy)
接近。虽然,
map_df(glance)
不起作用。如果定义了用作解释变量的变量向量,则可以在使用
glance
之前使用
setNames
。例如,如果仅使用
iris\u vars
y
> library(dplyr)
> 
> nested_df <- data_frame(data = list(iris %>% 
                                          select(response = Sepal.Length, 
                                                 predictor = Sepal.Width), 
                                      iris %>% 
                                          select(response = Sepal.Length, 
                                                 predictor = Petal.Length),
                                      iris %>% 
                                          select(response = Sepal.Length, 
                                                 predictor = Petal.Width)))
> 
> nested_df
# A tibble: 3 × 1
                    data
                  <list>
1 <data.frame [150 × 2]>
2 <data.frame [150 × 2]>
3 <data.frame [150 × 2]>
> library(tidyr)
> library(purrr)
> library(broom)
> 
> nested_df %>%
      mutate(models = map(data, ~ lm(response ~ predictor, .))) %>%
      unnest(map(models, tidy))
# A tibble: 6 × 5
         term   estimate  std.error statistic       p.value
        <chr>      <dbl>      <dbl>     <dbl>         <dbl>
1 (Intercept)  6.5262226 0.47889634 13.627631  6.469702e-28
2   predictor -0.2233611 0.15508093 -1.440287  1.518983e-01
3 (Intercept)  4.3066034 0.07838896 54.938900 2.426713e-100
4   predictor  0.4089223 0.01889134 21.646019  1.038667e-47
5 (Intercept)  4.7776294 0.07293476 65.505517 3.340431e-111
6   predictor  0.8885803 0.05137355 17.296454  2.325498e-37
> library(dplyr)
> library(purrr)
>
> def_nested_df <- function(x) {
    data_frame("covariate" = x,
               "data" = list(iris %>% tbl_df %>%
                               select_("response" = "Sepal.Length", 
                                       "predictor" = x)))
  }    
> 
> nested_df <- 
    c("Sepal.Width", "Petal.Length", "Petal.Width") %>%
    map_df(def_nested_df)
>
> nested_df
# A tibble: 3 × 2
     covariate               data
         <chr>             <list>
1  Sepal.Width <tibble [150 × 2]>
2 Petal.Length <tibble [150 × 2]>
3  Petal.Width <tibble [150 × 2]>
>
> nested_df[[1, "data"]]
# A tibble: 150 × 2
   response predictor
      <dbl>     <dbl>
1       5.1       3.5
2       4.9       3.0
3       4.7       3.2
4       4.6       3.1
5       5.0       3.6
6       5.4       3.9
7       4.6       3.4
8       5.0       3.4
9       4.4       2.9
10      4.9       3.1
# ... with 140 more rows
require(tibble)
require(dplyr)
require(tidyr)
require(purrr)
require(broom)

df <- iris
response_var <- "Sepal.Length"

vars <- tibble(response=response_var,
               predictor=setdiff(names(df), response_var))

compose_formula <- function(x, y)
  as.formula(paste0("~lm(", y, "~", x, ", data=.)"))

models <- tibble(data=list(df)) %>%
           crossing(vars) %>%
           mutate(fmla = map2(predictor, response, compose_formula),
                  model = map2(data, fmla, ~at_depth(.x, 0, .y)))

models %>% unnest(map(model, tidy))
models %>% unnest(map(model, glance), .drop=T)
# A tibble: 9 x 7
      response    predictor              term   estimate  std.error statistic
         <chr>        <chr>             <chr>      <dbl>      <dbl>     <dbl>
1 Sepal.Length  Sepal.Width       (Intercept)  6.5262226 0.47889634 13.627631
2 Sepal.Length  Sepal.Width       Sepal.Width -0.2233611 0.15508093 -1.440287
3 Sepal.Length Petal.Length       (Intercept)  4.3066034 0.07838896 54.938900
4 Sepal.Length Petal.Length      Petal.Length  0.4089223 0.01889134 21.646019
5 Sepal.Length  Petal.Width       (Intercept)  4.7776294 0.07293476 65.505517
6 Sepal.Length  Petal.Width       Petal.Width  0.8885803 0.05137355 17.296454
7 Sepal.Length      Species       (Intercept)  5.0060000 0.07280222 68.761639
8 Sepal.Length      Species Speciesversicolor  0.9300000 0.10295789  9.032819
9 Sepal.Length      Species  Speciesvirginica  1.5820000 0.10295789 15.365506
# ... with 1 more variables: p.value <dbl>
# A tibble: 4 x 13
      response    predictor  r.squared adj.r.squared     sigma  statistic
         <chr>        <chr>      <dbl>         <dbl>     <dbl>      <dbl>
1 Sepal.Length  Sepal.Width 0.01382265   0.007159294 0.8250966   2.074427
2 Sepal.Length Petal.Length 0.75995465   0.758332718 0.4070745 468.550154
3 Sepal.Length  Petal.Width 0.66902769   0.666791387 0.4779948 299.167312
4 Sepal.Length      Species 0.61870573   0.613518054 0.5147894 119.264502
# ... with 7 more variables: p.value <dbl>, df <int>, logLik <dbl>, AIC <dbl>,
#   BIC <dbl>, deviance <dbl>, df.residual <int>