How can I run multiple linear regressions in R with different independent and dependent variables, adding standardized coefficients?


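I am currently trying to run a loop that performs linear regression for multiple independent variables (n = 6) against multiple dependent variables (n = 1000).

Here is some example data, where age, sex, and education are my independent variables of interest and testscore_* are my dependent variables: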

df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006,1007, 1008, 1009,   1010, 1011),
                    age = as.numeric(c('56', '43','59','74','61','62','69','80','40','55','58')),
                    sex = as.numeric(c('0','1','0','0','1','1','0','1','0','1','0')),
                    testscore_1 = as.numeric(c('23','28','30','15','7','18','29','27','14','22','24')),
                    testscore_2 = as.numeric(c('1','3','2','5','8','2','5','6','7','8','2')),
                    testscore_3 = as.numeric(c('18','20','19','15','20','23','19','25','10','14','12')),
                    education =  as.numeric(c('5','4','3','5','2', '1','4','4','3','5','2')))

My working code lets me run regression models for multiple DVs (I'm sure more experienced R users will dislike it for its lack of efficiency):
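The asker's snippet is not shown at this point in the thread. As a minimal sketch only (an assumption about the general shape, not the asker's actual code), a plain `for` loop over the three example outcomes could look like this. Entering all three predictors jointly, and the names `dvs`, `models`, and `coef_table`, are my own choices for illustration.

```r
# Example data from the question, written as plain numeric vectors
df <- data.frame(
  age = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
  sex = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
  testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
  testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
  testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
  education = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2)
)

# All outcome columns share the "testscore_" prefix
dvs <- grep("^testscore_", names(df), value = TRUE)

# Fit one model per outcome (assumption: the same predictors in every model)
models <- list()
for (dv in dvs) {
  f <- reformulate(c("age", "sex", "education"), response = dv)
  models[[dv]] <- lm(f, data = df)
}

# One row of coefficients per outcome
coef_table <- t(sapply(models, coef))
```

A loop like this works, but it becomes awkward once every predictor has to be paired with every outcome, which is what the answers address.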

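This is an adaptation of a similar workflow I have had to use in the past. Remember to really penalize yourself (that is, correct for multiple comparisons), because you are running a lot of models. I added a couple of predictor columns to your data frame. Good luck!

Solution: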
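# Creating predictor and outcome vectors
# (column positions refer to the 10-column data frame that includes pred1-pred3)
ivs_vec <- names(df)[c(2:6, 10)]  # age, sex, pred1, pred2, pred3, education
dvs_vec <- names(df)[7:9]         # testscore_1, testscore_2, testscore_3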

# Creating formulas and running the models
ivs <- paste0(" ~ ", ivs_vec)
dvs_ivs <- unlist(lapply(ivs, function(x) paste0(dvs_vec, x)))
formulas <- lapply(dvs_ivs, formula)

lm_results <- lapply(formulas, function(x) {
  lm(x, data = df)
})

# Creating / combining results
tidy_results <- lapply(lm_results, broom::tidy)
dv_list <- lapply(as.list(stringi::stri_extract_first_words(dvs_ivs)), rep, 2)
tidy_results <- Map(cbind, dv_list, tidy_results)

standardized_results <- lapply(lm_results, function(x) coef(lm.beta::lm.beta(x)))
combined_results <- Map(cbind, tidy_results, standardized_results)

# Cleaning up final results
names(combined_results) <- dvs_ivs
combined_results <- lapply(combined_results, function(x) { row.names(x) <- NULL; x })

new_names <- c("Outcome", "Term", "Estimate", "Std. Error", "Statistic", "P-value", "Standardized Estimate")
combined_results <- lapply(combined_results, setNames, new_names)
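For anyone wondering what ends up in the "Standardized Estimate" column: a standardized slope is conventionally the raw slope rescaled by sd(x) / sd(y), and with a single predictor it equals the Pearson correlation. The base-R sketch below illustrates that relationship with the question's age and testscore_3 values. The names `b` and `beta_std` are mine, and this is meant as an illustration of the idea rather than a description of lm.beta's internals.

```r
age <- c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58)
testscore_3 <- c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12)

# Raw (unstandardized) slope from a simple regression
fit <- lm(testscore_3 ~ age)
b <- unname(coef(fit)["age"])

# Rescale the slope by the ratio of standard deviations
beta_std <- b * sd(age) / sd(testscore_3)

# With one predictor, this matches the correlation between x and y
r_xy <- cor(age, testscore_3)
```

Both `beta_std` and `r_xy` should agree with the standardized estimate reported for `testscore_3 ~ age` in the printed results.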

Results:
combined_results[1:5]

$`testscore_1 ~ age`
      Outcome        Term    Estimate Std. Error Statistic   P-value Standardized Estimate
1 testscore_1 (Intercept) 18.06027731 12.3493569 1.4624468 0.1776424            0.00000000
2 testscore_1         age  0.05835152  0.2031295 0.2872627 0.7804155            0.09531823

$`testscore_2 ~ age`
      Outcome        Term   Estimate Std. Error Statistic   P-value Standardized Estimate
1 testscore_2 (Intercept) 3.63788676 4.39014570 0.8286483 0.4287311             0.0000000
2 testscore_2         age 0.01367313 0.07221171 0.1893478 0.8540216             0.0629906

$`testscore_3 ~ age`
      Outcome        Term  Estimate Std. Error Statistic   P-value Standardized Estimate
1 testscore_3 (Intercept) 6.1215175   6.698083 0.9139208 0.3845886             0.0000000
2 testscore_3         age 0.1943125   0.110174 1.7636870 0.1116119             0.5068026

$`testscore_1 ~ sex`
      Outcome        Term Estimate Std. Error  Statistic      P-value Standardized Estimate
1 testscore_1 (Intercept)     22.5   3.099283  7.2597435 4.766069e-05             0.0000000
2 testscore_1         sex     -2.1   4.596980 -0.4568217 6.586248e-01            -0.1505386

$`testscore_2 ~ sex`
      Outcome        Term Estimate Std. Error Statistic     P-value Standardized Estimate
1 testscore_2 (Intercept) 3.666667   1.041129  3.521816 0.006496884             0.0000000
2 testscore_2         sex 1.733333   1.544245  1.122447 0.290723029             0.3504247
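On the advice to penalize yourself for running many models: one common way to do that is to adjust the p-values with base R's `p.adjust()`. The sketch below copies the five slope p-values printed above. Which method to use (Holm, Benjamini-Hochberg, or something else) is a judgment call the answer does not specify, so treat the two shown here as examples only.

```r
# Slope p-values copied from the five models printed above
p_raw <- c(
  "testscore_1 ~ age" = 0.7804155,
  "testscore_2 ~ age" = 0.8540216,
  "testscore_3 ~ age" = 0.1116119,
  "testscore_1 ~ sex" = 0.6586248,
  "testscore_2 ~ sex" = 0.290723029
)

# Holm controls the family-wise error rate; BH controls the false discovery rate
p_holm <- p.adjust(p_raw, method = "holm")
p_bh   <- p.adjust(p_raw, method = "BH")

adjusted <- round(cbind(p_raw, p_holm, p_bh), 3)
print(adjusted)
```

With the full set of models, the adjustment would be applied to all slope p-values at once rather than to a subset of five.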

Data:

# pred1-pred3 are random; call set.seed() first if you need reproducible values
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005, 1006, 1007, 1008, 1009, 1010, 1011),
                     age = as.numeric(c('56', '43','59','74','61','62','69','80','40','55','58')),
                     sex = as.numeric(c('0','1','0','0','1','1','0','1','0','1','0')),
                     pred1 = sample(1:11, 11),
                     pred2 = sample(1:11, 11),
                     pred3 = sample(1:11, 11),
                     testscore_1 = as.numeric(c('23','28','30','15','7','18','29','27','14','22','24')),
                     testscore_2 = as.numeric(c('1','3','2','5','8','2','5','6','7','8','2')),
                     testscore_3 = as.numeric(c('18','20','19','15','20','23','19','25','10','14','12')),
                     education =  as.numeric(c('5','4','3','5','2', '1','4','4','3','5','2')))

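Stumbled upon this a year later and am documenting a tidyverse solution that uses the same data as @Andrew's answer.

library(dplyr)
library(purrr)
library(tidyr)
library(stringi)

# Creating predictor and outcome vectors
ivs_vec

#>    which_dependent term        estimate std.error statistic   p.value std_estimate
#>  1 testscore_1     (Intercept)  18.1      12.3        1.46  0.178           0
#>  2 testscore_1     age           0.0584    0.203      0.287 0.780           0.0953
#>  3 testscore_2     (Intercept)   3.64      4.39       0.829 0.429           0
#>  4 testscore_2     age           0.0137    0.0722     0.189 0.854           0.0630
#>  5 testscore_3     (Intercept)   6.12      6.70       0.914 0.385           0
#>  6 testscore_3     age           0.194     0.110      1.76  0.112           0.507
#>  7 testscore_1     (Intercept)  22.5       3.10       7.26  0.0000477       0
#>  8 testscore_1     sex          -2.10      4.60      -0.457 0.659          -0.151
#>  9 testscore_2     (Intercept)   3.67      1.04       3.52  0.00650         0
#> 10 testscore_2     sex           1.73      1.54       1.12  0.291           0.350
#> # … with 26 more rows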
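The tidyverse code itself is not reproduced in full in this thread. As a rough sketch of how such a pipeline could be put together (my own reconstruction under assumptions, not the answerer's code), the example below pairs every predictor with every outcome and stacks the tidied results. The helper `fit_one` and the column names `dependent` and `std_estimate` are hypothetical, and only the three original predictors are used so that the output does not depend on random columns.

```r
library(dplyr)
library(purrr)
library(tidyr)

df <- data.frame(
  age = c(56, 43, 59, 74, 61, 62, 69, 80, 40, 55, 58),
  sex = c(0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0),
  testscore_1 = c(23, 28, 30, 15, 7, 18, 29, 27, 14, 22, 24),
  testscore_2 = c(1, 3, 2, 5, 8, 2, 5, 6, 7, 8, 2),
  testscore_3 = c(18, 20, 19, 15, 20, 23, 19, 25, 10, 14, 12),
  education = c(5, 4, 3, 5, 2, 1, 4, 4, 3, 5, 2)
)

ivs_vec <- c("age", "sex", "education")
dvs_vec <- c("testscore_1", "testscore_2", "testscore_3")

# Hypothetical helper: fit one simple regression and return a tidy table
# with the outcome name and the standardized coefficients attached
fit_one <- function(iv, dv) {
  fit <- lm(reformulate(iv, response = dv), data = df)
  broom::tidy(fit) %>%
    mutate(
      dependent = dv,
      std_estimate = unname(coef(lm.beta::lm.beta(fit)))
    ) %>%
    relocate(dependent)
}

# Every predictor/outcome pair, one model each, stacked into one tibble
results <- expand_grid(iv = ivs_vec, dv = dvs_vec) %>%
  pmap(fit_one) %>%
  bind_rows()
```

Because `expand_grid()` varies its last argument fastest, the rows come out grouped by predictor (all outcomes for age, then sex, then education), which is the same ordering as in the output shown above.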
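This was so useful!! Thank you, @Andrew!
This is amazing!!!! Thank you so much for your help, Andrew! I could not have done it without you.
Thanks again, @Andrew. Have a great day!
Hi Andrew, thanks again for your help with this earlier this year. I would like to credit you in my code; how would you like me to do that?
Hi again, @Andrew! I was wondering whether you had any advice on another question I posted here: it is based on your code in this answer. If you have time, I would really appreciate your input! :)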