Linear regression loop in R
I need the beta coefficient and the residual variance for several stocks. My question is how to create a loop over several linear regressions and extract those values into an output. This is my data: MR is my independent variable and the remaining columns are dependent variables, and I have to run a separate linear regression for each of them. Thanks, everyone. // Edit:
> dput(head(Beta_market_model_test))
structure(list(...1 = structure(c(1422748800, 1425168000, 1427846400,
1430438400, 1433116800, 1435708800), tzone = "UTC", class = c("POSIXct",
"POSIXt")), R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254,
0.189767902403849, -0.129765571642446, -0.02268699227135), R2 = c(-0.000634819869861802,
0.0566396021070485, 0.0504313735522286, -0.0275926732076482,
0.0473125483284236, -0.0501700832780339), R3 = c(-0.0607564272876455,
0.0915928283206455, -0.116429377153136, 0.0338313435925748, -0.0731748018356279,
-0.082292041771696), R4 = c(0.036716647443291, 0.0409790469126645,
-0.0594941218382615, 0.0477272727272728, 0.0115690527838033,
-0.0187634024303074), R5 = c(0.00286365940192601, 0.0128875748616479,
0.000174637626924046, 0.0238214018458469, 0.0120599342185406,
-0.0627587867116033), R6 = c(-0.0944601447872712, 0.090838356632893,
-0.0577132600192821, 0.136928528648433, -0.0137770071043408,
0.0214549609033041), MR = c(-0.0388483879770769, 0.0858362570727453,
-0.0178553084990147, 0.0567646974926548, -0.0391124787432181,
-0.014626289866472)), row.names = c(NA, -6L), class = c("tbl_df",
"tbl", "data.frame"))
We can use cbind to specify all the dependent variables in a single lm call
model <- lm(cbind(R1, R2, R3, R4, R5, R6) ~ MR, data = df1)
s1 <- summary(model)
Check the summary
summary(model)
Response R1 :
Call:
lm(formula = R1 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
0.03757 -0.06851 0.01791 0.08624 -0.06919 -0.00402
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.006368 0.028060 0.227 0.8316
MR 1.711625 0.577571 2.963 0.0414 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.06831 on 4 degrees of freedom
Multiple R-squared: 0.6871, Adjusted R-squared: 0.6088
F-statistic: 8.782 on 1 and 4 DF, p-value: 0.04141
Response R2 :
Call:
lm(formula = R2 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
-0.01047 0.03882 0.03925 -0.04355 0.03750 -0.06155
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.01232 0.02079 0.593 0.585
MR 0.06402 0.42797 0.150 0.888
Residual standard error: 0.05062 on 4 degrees of freedom
Multiple R-squared: 0.005564, Adjusted R-squared: -0.243
F-statistic: 0.02238 on 1 and 4 DF, p-value: 0.8883
Response R3 :
Call:
lm(formula = R3 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
0.035081 0.014541 -0.049701 -0.002909 0.023029 -0.020041
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.04197 0.01431 -2.934 0.04266 *
MR 1.38661 0.29449 4.709 0.00925 **
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.03483 on 4 degrees of freedom
Multiple R-squared: 0.8472, Adjusted R-squared: 0.8089
F-statistic: 22.17 on 1 and 4 DF, p-value: 0.009249
Response R4 :
Call:
lm(formula = R4 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
0.0438966 0.0002996 -0.0603723 0.0182067 0.0188503 -0.0208810
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.007732 0.016804 0.46 0.669
MR 0.383843 0.345886 1.11 0.329
Residual standard error: 0.04091 on 4 degrees of freedom
Multiple R-squared: 0.2354, Adjusted R-squared: 0.04425
F-statistic: 1.232 on 1 and 4 DF, p-value: 0.3293
Response R5 :
Call:
lm(formula = R5 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
0.013692 -0.001676 0.006728 0.015178 0.022942 -0.056863
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.002917 0.013351 -0.218 0.838
MR 0.203653 0.274801 0.741 0.500
Residual standard error: 0.0325 on 4 degrees of freedom
Multiple R-squared: 0.1207, Adjusted R-squared: -0.09909
F-statistic: 0.5492 on 1 and 4 DF, p-value: 0.4998
Response R6 :
Call:
lm(formula = R6 ~ MR, data = Beta_market_model_test)
Residuals:
1 2 3 4 5 6
-0.04498 -0.03837 -0.03832 0.04938 0.03608 0.03622
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.006197 0.020555 0.302 0.7781
MR 1.433135 0.423083 3.387 0.0276 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.05004 on 4 degrees of freedom
Multiple R-squared: 0.7415, Adjusted R-squared: 0.6769
F-statistic: 11.47 on 1 and 4 DF, p-value: 0.0276
We can extract the coefficients into one data frame with tidy from broom
library(purrr)
library(broom)
map_dfr(summary(model), tidy, .id = 'dep_var')
# A tibble: 12 x 6
# dep_var term estimate std.error statistic p.value
# <chr> <chr> <dbl> <dbl> <dbl> <dbl>
# 1 Response R1 (Intercept) 0.00637 0.0281 0.227 0.832
# 2 Response R1 MR 1.71 0.578 2.96 0.0414
# 3 Response R2 (Intercept) 0.0123 0.0208 0.593 0.585
# 4 Response R2 MR 0.0640 0.428 0.150 0.888
# 5 Response R3 (Intercept) -0.0420 0.0143 -2.93 0.0427
# 6 Response R3 MR 1.39 0.294 4.71 0.00925
# 7 Response R4 (Intercept) 0.00773 0.0168 0.460 0.669
# 8 Response R4 MR 0.384 0.346 1.11 0.329
# 9 Response R5 (Intercept) -0.00292 0.0134 -0.218 0.838
#10 Response R5 MR 0.204 0.275 0.741 0.500
#11 Response R6 (Intercept) 0.00620 0.0206 0.302 0.778
#12 Response R6 MR 1.43 0.423 3.39 0.0276
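The tidy output above gives the betas (the MR rows) but not the residual variance the question also asks for. A minimal base-R sketch, assuming the six sample rows from the dput output and the hypothetical names `beta`, `res_var` and `out`, computes both from the same multi-response fit. The residual variance here is taken as the residual sum of squares divided by the residual degrees of freedom, i.e. the square of the "Residual standard error" printed by summary.

```r
# Six sample rows from the question's dput output (date column omitted).
df1 <- data.frame(
  R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254,
         0.189767902403849, -0.129765571642446, -0.02268699227135),
  R2 = c(-0.000634819869861802, 0.0566396021070485, 0.0504313735522286,
         -0.0275926732076482, 0.0473125483284236, -0.0501700832780339),
  R3 = c(-0.0607564272876455, 0.0915928283206455, -0.116429377153136,
         0.0338313435925748, -0.0731748018356279, -0.082292041771696),
  R4 = c(0.036716647443291, 0.0409790469126645, -0.0594941218382615,
         0.0477272727272728, 0.0115690527838033, -0.0187634024303074),
  R5 = c(0.00286365940192601, 0.0128875748616479, 0.000174637626924046,
         0.0238214018458469, 0.0120599342185406, -0.0627587867116033),
  R6 = c(-0.0944601447872712, 0.090838356632893, -0.0577132600192821,
         0.136928528648433, -0.0137770071043408, 0.0214549609033041),
  MR = c(-0.0388483879770769, 0.0858362570727453, -0.0178553084990147,
         0.0567646974926548, -0.0391124787432181, -0.014626289866472)
)

# One fit with a matrix response: one column of coefficients per stock.
model <- lm(cbind(R1, R2, R3, R4, R5, R6) ~ MR, data = df1)

# Slope on MR for every stock (the market-model beta).
beta <- coef(model)["MR", ]

# Residual variance per stock: RSS / residual degrees of freedom.
res_var <- colSums(residuals(model)^2) / df.residual(model)

out <- data.frame(stock = names(beta),
                  beta = unname(beta),
                  residual_variance = unname(res_var))
print(out)
```

Because all stocks share the same regressor, a single matrix-response fit gives the same coefficients as six separate lm calls.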
I just wanted to ask a question about my code:
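library(dplyr)
library(tidyr)
library(purrr)  # provides map()
library(broom)
# Note: lm(MR ~ value) regresses the market return on each stock.
# r.squared, statistic and p.value are the same in either direction, but for
# market-model betas and residual variances the formula should be value ~ MR,
# which changes sigma, logLik, AIC, BIC and deviance.
df %>%
  select(-...1) %>%                       # drop the date column
  pivot_longer(R1:R6) %>%                 # one row per (date, stock)
  group_by(name) %>%
  nest(data = c(MR, value)) %>%           # one nested data frame per stock
  mutate(model = map(data, ~ lm(MR ~ value, data = .)),
         glance = map(model, ~ glance(.x))) %>%
  unnest(glance) %>%
  select(- c(data, model))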
# A tibble: 6 x 13
# Groups: name [6]
name r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 R1 0.687 0.609 0.0331 8.78 0.0414 1 13.2 -20.3 -20.9 0.00438
2 R2 0.00556 -0.243 0.0590 0.0224 0.888 1 9.69 -13.4 -14.0 0.0139
3 R3 0.847 0.809 0.0231 22.2 0.00925 1 15.3 -24.6 -25.2 0.00214
4 R4 0.235 0.0443 0.0517 1.23 0.329 1 10.5 -15.0 -15.6 0.0107
5 R5 0.121 -0.0991 0.0555 0.549 0.500 1 10.1 -14.1 -14.7 0.0123
6 R6 0.742 0.677 0.0301 11.5 0.0276 1 13.7 -21.5 -22.1 0.00362
# ... with 2 more variables: df.residual <int>, nobs <int>
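For comparison, here is a hedged sketch of the same nest-and-map idea with the stock return as the response (ret ~ MR), which is the direction a market-model beta needs. It uses only R1 and R2 from the sample data to stay short. The column names `stock` and `ret` and the result name `out` are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Two of the stocks plus the market return, from the question's dput output.
df <- data.frame(
  R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254,
         0.189767902403849, -0.129765571642446, -0.02268699227135),
  R2 = c(-0.000634819869861802, 0.0566396021070485, 0.0504313735522286,
         -0.0275926732076482, 0.0473125483284236, -0.0501700832780339),
  MR = c(-0.0388483879770769, 0.0858362570727453, -0.0178553084990147,
         0.0567646974926548, -0.0391124787432181, -0.014626289866472)
)

out <- df %>%
  # Long format: one row per (observation, stock), MR repeated alongside.
  pivot_longer(-MR, names_to = "stock", values_to = "ret") %>%
  # One nested data frame per stock.
  nest(data = c(MR, ret)) %>%
  mutate(
    model = map(data, ~ lm(ret ~ MR, data = .x)),
    beta = map_dbl(model, ~ coef(.x)[["MR"]]),
    # sigma() is the residual standard error; squaring gives the variance.
    residual_variance = map_dbl(model, ~ sigma(.x)^2)
  ) %>%
  select(stock, beta, residual_variance)

print(out)
```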
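Update
Thanks to my dear friend @akrun, who always gives me valuable suggestions.
If you would rather avoid pivoting, since it can increase the number of rows beyond the limit, you can also use the following code: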
library(dplyr)
library(tidyr)
library(purrr)  # provides map_dfr()
library(broom)
df %>%
select(-1) %>%
summarise(across(-MR, ~ list(lm(reformulate('MR', response = cur_column()),
data = df) %>%
summary))) %>%
unclass %>%
map_dfr(~ tidy(.x[[1]]))
# A tibble: 12 x 5
term estimate std.error statistic p.value
<chr> <dbl> <dbl> <dbl> <dbl>
1 (Intercept) 0.00637 0.0281 0.227 0.832
2 MR 1.71 0.578 2.96 0.0414
3 (Intercept) 0.0123 0.0208 0.593 0.585
4 MR 0.0640 0.428 0.150 0.888
5 (Intercept) -0.0420 0.0143 -2.93 0.0427
6 MR 1.39 0.294 4.71 0.00925
7 (Intercept) 0.00773 0.0168 0.460 0.669
8 MR 0.384 0.346 1.11 0.329
9 (Intercept) -0.00292 0.0134 -0.218 0.838
10 MR 0.204 0.275 0.741 0.500
11 (Intercept) 0.00620 0.0206 0.302 0.778
12 MR 1.43 0.423 3.39 0.0276
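The table above does not show which stock each pair of rows belongs to. Since the question asks for a loop, here is a minimal base-R sketch of the same per-column idea that builds each formula with reformulate and keeps a stock label, the beta and the residual variance. It uses only R1 and R3 from the sample data. The names `stocks` and `out` are illustrative.

```r
# Two of the stocks plus the market return, from the question's dput output.
df <- data.frame(
  R1 = c(-0.0225553678146582, 0.084773882172773, -0.00628335525823254,
         0.189767902403849, -0.129765571642446, -0.02268699227135),
  R3 = c(-0.0607564272876455, 0.0915928283206455, -0.116429377153136,
         0.0338313435925748, -0.0731748018356279, -0.082292041771696),
  MR = c(-0.0388483879770769, 0.0858362570727453, -0.0178553084990147,
         0.0567646974926548, -0.0391124787432181, -0.014626289866472)
)

# Every column except the regressor is treated as a dependent variable.
stocks <- setdiff(names(df), "MR")

# Preallocate the result, one row per stock.
out <- data.frame(stock = stocks,
                  alpha = NA_real_,
                  beta = NA_real_,
                  residual_variance = NA_real_)

for (i in seq_along(stocks)) {
  # reformulate("MR", response = "R1") builds the formula R1 ~ MR.
  fit <- lm(reformulate("MR", response = stocks[i]), data = df)
  out$alpha[i] <- coef(fit)[["(Intercept)"]]
  out$beta[i] <- coef(fit)[["MR"]]
  # Squared residual standard error = RSS / residual degrees of freedom.
  out$residual_variance[i] <- sigma(fit)^2
}

print(out)
```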