R 多组内的滚动窗口回归
我正在尝试将滚动窗口回归模型应用于数据中的多个组。我的部分数据如下:R 多组内的滚动窗口回归,r,rollapply,R,Rollapply,我正在尝试将滚动窗口回归模型应用于数据中的多个组。我的部分数据如下: gvkey year LC YTO 1 001004 1972 0.1919713 2.021182 2 001004 1973 0.2275895 2.029056 3 001004 1974 0.3341368 2.053517 4 001004 1975 0.3313518 2.090532 5 001004 1976 0.4005829 2.
gvkey year LC YTO
1 001004 1972 0.1919713 2.021182
2 001004 1973 0.2275895 2.029056
3 001004 1974 0.3341368 2.053517
4 001004 1975 0.3313518 2.090532
5 001004 1976 0.4005829 2.136939
6 001004 1977 0.4471945 2.123909
7 001004 1978 0.4442004 2.150281
8 001004 1979 0.5054544 2.173162
9 001004 1980 0.5269449 2.188077
10 001004 1981 0.5423774 2.200805
11 001004 1982 0.3528982 2.200851
12 001004 1983 0.3674031 2.190487
13 001004 1984 0.2267620 2.181291
14 001004 1985 0.2796132 2.159443
15 001004 1986 0.3382120 2.128420
16 001004 1987 0.3214131 2.089670
17 001004 1988 0.3883732 2.048279
18 001004 1989 0.4466488 1.999539
19 001004 1990 0.4929991 1.955500
20 001004 1991 0.5150894 1.934893
21 001004 1992 0.5218845 1.925521
22 001004 1993 0.5038105 1.904241
23 001004 1994 0.5041639 1.881731
24 001004 1995 0.5196658 1.863143
25 001004 1996 0.5352994 1.844464
26 001004 1997 0.4556059 1.835676
27 001004 1998 0.4905767 1.837886
28 001004 1999 0.5471959 1.824636
29 001004 2000 0.5920976 1.814944
30 001004 2001 0.5998172 1.893943
31 001004 2002 0.4499911 1.889703
32 001004 2003 0.4207154 1.870703
33 001004 2004 0.4371594 1.831638
34 001004 2005 0.4525900 1.802684
35 001004 2006 0.4342149 1.781757
36 001004 2007 0.4899473 1.753360
37 001004 2008 0.5436673 1.680464
38 001004 2009 0.5873861 1.612499
39 001004 2010 0.5216734 1.544322
40 001004 2011 0.5592963 1.415892
41 001004 2012 0.5627509 1.407393
42 001004 2013 0.5904637 1.384202
43 001004 2014 0.6170085 1.353340
44 001004 2015 0.7145900 1.314014
45 001007 1975 0.3721916 2.090532
46 001007 1976 0.2760902 2.136939
47 001007 1977 0.1866554 2.123909
48 001007 1978 0.1977654 2.150281
49 001007 1979 0.1927100 2.173162
50 001007 1980 0.2112344 2.188077
51 001007 1981 -0.2141724 2.200805
52 001007 1982 -0.2072785 2.200851
53 001007 1983 -1.7406963 2.190487
54 001007 1984 -14.8071429 2.181291
55 001009 1982 -1.2753247 2.200851
56 001009 1983 1.3349904 2.190487
57 001009 1984 2.6192237 2.181291
58 001009 1985 0.5867925 2.159443
59 001009 1986 0.6959436 2.128420
60 001009 1987 0.7142857 2.089670
61 001009 1988 0.7771897 2.048279
62 001009 1989 0.8293820 1.999539
63 001009 1990 0.8655382 1.955500
64 001009 1991 0.8712144 1.934893
65 001009 1992 0.8882548 1.925521
66 001009 1993 0.9190540 1.904241
67 001009 1994 0.9411806 1.881731
68 001010 1971 0.6492499 2.002337
69 001010 1972 0.6667664 2.021182
70 001010 1973 0.6840115 2.029056
71 001010 1974 0.7011797 2.053517
72 001010 1975 0.7189469 2.090532
73 001010 1976 0.7367344 2.136939
74 001010 1977 0.7511779 2.123909
75 001010 1978 0.7673365 2.150281
76 001010 1979 0.7795880 2.173162
77 001010 1980 0.7824448 2.188077
78 001010 1981 0.7821913 2.200805
79 001010 1982 0.7646078 2.200851
80 001010 1983 0.7426172 2.190487
81 001010 1984 -0.0657935 2.181291
82 001010 1985 0.2802410 2.159443
83 001010 1986 0.2052373 2.128420
84 001010 1987 0.2465290 2.089670
85 001010 1988 0.3437856 2.048279
86 001010 1989 0.7398662 1.999539
87 001010 1990 0.6360582 1.955500
88 001010 1991 0.7790707 1.934893
89 001010 1992 0.7588472 1.925521
90 001010 1993 0.7695341 1.904241
91 001010 1994 0.8060759 1.881731
92 001010 1995 0.8381234 1.863143
93 001010 1996 0.8661541 1.844464
94 001010 1997 0.8700456 1.835676
95 001010 1998 0.8748443 1.837886
96 001010 1999 0.8884077 1.824636
97 001010 2000 0.8979903 1.814944
98 001010 2003 0.6812689 1.870703
99 001011 1983 0.3043007 2.190487
100 001011 1984 0.3080601 2.181291
我的职能是
Match.LC.YTO<-function(x){rollapplyr(x,width=10,by.column=F,fill=NA, FUN=function(m){
temp.1<-lm(LC~YTO,data=m)
summary(temp.1)$r.squared*(sign(summary(temp.1)$coefficients[2,1]))
})}
df<-df%>%group_by(gvkey)%>%mutate(MTCH=Match.LC.YTO(df))
我查看了许多其他关于函数rollappy和rollappyr的帖子,有些帖子建议我需要在使用rollappy函数之前将df转换为zoo或matrix,但它仍然不起作用 zoo中的rollapply将接受普通矩阵和数据帧参数。这不是问题所在。此代码存在以下问题:
- 代码向
传递一个矩阵,但lm
接受一个数据帧lm
- 代码尝试在最后一组中少于10行的对象上使用宽度为10的
rollply
- 如果截距完全匹配,则
中不会有第二个系数,因此对lm
的引用将失败并产生错误系数[2,1]
和TRUE
应完整写出,因为FALSE
和T
是有效的变量名,因此非常容易出错F
- 在dplyr中使用
时,始终将其与group\u by
匹配。如果不这样做,那么输出将记住分组,下次使用输出时,您将得到一个惊喜。例如,考虑以下两个片段之间的差异。第一个结果是ungroup
是该行所属组中的元素数,而第二个结果是n
是n
中的行数out
如果您希望使用out <- df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO(LC, YTO)) out %>% mutate(n = n()) out <- df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO(LC, YTO)) %>% ungroup out %>% mutate(n = n())
代替fill=NA
,则添加一项检查,检查序列长度是否小于序列宽度,即小于10:partial=TRUE
附注2Match.LC.YTO2 <- function(LC, YTO) { lm_summary <- function(ix) { temp.1 <- lm(LC ~ YTO, subset = ix) summary(temp.1)$r.squared * sign(c(coef(temp.1), NA)[2]) } if (length(LC) < 10) return(NA) ## rollapplyr(seq_along(LC), width = 10, FUN = lm_summary, fill = NA) } df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO2(LC, YTO)) %>% ungroup
检查末尾标有##的行中的长度不再是必要的,因为最新版本的zoo会自动进行此检查。@BWilliams我曾尝试将数据帧转换为zoo格式,但仍然不起作用。此外,我的错误消息与那条不同。@b尽管问题看起来可能相似,但这里的问题实际上不同。请看我的答案。非常感谢,您的代码与我的示例数据运行得非常好。然而,当我试图将其应用于我的数据时,我收到了错误消息“在mutate_impl(.data,dots)中出错:'y'中的NA/NaN/Inf”。我已经检查了我所有的Y观察结果,没有NA/NaN/Inf,所以我不确定是什么问题。library(dplyr) library(zoo) Match.LC.YTO <- function(LC, YTO) { lm_summary <- function(ix) { temp.1 <- lm(LC ~ YTO, subset = ix) summary(temp.1)$r.squared * sign(c(coef(temp.1), NA)[2]) } rollapplyr(seq_along(LC), width = 10, FUN = lm_summary, partial = TRUE) } df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO(LC, YTO)) %>% ungroup
Match.LC.YTO2 <- function(LC, YTO) { lm_summary <- function(ix) { temp.1 <- lm(LC ~ YTO, subset = ix) summary(temp.1)$r.squared * sign(c(coef(temp.1), NA)[2]) } if (length(LC) < 10) return(NA) ## rollapplyr(seq_along(LC), width = 10, FUN = lm_summary, fill = NA) } df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO2(LC, YTO)) %>% ungroup
Lines <- " gvkey year LC YTO 1 001004 1972 0.1919713 2.021182 2 001004 1973 0.2275895 2.029056 3 001004 1974 0.3341368 2.053517 4 001004 1975 0.3313518 2.090532 5 001004 1976 0.4005829 2.136939 6 001004 1977 0.4471945 2.123909 7 001004 1978 0.4442004 2.150281 8 001004 1979 0.5054544 2.173162 9 001004 1980 0.5269449 2.188077 10 001004 1981 0.5423774 2.200805 11 001004 1982 0.3528982 2.200851 12 001004 1983 0.3674031 2.190487 13 001004 1984 0.2267620 2.181291 14 001004 1985 0.2796132 2.159443 15 001004 1986 0.3382120 2.128420 16 001004 1987 0.3214131 2.089670 17 001004 1988 0.3883732 2.048279 18 001004 1989 0.4466488 1.999539 19 001004 1990 0.4929991 1.955500 20 001004 1991 0.5150894 1.934893 21 001004 1992 0.5218845 1.925521 22 001004 1993 0.5038105 1.904241 23 001004 1994 0.5041639 1.881731 24 001004 1995 0.5196658 1.863143 25 001004 1996 0.5352994 1.844464 26 001004 1997 0.4556059 1.835676 27 001004 1998 0.4905767 1.837886 28 001004 1999 0.5471959 1.824636 29 001004 2000 0.5920976 1.814944 30 001004 2001 0.5998172 1.893943 31 001004 2002 0.4499911 1.889703 32 001004 2003 0.4207154 1.870703 33 001004 2004 0.4371594 1.831638 34 001004 2005 0.4525900 1.802684 35 001004 2006 0.4342149 1.781757 36 001004 2007 0.4899473 1.753360 37 001004 2008 0.5436673 1.680464 38 001004 2009 0.5873861 1.612499 39 001004 2010 0.5216734 1.544322 40 001004 2011 0.5592963 1.415892 41 001004 2012 0.5627509 1.407393 42 001004 2013 0.5904637 1.384202 43 001004 2014 0.6170085 1.353340 44 001004 2015 0.7145900 1.314014 45 001007 1975 0.3721916 2.090532 46 001007 1976 0.2760902 2.136939 47 001007 1977 0.1866554 2.123909 48 001007 1978 0.1977654 2.150281 49 001007 1979 0.1927100 2.173162 50 001007 1980 0.2112344 2.188077 51 001007 1981 -0.2141724 2.200805 52 001007 1982 -0.2072785 2.200851 53 001007 1983 -1.7406963 2.190487 54 001007 1984 -14.8071429 2.181291 55 001009 1982 -1.2753247 2.200851 56 001009 1983 1.3349904 2.190487 57 001009 1984 2.6192237 2.181291 58 001009 1985 0.5867925 2.159443 59 001009 1986 0.6959436 2.128420 60 001009 1987 0.7142857 2.089670 61 001009 1988 0.7771897 2.048279 62 001009 1989 0.8293820 1.999539 63 001009 1990 0.8655382 1.955500 64 001009 1991 0.8712144 1.934893 65 001009 1992 0.8882548 1.925521 66 001009 1993 0.9190540 1.904241 67 001009 1994 0.9411806 1.881731 68 001010 1971 0.6492499 2.002337 69 001010 1972 0.6667664 2.021182 70 001010 1973 0.6840115 2.029056 71 001010 1974 0.7011797 2.053517 72 001010 1975 0.7189469 2.090532 73 001010 1976 0.7367344 2.136939 74 001010 1977 0.7511779 2.123909 75 001010 1978 0.7673365 2.150281 76 001010 1979 0.7795880 2.173162 77 001010 1980 0.7824448 2.188077 78 001010 1981 0.7821913 2.200805 79 001010 1982 0.7646078 2.200851 80 001010 1983 0.7426172 2.190487 81 001010 1984 -0.0657935 2.181291 82 001010 1985 0.2802410 2.159443 83 001010 1986 0.2052373 2.128420 84 001010 1987 0.2465290 2.089670 85 001010 1988 0.3437856 2.048279 86 001010 1989 0.7398662 1.999539 87 001010 1990 0.6360582 1.955500 88 001010 1991 0.7790707 1.934893 89 001010 1992 0.7588472 1.925521 90 001010 1993 0.7695341 1.904241 91 001010 1994 0.8060759 1.881731 92 001010 1995 0.8381234 1.863143 93 001010 1996 0.8661541 1.844464 94 001010 1997 0.8700456 1.835676 95 001010 1998 0.8748443 1.837886 96 001010 1999 0.8884077 1.824636 97 001010 2000 0.8979903 1.814944 98 001010 2003 0.6812689 1.870703 99 001011 1983 0.3043007 2.190487 100 001011 1984 0.3080601 2.181291" df <- read.table(text = Lines)