Rolling window regression within multiple groups in R


I am trying to apply a rolling window regression model to multiple groups in my data. Part of my data is shown below:

     gvkey year          LC      YTO
1   001004 1972   0.1919713 2.021182
2   001004 1973   0.2275895 2.029056
3   001004 1974   0.3341368 2.053517
4   001004 1975   0.3313518 2.090532
5   001004 1976   0.4005829 2.136939
6   001004 1977   0.4471945 2.123909
7   001004 1978   0.4442004 2.150281
8   001004 1979   0.5054544 2.173162
9   001004 1980   0.5269449 2.188077
10  001004 1981   0.5423774 2.200805
11  001004 1982   0.3528982 2.200851
12  001004 1983   0.3674031 2.190487
13  001004 1984   0.2267620 2.181291
14  001004 1985   0.2796132 2.159443
15  001004 1986   0.3382120 2.128420
16  001004 1987   0.3214131 2.089670
17  001004 1988   0.3883732 2.048279
18  001004 1989   0.4466488 1.999539
19  001004 1990   0.4929991 1.955500
20  001004 1991   0.5150894 1.934893
21  001004 1992   0.5218845 1.925521
22  001004 1993   0.5038105 1.904241
23  001004 1994   0.5041639 1.881731
24  001004 1995   0.5196658 1.863143
25  001004 1996   0.5352994 1.844464
26  001004 1997   0.4556059 1.835676
27  001004 1998   0.4905767 1.837886
28  001004 1999   0.5471959 1.824636
29  001004 2000   0.5920976 1.814944
30  001004 2001   0.5998172 1.893943
31  001004 2002   0.4499911 1.889703
32  001004 2003   0.4207154 1.870703
33  001004 2004   0.4371594 1.831638
34  001004 2005   0.4525900 1.802684
35  001004 2006   0.4342149 1.781757
36  001004 2007   0.4899473 1.753360
37  001004 2008   0.5436673 1.680464
38  001004 2009   0.5873861 1.612499
39  001004 2010   0.5216734 1.544322
40  001004 2011   0.5592963 1.415892
41  001004 2012   0.5627509 1.407393
42  001004 2013   0.5904637 1.384202
43  001004 2014   0.6170085 1.353340
44  001004 2015   0.7145900 1.314014
45  001007 1975   0.3721916 2.090532
46  001007 1976   0.2760902 2.136939
47  001007 1977   0.1866554 2.123909
48  001007 1978   0.1977654 2.150281
49  001007 1979   0.1927100 2.173162
50  001007 1980   0.2112344 2.188077
51  001007 1981  -0.2141724 2.200805
52  001007 1982  -0.2072785 2.200851
53  001007 1983  -1.7406963 2.190487
54  001007 1984 -14.8071429 2.181291
55  001009 1982  -1.2753247 2.200851
56  001009 1983   1.3349904 2.190487
57  001009 1984   2.6192237 2.181291
58  001009 1985   0.5867925 2.159443
59  001009 1986   0.6959436 2.128420
60  001009 1987   0.7142857 2.089670
61  001009 1988   0.7771897 2.048279
62  001009 1989   0.8293820 1.999539
63  001009 1990   0.8655382 1.955500
64  001009 1991   0.8712144 1.934893
65  001009 1992   0.8882548 1.925521
66  001009 1993   0.9190540 1.904241
67  001009 1994   0.9411806 1.881731
68  001010 1971   0.6492499 2.002337
69  001010 1972   0.6667664 2.021182
70  001010 1973   0.6840115 2.029056
71  001010 1974   0.7011797 2.053517
72  001010 1975   0.7189469 2.090532
73  001010 1976   0.7367344 2.136939
74  001010 1977   0.7511779 2.123909
75  001010 1978   0.7673365 2.150281
76  001010 1979   0.7795880 2.173162
77  001010 1980   0.7824448 2.188077
78  001010 1981   0.7821913 2.200805
79  001010 1982   0.7646078 2.200851
80  001010 1983   0.7426172 2.190487
81  001010 1984  -0.0657935 2.181291
82  001010 1985   0.2802410 2.159443
83  001010 1986   0.2052373 2.128420
84  001010 1987   0.2465290 2.089670
85  001010 1988   0.3437856 2.048279
86  001010 1989   0.7398662 1.999539
87  001010 1990   0.6360582 1.955500
88  001010 1991   0.7790707 1.934893
89  001010 1992   0.7588472 1.925521
90  001010 1993   0.7695341 1.904241
91  001010 1994   0.8060759 1.881731
92  001010 1995   0.8381234 1.863143
93  001010 1996   0.8661541 1.844464
94  001010 1997   0.8700456 1.835676
95  001010 1998   0.8748443 1.837886
96  001010 1999   0.8884077 1.824636
97  001010 2000   0.8979903 1.814944
98  001010 2003   0.6812689 1.870703
99  001011 1983   0.3043007 2.190487
100 001011 1984   0.3080601 2.181291
My function is:

Match.LC.YTO<-function(x){rollapplyr(x,width=10,by.column=F,fill=NA, FUN=function(m){
  temp.1<-lm(LC~YTO,data=m)
  summary(temp.1)$r.squared*(sign(summary(temp.1)$coefficients[2,1]))
})}

df<-df%>%group_by(gvkey)%>%mutate(MTCH=Match.LC.YTO(df))

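I have looked at many other posts about the rollapply and rollapplyr functions. Some of them suggest that I need to convert df to a zoo object or a matrix before using rollapply, but it still does not work.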

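rollapply in zoo accepts plain matrix and data frame arguments, so that is not the problem. The code has the following problems:

  • the code passes a matrix to lm, but lm takes a data frame

  • the code attempts to use rollapply with a width of 10 on an object that has fewer than 10 rows in the last group

  • if the intercept alone gives a perfect fit, there will be no second coefficient from lm, so the reference to coefficients[2,1] fails with an error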
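
The first and third problems can be reproduced in base R without zoo. The following is a minimal sketch, not the asker's actual data: the toy values and the object names (`m`, `flat`, `safe_slope`, and so on) are made up for illustration. It uses a constant predictor as one case in which the slope row is missing from the summary table.

```r
# Toy window as a matrix, the shape FUN receives when by.column = FALSE
# (values are made up for illustration).
m <- cbind(LC  = c(0.19, 0.23, 0.33, 0.33, 0.40),
           YTO = c(2.02, 2.03, 2.05, 2.09, 2.14))

# Problem 1: lm() rejects a matrix passed as `data`.
matrix_fails <- inherits(try(lm(LC ~ YTO, data = m), silent = TRUE), "try-error")

# Converting the window to a data frame makes the same call work.
fit_ok <- lm(LC ~ YTO, data = as.data.frame(m))

# Problem 3: with a constant predictor the slope cannot be estimated,
# so summary() reports only the intercept row.
flat <- data.frame(LC = c(0.19, 0.23, 0.33), YTO = c(2, 2, 2))
fit_flat <- lm(LC ~ YTO, data = flat)
n_coef_rows <- nrow(summary(fit_flat)$coefficients)
index_fails <- inherits(
  try(summary(fit_flat)$coefficients[2, 1], silent = TRUE), "try-error")

# coef() keeps the slope slot (as NA), so indexing it does not fail.
safe_slope <- c(coef(fit_flat), NA)[2]
```

Here `matrix_fails` and `index_fails` are both TRUE, while `safe_slope` is NA rather than an error, which is why an expression of the form `c(coef(fit), NA)[2]` is the more defensive choice.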

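Although they are not errors, the following points could be improved:

  • TRUE and FALSE should be written out in full, because T and F are valid variable names and using them is therefore highly error-prone

  • when using group_by in dplyr, always match it with ungroup. If you do not, the output remembers the grouping, and the next time you use the output you will get a surprise. For example, consider the difference between the following two snippets. In the first, n is the number of elements in the group that the row belongs to, whereas in the second, n is the number of rows in out.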

    out <- df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO(LC, YTO))
    out %>% mutate(n = n())
    
    out <- df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO(LC, YTO)) %>% ungroup
    out %>% mutate(n = n())
    
    If you prefer fill = NA to partial = TRUE, add a check for whether the length of the series is less than the width, i.e. less than 10:

    Match.LC.YTO2 <- function(LC, YTO) {
    
       lm_summary <- function(ix) {
          temp.1 <- lm(LC ~ YTO, subset = ix)
          summary(temp.1)$r.squared * sign(c(coef(temp.1), NA)[2])
       }
    
      if (length(LC) < 10) return(NA) ##
      rollapplyr(seq_along(LC), width = 10, FUN = lm_summary, fill = NA)
    }
    
    df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO2(LC, YTO)) %>% ungroup
    
    Note 2
    Checking the length in the line marked ## is no longer necessary, because recent versions of zoo perform this check automatically.

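    @BWilliams I tried converting the data frame to zoo format, but it still does not work. Also, my error message is different from that one.

    @b Although the questions may look similar, the problem here is actually different. Please see my answer.

    Thank you very much, your code runs very well with my sample data. However, when I try to apply it to my own data, I get the error message "Error in mutate_impl(.data, dots): NA/NaN/Inf in 'y'". I have checked all of my y observations and there is no NA/NaN/Inf, so I am not sure what the problem is.
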
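    # Corrected version of the code in the question, using partial = TRUE
    library(dplyr)
    library(zoo)
    
    Match.LC.YTO <- function(LC, YTO) {
    
       # Fit the regression on the rows indexed by ix and return the
       # R-squared signed by the slope (NA if the slope is not estimable)
       lm_summary <- function(ix) {
          temp.1 <- lm(LC ~ YTO, subset = ix)
          summary(temp.1)$r.squared * sign(c(coef(temp.1), NA)[2])
       }
    
       # Roll over the row indexes rather than the data, so lm sees plain vectors;
       # partial = TRUE also evaluates the shorter windows at the start
       rollapplyr(seq_along(LC), width = 10, FUN = lm_summary, partial = TRUE)
    }
    
    # Apply per gvkey group and drop the grouping afterwards
    df %>% group_by(gvkey) %>% mutate(MTCH = Match.LC.YTO(LC, YTO)) %>% ungroup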
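
To make the index-based rolling explicit, here is a base-R sketch that is intended to mirror a right-aligned window with partial windows at the start. It is not how zoo implements rollapplyr: `roll_r2_signed` is a hypothetical name, and the toy data and `width = 4` are assumptions chosen only to keep the example small.

```r
# Base-R sketch of a right-aligned rolling regression with partial windows.
# `roll_r2_signed` is a hypothetical helper, not part of zoo or dplyr.
roll_r2_signed <- function(LC, YTO, width = 10) {
  vapply(seq_along(LC), function(i) {
    ix <- max(1, i - width + 1):i        # rows in the window ending at row i
    fit <- lm(LC ~ YTO, subset = ix)
    # R-squared signed by the slope; NA when the slope is not estimable
    summary(fit)$r.squared * sign(c(coef(fit), NA)[2])
  }, numeric(1))
}

# Toy data (made up): LC rises with YTO in the first half, falls in the second.
YTO <- 1:8
LC  <- c(1.0, 2.1, 2.9, 4.2, 3.8, 3.1, 1.9, 1.2)
res <- roll_r2_signed(LC, YTO, width = 4)
```

The first element of `res` is missing because a one-row window cannot identify a slope. The window ending at row 4 gives a positive value and the window ending at row 8 a negative one, reflecting the sign of the fitted slope in each window.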
    
    Lines <- "     gvkey year          LC      YTO
    1   001004 1972   0.1919713 2.021182
    2   001004 1973   0.2275895 2.029056
    3   001004 1974   0.3341368 2.053517
    4   001004 1975   0.3313518 2.090532
    5   001004 1976   0.4005829 2.136939
    6   001004 1977   0.4471945 2.123909
    7   001004 1978   0.4442004 2.150281
    8   001004 1979   0.5054544 2.173162
    9   001004 1980   0.5269449 2.188077
    10  001004 1981   0.5423774 2.200805
    11  001004 1982   0.3528982 2.200851
    12  001004 1983   0.3674031 2.190487
    13  001004 1984   0.2267620 2.181291
    14  001004 1985   0.2796132 2.159443
    15  001004 1986   0.3382120 2.128420
    16  001004 1987   0.3214131 2.089670
    17  001004 1988   0.3883732 2.048279
    18  001004 1989   0.4466488 1.999539
    19  001004 1990   0.4929991 1.955500
    20  001004 1991   0.5150894 1.934893
    21  001004 1992   0.5218845 1.925521
    22  001004 1993   0.5038105 1.904241
    23  001004 1994   0.5041639 1.881731
    24  001004 1995   0.5196658 1.863143
    25  001004 1996   0.5352994 1.844464
    26  001004 1997   0.4556059 1.835676
    27  001004 1998   0.4905767 1.837886
    28  001004 1999   0.5471959 1.824636
    29  001004 2000   0.5920976 1.814944
    30  001004 2001   0.5998172 1.893943
    31  001004 2002   0.4499911 1.889703
    32  001004 2003   0.4207154 1.870703
    33  001004 2004   0.4371594 1.831638
    34  001004 2005   0.4525900 1.802684
    35  001004 2006   0.4342149 1.781757
    36  001004 2007   0.4899473 1.753360
    37  001004 2008   0.5436673 1.680464
    38  001004 2009   0.5873861 1.612499
    39  001004 2010   0.5216734 1.544322
    40  001004 2011   0.5592963 1.415892
    41  001004 2012   0.5627509 1.407393
    42  001004 2013   0.5904637 1.384202
    43  001004 2014   0.6170085 1.353340
    44  001004 2015   0.7145900 1.314014
    45  001007 1975   0.3721916 2.090532
    46  001007 1976   0.2760902 2.136939
    47  001007 1977   0.1866554 2.123909
    48  001007 1978   0.1977654 2.150281
    49  001007 1979   0.1927100 2.173162
    50  001007 1980   0.2112344 2.188077
    51  001007 1981  -0.2141724 2.200805
    52  001007 1982  -0.2072785 2.200851
    53  001007 1983  -1.7406963 2.190487
    54  001007 1984 -14.8071429 2.181291
    55  001009 1982  -1.2753247 2.200851
    56  001009 1983   1.3349904 2.190487
    57  001009 1984   2.6192237 2.181291
    58  001009 1985   0.5867925 2.159443
    59  001009 1986   0.6959436 2.128420
    60  001009 1987   0.7142857 2.089670
    61  001009 1988   0.7771897 2.048279
    62  001009 1989   0.8293820 1.999539
    63  001009 1990   0.8655382 1.955500
    64  001009 1991   0.8712144 1.934893
    65  001009 1992   0.8882548 1.925521
    66  001009 1993   0.9190540 1.904241
    67  001009 1994   0.9411806 1.881731
    68  001010 1971   0.6492499 2.002337
    69  001010 1972   0.6667664 2.021182
    70  001010 1973   0.6840115 2.029056
    71  001010 1974   0.7011797 2.053517
    72  001010 1975   0.7189469 2.090532
    73  001010 1976   0.7367344 2.136939
    74  001010 1977   0.7511779 2.123909
    75  001010 1978   0.7673365 2.150281
    76  001010 1979   0.7795880 2.173162
    77  001010 1980   0.7824448 2.188077
    78  001010 1981   0.7821913 2.200805
    79  001010 1982   0.7646078 2.200851
    80  001010 1983   0.7426172 2.190487
    81  001010 1984  -0.0657935 2.181291
    82  001010 1985   0.2802410 2.159443
    83  001010 1986   0.2052373 2.128420
    84  001010 1987   0.2465290 2.089670
    85  001010 1988   0.3437856 2.048279
    86  001010 1989   0.7398662 1.999539
    87  001010 1990   0.6360582 1.955500
    88  001010 1991   0.7790707 1.934893
    89  001010 1992   0.7588472 1.925521
    90  001010 1993   0.7695341 1.904241
    91  001010 1994   0.8060759 1.881731
    92  001010 1995   0.8381234 1.863143
    93  001010 1996   0.8661541 1.844464
    94  001010 1997   0.8700456 1.835676
    95  001010 1998   0.8748443 1.837886
    96  001010 1999   0.8884077 1.824636
    97  001010 2000   0.8979903 1.814944
    98  001010 2003   0.6812689 1.870703
    99  001011 1983   0.3043007 2.190487
    100 001011 1984   0.3080601 2.181291"
    
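    # read.table detects the header automatically because the first line has
    # one fewer field than the rest; the leading row numbers become row names
    df <- read.table(text = Lines)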