R: Forward-filling data with a condition


I have a data frame, DF, that looks like this:

    date permno   ret sue  bm gpa
1  202001  10000  0.01 0.4 0.4  NA
2  202002  10000  0.04  NA  NA 0.5
3  202003  10000 -0.01  NA  NA  NA
4  202004  10000  0.00 1.3 0.5  NA
5  202005  10000  0.02  NA  NA 0.3
6  202006  10000  0.01  NA  NA  NA
7  202007  10000  0.03  NA  NA  NA
8  202008  10000 -0.02  NA  NA 0.4
9  202001  11000  0.05 0.1 0.3  NA
10 202002  11000  0.02  NA  NA  NA
11 202003  11000  0.01  NA  NA  NA
12 202004  11000  0.00  NA  NA 0.3
13 202005  11000  0.01  NA  NA  NA
14 202006  11000 -0.01  NA  NA  NA
15 202007  11000  0.04 0.5 0.4  NA
16 202008  11000  0.30  NA  NA  NA
I use this code to forward-fill the variables sue, bm, and gpa:

DF1 <- 
  DF %>%
  arrange(permno,date) %>%
  group_by(permno) %>%
  mutate_at(vars(c(sue,bm,gpa)), funs(na.locf(.,na.rm=FALSE)))
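I want to limit the number of months over which the data are filled forward. The three variables should be carried forward until the next available value, but for at most 3 months. The result should therefore look like this: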

    date permno   ret sue  bm gpa
1  202001  10000  0.01 0.4 0.4  NA
2  202002  10000  0.04 0.4 0.4 0.5
3  202003  10000 -0.01 0.4 0.4 0.5
4  202004  10000  0.00 1.3 0.5 0.5
5  202005  10000  0.02 1.3 0.5 0.3
6  202006  10000  0.01 1.3 0.5 0.3
7  202007  10000  0.03 1.3 0.5 0.3
8  202008  10000 -0.02  NA  NA 0.4
9  202001  11000  0.05 0.1 0.3  NA
10 202002  11000  0.02 0.1 0.3  NA
11 202003  11000  0.01 0.1 0.3  NA
12 202004  11000  0.00 0.1 0.3 0.3
13 202005  11000  0.01  NA  NA 0.3
14 202006  11000 -0.01  NA  NA 0.3
15 202007  11000  0.04 0.5 0.4 0.3
16 202008  11000  0.30 0.5 0.4  NA

Does anyone know how I can do this in R?

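This sounds like a rolling-window problem. However, because the carry-forward has to be limited, one issue is that by the time you look at a particular cell, the cell before it has already been fixed (un-`NA`'d), so we need to look at the vector in reverse (`rev`).

The helper function uses `2:4`, which reflects your preference of no more than three months. With `rollapply` on the reversed vector, `z[1]` is the cell that is probably `NA`, and `z[2:4]` are the three preceding months.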
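The bounded look-back described above can be sketched in base R, without a rolling-window helper. This is only an illustration under stated assumptions, not the answerer's helper function: `fill_lookback` and its `period` argument are hypothetical names, and the input is assumed to be a numeric vector for a single `permno`, already sorted by date. Because every window is read from the original vector, a filled value is never itself carried forward.

```r
# Hypothetical helper (not from the original answer): fill each NA with the
# most recent non-NA value found in the `period` positions before it.
fill_lookback <- function(x, period = 3) {
  vapply(seq_along(x), function(i) {
    # observed values are kept as they are
    if (!is.na(x[i])) return(x[i])
    # up to `period` previous positions, most recent first
    # (the analogue of z[2:4] on the reversed vector)
    idx  <- head(rev(seq_len(i - 1)), period)
    prev <- x[idx]
    prev <- prev[!is.na(prev)]
    if (length(prev) > 0) prev[1] else NA_real_
  }, numeric(1))
}

# `sue` for permno 10000 in the question
sue <- c(0.4, NA, NA, 1.3, NA, NA, NA, NA)
fill_lookback(sue)
# 0.4 0.4 0.4 1.3 1.3 1.3 1.3 NA
```

Inside a grouped pipeline, a function like this would be applied per `permno` to each of `sue`, `bm`, and `gpa`.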


We can write our own `na.locf()` that lets you make the adjustment you need:

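Code

library(zoo)
library(dplyr)

na.locf2 <- function(object, period = 3) {
  # runs of consecutive NA / non-NA values
  tmp1 <- rle(is.na(object))

  # position of each element within its run (1, 2, 3, ...)
  tmp2 <- sequence(tmp1$lengths)

  # carry the last observation forward; na.rm = FALSE keeps leading NAs,
  # so the result has the same length as the input
  tmp3 <- na.locf(object, na.rm = FALSE)

  # reset filled values that lie more than `period` steps past the last
  # observation; the is.na() check leaves long runs of observed values alone
  tmp3[is.na(object) & tmp2 > period] <- NA

  # return
  tmp3
}

# Then (across() needs dplyr >= 1.0; older code used mutate_at() with funs())

DF %>%
  arrange(permno, date) %>%
  group_by(permno) %>%
  mutate(across(c(sue, bm, gpa), na.locf2))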

# Yields

# A tibble: 16 x 6
# Groups:   permno [2]
#     date permno   ret   sue    bm   gpa
#    <int>  <int> <dbl> <dbl> <dbl> <dbl>
#  1 202001  10000  0.01   0.4   0.4  NA  
#  2 202002  10000  0.04   0.4   0.4   0.5
#  3 202003  10000 -0.01   0.4   0.4   0.5
#  4 202004  10000  0      1.3   0.5   0.5
#  5 202005  10000  0.02   1.3   0.5   0.3
#  6 202006  10000  0.01   1.3   0.5   0.3
#  7 202007  10000  0.03   1.3   0.5   0.3
#  8 202008  10000 -0.02  NA    NA     0.4
#  9 202001  11000  0.05   0.1   0.3  NA  
# 10 202002  11000  0.02   0.1   0.3  NA  
# 11 202003  11000  0.01   0.1   0.3  NA  
# 12 202004  11000  0      0.1   0.3   0.3
# 13 202005  11000  0.01  NA    NA     0.3
# 14 202006  11000 -0.01  NA    NA     0.3
# 15 202007  11000  0.04   0.5   0.4   0.3
# 16 202008  11000  0.3    0.5   0.4  NA  
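To see why the run-length step limits the fill, it may help to trace it on a single vector. The sketch below uses base R only; the variable names (`runs`, `pos`, `stale`, `last_obs`) are illustrative, and the `cummax()` indexing is a stand-in for `na.locf(x, na.rm = FALSE)`, not the zoo implementation.

```r
# `sue` for permno 11000 in the question
x <- c(0.1, NA, NA, NA, NA, NA, 0.5, NA)
period <- 3

runs  <- rle(is.na(x))              # run lengths: 1 5 1 1
pos   <- sequence(runs$lengths)     # position within each run
stale <- is.na(x) & pos > period    # NAs more than `period` steps after a value

# index of the most recent observed value at each position (0 = none yet)
last_obs <- cummax(ifelse(is.na(x), 0L, seq_along(x)))
last_obs[last_obs == 0] <- NA       # leading NAs stay NA
filled <- x[last_obs]               # unlimited forward fill
filled[stale] <- NA                 # undo the fill beyond the limit

pos     # 1 1 2 3 4 5 1 1
stale   # FALSE FALSE FALSE FALSE TRUE TRUE FALSE FALSE
filled  # 0.1 0.1 0.1 0.1 NA NA 0.5 0.5
```

The fourth and fifth NAs in the five-month gap are the only positions reset, which matches rows 13 and 14 of the expected result.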


No, this works. Thanks.