R: filling only a limited number of NAs in a time series

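Is there a way to fill NAs in a zoo or xts object while limiting how many NAs are carried forward? In other words, fill at most 3 consecutive NAs, then leave the NAs from the 4th one onward untouched until the next valid number.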

For example:

library(zoo)
x <- zoo(1:20, Sys.Date() + 1:20)
x[c(2:4, 6:10, 13:18)] <- NA
x

2014-09-20 2014-09-21 2014-09-22 2014-09-23 2014-09-24 2014-09-25 2014-09-26 
         1         NA         NA         NA          5         NA         NA 
2014-09-27 2014-09-28 2014-09-29 2014-09-30 2014-10-01 2014-10-02 2014-10-03 
        NA         NA         NA         11         12         NA         NA 
2014-10-04 2014-10-05 2014-10-06 2014-10-07 2014-10-08 2014-10-09 
        NA         NA         NA         NA         19         20
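I tried many combinations along the lines of na.locf(x, maxgap = 3), but none of them worked. I can write a loop that produces the desired output, but I would like to know whether there is a vectorized way to do this.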

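fillInTheBlanks <- function(v, n = 3) {
  result <- v
  counter0 <- 1                    # 1 + number of NAs filled since the last valid value
  for (i in seq_along(v)[-1]) {    # also safe when length(v) < 2, unlike 2:length(v)
    if (is.na(v[i])) {
      if (counter0 <= n) {         # still within the limit: carry the previous value forward
        result[i] <- result[i - 1]
        counter0 <- counter0 + 1
      }                            # otherwise the NA stays in place
    } else {
      counter0 <- 1                # valid value: reset the counter
    }
  }
  result
}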
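
For context, na.locf(x, maxgap = 3) does not give the desired result because of how zoo documents maxgap: runs of up to maxgap consecutive NAs are filled, while longer runs are left unchanged as a whole instead of being filled partially. A minimal sketch on a plain vector (the vector v is made up for illustration):

```r
library(zoo)

# one short gap (3 NAs) followed by one long gap (5 NAs)
v <- c(1, NA, NA, NA, 5, NA, NA, NA, NA, NA, 11)

# maxgap = 3 fills the short gap completely and leaves the long gap untouched
filled <- na.locf(v, maxgap = 3)
print(filled)
```

So maxgap decides which gaps get filled at all; it does not cap the number of values carried forward inside a gap.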

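Here is a solution that does not use na.locf. The idea is to split the series into groups that each start at a non-missing value, then within each group replace only the first 3 values after the non-missing one with that first value. It is still a loop, but since it runs once per group rather than once per value, it should be faster than a plain loop over all values.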

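v <- coredata(x)
# group 0 holds any leading NAs; every other group starts at a valid value
groups <- split(v, cumsum(!is.na(v)))
# lapply rather than sapply, so equal-length groups are never simplified to a matrix
zz <- unlist(lapply(groups, function(sx) {
  k <- min(length(sx), 4)   # the valid value plus at most 3 NAs after it
  sx[seq_len(k)] <- sx[1]
  sx
}), use.names = FALSE)
## rebuild the zoo object, since the fill above works on the bare values only
zoo(zz, index(x))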

2014-09-20 2014-09-21 2014-09-22 2014-09-23 2014-09-24 2014-09-25 2014-09-26 2014-09-27 2014-09-28 2014-09-29 2014-09-30 2014-10-01 2014-10-02 
         1          1          1          1          5          5          5          5         NA         NA         11         12         12 
2014-10-03 2014-10-04 2014-10-05 2014-10-06 2014-10-07 2014-10-08 2014-10-09 
        12         12         NA         NA         NA         19         20 
Here is another way:

l <- cumsum(! is.na(x))
c(NA, x[! is.na(x)])[replace(l, ave(l, l, FUN=seq_along) > 4, 0) + 1]
# [1]  1  1  1  1  5  5  5  5 NA NA 11 12 12 12 12 NA NA NA 19 20
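
The one-liner above packs several steps together. A more explicit base R sketch of the same idea, assuming the input is a plain vector (the function name fill_limited and its arguments are hypothetical, not taken from any package):

```r
fill_limited <- function(v, n = 3) {
  valid  <- !is.na(v)
  grp    <- cumsum(valid)                            # 0 for leading NAs, then 1, 2, ...
  pos    <- ave(seq_along(v), grp, FUN = seq_along)  # position within each group
  filled <- c(NA, v[valid])[grp + 1]                 # unlimited carry-forward
  filled[!valid & pos > n + 1] <- NA                 # blank out NAs beyond the limit again
  filled
}

v <- c(1, NA, NA, NA, 5, NA, NA, NA, NA, NA, 11, 12, rep(NA, 6), 19, 20)
fill_limited(v)
# [1]  1  1  1  1  5  5  5  5 NA NA 11 12 12 12 12 NA NA NA 19 20
```

For a zoo series, the function could be applied to coredata(x) and the result wrapped again with zoo(..., index(x)).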

Another idea that, unless I have missed something, seems to be correct:

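na_locf_until = function(x, n = 3)
{
   wnn = which(!is.na(x))    # positions of the valid values
   # first position past the fill limit after each valid value; it is kept only
   # if it falls before the next valid value. The bound for the last valid value
   # is length(x) + 1, so that a trailing gap longer than n is capped as well.
   stop_at = wnn + n + 1
   inds = sort(c(wnn, stop_at[stop_at < c(wnn[-1], length(x) + 1)]))
   # repeat each index up to the next one; the stop_at indices point at NAs
   c(rep(NA, wnn[1] - 1), 
     as.vector(x)[rep(inds, c(diff(inds), length(x) - inds[length(inds)] + 1))])
}
na_locf_until(x)
#[1]  1  1  1  1  5  5  5  5 NA NA 11 12 12 12 12 NA NA NA 19 20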
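
One way to gain some confidence in an index-based version like this is to compare it with a plain loop on random inputs. The sketch below is self-contained: loop_fill is a hypothetical restatement of the loop from the question, and the bound length(x) + 1 for the last valid value is an assumption of this sketch, chosen so that a trailing gap longer than n is capped too.

```r
# reference implementation: the straightforward loop
loop_fill <- function(v, n = 3) {
  result <- v
  filled <- 0                          # NAs filled since the last valid value
  for (i in seq_along(v)[-1]) {
    if (is.na(v[i])) {
      if (filled < n) {
        result[i] <- result[i - 1]
        filled <- filled + 1
      }
    } else {
      filled <- 0
    }
  }
  result
}

# index-based version under test
na_locf_until <- function(x, n = 3) {
  wnn <- which(!is.na(x))
  stop_at <- wnn + n + 1
  inds <- sort(c(wnn, stop_at[stop_at < c(wnn[-1], length(x) + 1)]))
  c(rep(NA, wnn[1] - 1),
    as.vector(x)[rep(inds, c(diff(inds), length(x) - inds[length(inds)] + 1))])
}

set.seed(1)
all_match <- all(vapply(1:200, function(k) {
  v <- as.numeric(sample(1:9, 40, replace = TRUE))
  v[sample(40, 25)] <- NA              # 15 valid values remain, so wnn is never empty
  identical(loop_fill(v), na_locf_until(v))
}, logical(1)))
all_match                              # TRUE only if both agree on every draw
```

Note that the index-based version still needs at least one valid value, since wnn[1] is NA for an all-NA input.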
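Playing around with data.table gave the following hacky solution, where dt is a data.table with the columns date and x built from the series above: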

np1 <- 3 + 1
dt[, 
   x_filled := x[c(rep(1, min(np1, .N)), rep(NA, max(0, .N - np1)))],
   by = cumsum(!is.na(x))]
# Or slightly simplified:
dt[, 
   x_filled := ifelse(rowid(x) < 4, x[1], x[NA]),
   by = cumsum(!is.na(x))]

> dt
          date  x x_filled
 1: 2019-02-14  1        1
 2: 2019-02-15 NA        1
 3: 2019-02-16 NA        1
 4: 2019-02-17 NA        1
 5: 2019-02-18  5        5
 6: 2019-02-19 NA        5
 7: 2019-02-20 NA        5
 8: 2019-02-21 NA        5
 9: 2019-02-22 NA       NA
10: 2019-02-23 NA       NA
11: 2019-02-24 11       11
12: 2019-02-25 12       12
13: 2019-02-26 NA       12
14: 2019-02-27 NA       12
15: 2019-02-28 NA       12
16: 2019-03-01 NA       NA
17: 2019-03-02 NA       NA
18: 2019-03-03 NA       NA
19: 2019-03-04 19       19
20: 2019-03-05 20       20
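
The grouping is the core of both variants: cumsum(!is.na(x)) increases by one at every valid value, so each group consists of one valid value followed by its run of NAs. A sketch that spells out the limit with seq_len(.N) instead of rowid(), using made-up fixed dates and a hypothetical max_fill variable:

```r
library(data.table)

dt <- data.table(date = as.Date("2019-02-14") + 0:19, x = 1:20)
dt[c(2:4, 6:10, 13:18), x := NA]

max_fill <- 3L   # hypothetical name for the fill limit

# within each group: repeat the first value, then blank everything
# that lies more than max_fill rows after it
dt[, x_filled := replace(rep(x[1L], .N), seq_len(.N) > max_fill + 1L, NA),
   by = cumsum(!is.na(x))]
dt$x_filled
```

Because the limit is a variable rather than the literal 4, it can be changed without touching the expression itself.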

Probably the cleanest way to do this in data.table is to use the join syntax:

na.omit(dt)[dt, on = .(date), roll = +3, .(date, x_filled = x, x = i.x)]

          date x_filled  x
 1: 2019-02-14        1  1
 2: 2019-02-15        1 NA
 3: 2019-02-16        1 NA
 4: 2019-02-17        1 NA
 5: 2019-02-18        5  5
 6: 2019-02-19        5 NA
 7: 2019-02-20        5 NA
 8: 2019-02-21        5 NA
 9: 2019-02-22       NA NA
10: 2019-02-23       NA NA
11: 2019-02-24       11 11
12: 2019-02-25       12 12
13: 2019-02-26       12 NA
14: 2019-02-27       12 NA
15: 2019-02-28       12 NA
16: 2019-03-01       NA NA
17: 2019-03-02       NA NA
18: 2019-03-03       NA NA
19: 2019-03-04       19 19
20: 2019-03-05       20 20

* This solution depends on the date column and on the dates being contiguous.
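
If the dates are not contiguous, one possible workaround is to roll on the row position instead of the date. This is only a sketch under that assumption: the helper column idx and the every-other-day dates are made up for illustration.

```r
library(data.table)

# dates two days apart, so a date-based roll of 3 would carry each value only one row
dt <- data.table(date = as.Date("2019-02-14") + 2 * (0:19), x = 1:20)
dt[c(2:4, 6:10, 13:18), x := NA]

dt[, idx := .I]   # hypothetical helper column: row position

# join the valid rows back onto all rows, rolling forward by at most 3 positions
res <- na.omit(dt, cols = "x")[dt, on = .(idx), roll = 3,
                               .(date = i.date, x = i.x, x_filled = x)]
res$x_filled
```

Here the limit counts rows rather than days, which matches the "at most 3 consecutive NAs" wording of the question regardless of how the dates are spaced.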

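Adding a use-case scenario: with quarterly data, we may know that a value is good for the next 3 months, and possibly for up to 3 more months after that, but anything beyond the acceptable range should really be NA rather than being filled forward indefinitely. The data.table options were posted as a separate answer because the technique and logic differ.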

The dt object used in the data.table examples can be built from the original series like this:

library(zoo)
library(data.table)
x <- zoo(1:20, Sys.Date() + 1:20)
x[c(2:4, 6:10, 13:18)] <- NA
dt <- data.table(date = index(x), x = as.integer(x))