Variable shift with look-ahead in an R data.table

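I have a data.table with a numeric prediction for every day over a period of time. The predictions include weekends, which I have already moved to Monday using data.table syntax: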

dt[day %in% c("sat", "sun", "mon"), y := sum(y), by = c("year", "week")]

Afterwards, sat and sun can of course be set to 0.

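However, there are also non-working days (nwd), which can fall on any day of the week. The predictions for those days need to be shifted to the next working day, taking both other non-working days and weekends into account. For example: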

mo tu we th fr (nwd) sa su mo (nwd) tu
---------------------------------------
50 60 60 20 30       0  0  0        20

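Expected output (Friday through the following Tuesday), with Friday's 30 carried over on top of Tuesday's own 20:

fr (nwd) sa su mo (nwd) tu
--------------------------
0        0  0  0        50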

How can I achieve this without using a loop?

Edit: sample data

library(data.table)
dt = data.table(
     ds = seq(as.Date('2018-08-13'), as.Date('2018-08-21'), by = 1),
     nwd = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
     pred = c(20, 40, 40, 60, 50, 60, 40, 10, 20))

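Here is my first attempt. It flags Saturdays and Sundays as non-working days as well, identifies streaks of consecutive working or non-working days, sums the predictions within each streak, adds the sum of the preceding non-working streak to pred on the first day of each working streak only, sets pred to zero on non-working days, and finally removes the helper columns.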

dt[wday(ds) %in% c(7, 1), nwd := TRUE][
  , streak := rleid(nwd)][
    , streak_sum := sum(pred), by = streak][
      , pred := pred + ifelse(!nwd & rowid(streak) == 1L, shift(streak_sum, fill = 0), 0)][
        (nwd), pred := 0][
          , `:=`(streak = NULL, streak_sum = NULL)][]
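
To make the individual steps easier to follow, here is a minimal sketch that keeps the helper columns instead of deleting them. It works on a copy of the sample data; the names `steps`, `first_in_streak` and `carried` are made up for this illustration and are not part of the answer's code.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  ds   = seq(as.Date("2018-08-13"), as.Date("2018-08-21"), by = 1),
  nwd  = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  pred = c(20, 40, 40, 60, 50, 60, 40, 10, 20))

# Work on a copy so that dt itself is not modified by reference
steps <- copy(dt)

# data.table::wday() returns 1 for Sunday and 7 for Saturday
steps[wday(ds) %in% c(7, 1), nwd := TRUE]

# Run id of consecutive equal nwd values: 1 1 1 1 2 2 2 2 3
steps[, streak := rleid(nwd)]

# Sum of pred within each streak: 160 (streak 1), 160 (streak 2), 20 (streak 3)
steps[, streak_sum := sum(pred), by = streak]

# TRUE on the first row of every streak (hypothetical helper column)
steps[, first_in_streak := rowid(streak) == 1L]

# Streak sum of the previous row, 0 for the very first row (hypothetical helper column)
steps[, carried := shift(streak_sum, fill = 0)]

print(steps)
```

Only rows where nwd is FALSE and first_in_streak is TRUE receive the carried value. In the sample data that applies to 2018-08-13, where the carried value is 0, and to 2018-08-21, where it is 160.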

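Here is a more elegant and concise version that uses an update join. The trick is that, when aggregating the predictions within each streak, we also return the last day of the streak plus one day, which is the first day of the following streak.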

# aggregate by streak
tmp <- dt[wday(ds) %in% c(7, 1), nwd := TRUE][
  , .(nwd = nwd[1], next_day = last(ds) + 1, streak_sum = sum(pred)), by = rleid(nwd)]
# update join but use only non-working streaks
dt[tmp[(nwd)], on = .(ds = next_day), pred := pred + streak_sum][
  (nwd), pred := 0][]
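
As a rough sanity check, the sketch below runs the update-join variant on the sample data, prints the intermediate table, and verifies that shifting does not change the overall total. The variable `total_before` is introduced only for this check.

```r
library(data.table)

# Sample data from the question
dt <- data.table(
  ds   = seq(as.Date("2018-08-13"), as.Date("2018-08-21"), by = 1),
  nwd  = c(FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  pred = c(20, 40, 40, 60, 50, 60, 40, 10, 20))

# Total before shifting (hypothetical helper for this check)
total_before <- sum(dt$pred)

# Aggregate by streak; next_day is the first day after each streak
tmp <- dt[wday(ds) %in% c(7, 1), nwd := TRUE][
  , .(nwd = nwd[1], next_day = last(ds) + 1, streak_sum = sum(pred)), by = rleid(nwd)]
print(tmp)  # one row per streak; only the rows with nwd == TRUE are used in the join

# Update join: add each non-working streak's sum to the day that follows it
dt[tmp[(nwd)], on = .(ds = next_day), pred := pred + streak_sum][
  (nwd), pred := 0]

# The total is preserved because every non-working streak is followed by a day in dt
stopifnot(isTRUE(all.equal(sum(dt$pred), total_before)))
```

Note that if the table ended on a non-working day, that last streak would have no following row to join to, and its predictions would be zeroed out without being carried anywhere.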

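Comments:

How about some reproducible data?

I have added some sample data.

Do you want to aggregate by week in the end? In that case, do the nwd values have to be shifted to exactly the next working day, or is it enough to include them in the following week's total?

No aggregation should take place.

It works very well, thank you. I would really like to understand it; would you mind explaining the code a little?

I have added an explanation and a second variant that joins the streak subtotals to the next working day.

Thanks, that is definitely understandable.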

Both variants return the following result for the sample data:

           ds   nwd pred
1: 2018-08-13 FALSE   20
2: 2018-08-14 FALSE   40
3: 2018-08-15 FALSE   40
4: 2018-08-16 FALSE   60
5: 2018-08-17  TRUE    0
6: 2018-08-18  TRUE    0
7: 2018-08-19  TRUE    0
8: 2018-08-20  TRUE    0
9: 2018-08-21 FALSE  180
