R: Cumulative sum over a rolling date range
In R, how can I calculate, for each row, the sum of the values over a defined time period leading up to that row? A dplyr solution is preferred if possible. For example, with a period of 10 days, the function would produce the cum_rolling10 column:
date value cumsum cum_rolling10
1/01/2000 9 9 9
2/01/2000 1 10 10
5/01/2000 9 19 19
6/01/2000 3 22 22
7/01/2000 4 26 26
8/01/2000 3 29 29
13/01/2000 10 39 29
14/01/2000 9 48 38
18/01/2000 2 50 21
19/01/2000 9 59 30
21/01/2000 8 67 38
25/01/2000 5 72 24
26/01/2000 1 73 25
30/01/2000 6 79 20
31/01/2000 6 85 18
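The cum_rolling10 column above can be read as: for each row, sum value over every row whose date falls within the 10 calendar days ending on that row's date, the row's own date included. A minimal base-R sketch of that definition, meant as a reference for checking faster solutions rather than as a recommended implementation; the names dates, values, and window_days are illustrative and do not come from the question:

```r
# Sample data from the question (day/month/year strings parsed to Date).
dates <- as.Date(c("1/01/2000", "2/01/2000", "5/01/2000", "6/01/2000",
                   "7/01/2000", "8/01/2000", "13/01/2000", "14/01/2000",
                   "18/01/2000", "19/01/2000", "21/01/2000", "25/01/2000",
                   "26/01/2000", "30/01/2000", "31/01/2000"),
                 format = "%d/%m/%Y")
values <- c(9, 1, 9, 3, 4, 3, 10, 9, 2, 9, 8, 5, 1, 6, 6)
window_days <- 10  # illustrative name for the period length

# For each row, sum the values dated within the last `window_days` days,
# counting the row's own date as part of the window.
cum_rolling10 <- vapply(seq_along(dates), function(i) {
  in_window <- dates > dates[i] - window_days & dates <= dates[i]
  sum(values[in_window])
}, numeric(1))

print(cum_rolling10)
```

This compares every row with every other row, so it is only practical for small inputs.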
A solution using dplyr, tidyr, lubridate, and zoo:
library(dplyr)
library(tidyr)
library(lubridate)
library(zoo)
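dt2 <- dt %>%
  mutate(date = dmy(date)) %>%          # parse day/month/year strings into Date
  mutate(cumsum = cumsum(value)) %>%    # running total over the original rows
  # add a row for every missing calendar day with value 0 (cumsum is NA there)
  complete(date = full_seq(date, period = 1), fill = list(value = 0)) %>%
  # with one row per day, a window of 10 rows covers exactly 10 days
  mutate(cum_rolling10 = rollapplyr(value, width = 10, FUN = sum, partial = TRUE)) %>%
  drop_na(cumsum)                       # drop the filler rows again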
dt2
# A tibble: 15 x 4
date value cumsum cum_rolling10
<date> <dbl> <int> <dbl>
1 2000-01-01 9 9 9
2 2000-01-02 1 10 10
3 2000-01-05 9 19 19
4 2000-01-06 3 22 22
5 2000-01-07 4 26 26
6 2000-01-08 3 29 29
7 2000-01-13 10 39 29
8 2000-01-14 9 48 38
9 2000-01-18 2 50 21
10 2000-01-19 9 59 30
11 2000-01-21 8 67 38
12 2000-01-25 5 72 24
13 2000-01-26 1 73 25
14 2000-01-30 6 79 20
15 2000-01-31 6 85 18
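The key step in the pipeline above is padding the missing calendar days with zeros, so that a window of 10 rows covers exactly 10 days. The same idea can be sketched in base R without zoo, by taking differences of a cumulative sum over the padded daily vector. This is an illustrative alternative, not the answer's method; the names pos, daily, and cs are made up here, and the sketch assumes each date appears at most once:

```r
dates <- as.Date(c("1/01/2000", "2/01/2000", "5/01/2000", "6/01/2000",
                   "7/01/2000", "8/01/2000", "13/01/2000", "14/01/2000",
                   "18/01/2000", "19/01/2000", "21/01/2000", "25/01/2000",
                   "26/01/2000", "30/01/2000", "31/01/2000"),
                 format = "%d/%m/%Y")
values <- c(9, 1, 9, 3, 4, 3, 10, 9, 2, 9, 8, 5, 1, 6, 6)
k <- 10  # window length in days

# Position of each observation on a gap-free daily grid (1 = first date).
pos <- as.integer(dates - min(dates)) + 1L

# One slot per calendar day; days without an observation stay at 0.
daily <- numeric(max(pos))
daily[pos] <- values

# Sum of the last k days = running total today minus running total k days ago.
cs <- cumsum(daily)
cs_k_days_ago <- c(rep(0, k), cs)[seq_along(cs)]
rolling <- cs - cs_k_days_ago

# Keep only the days that were present in the original data.
cum_rolling10 <- rolling[pos]
print(cum_rolling10)
```

Like the complete() step, this allocates one element per calendar day, so memory grows with the date span rather than with the number of rows.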
Data
dt <- structure(list(date = c("1/01/2000", "2/01/2000", "5/01/2000",
"6/01/2000", "7/01/2000", "8/01/2000", "13/01/2000", "14/01/2000",
"18/01/2000", "19/01/2000", "21/01/2000", "25/01/2000", "26/01/2000",
"30/01/2000", "31/01/2000"), value = c(9L, 1L, 9L, 3L, 4L, 3L,
10L, 9L, 2L, 9L, 8L, 5L, 1L, 6L, 6L)), .Names = c("date", "value"
), row.names = c(NA, -15L), class = "data.frame")
dt
This solution avoids memory overhead, and migrating it to dt would be easy.
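lag <- 7  # offset in days; R resolves the variable separately from the function dplyr::lag()
dt %>%
  mutate(date = dmy(date)) %>%
  # days elapsed since the first date; a plain-R stand-in for
  # datediff(date, min(date)), which dplyr and lubridate do not provide
  mutate(order = as.integer(date - min(date))) %>%
  arrange(desc(order)) %>%
  mutate(n_order = lag(order + lag, 1L, default = 0)) %>%
  mutate(b_order = ifelse(order - n_order >= 0, order, -1)) %>%
  mutate(m_order = cummax(b_order)) %>%
  group_by(m_order) %>%
  mutate(rolling_value = cumsum(value))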
I recommend the runner package, which is designed for computing functions over rolling/running windows. You can do this with sum_run; here is a one-liner:
library(runner)
library(dplyr)
df <- df %>%
  mutate(
    cum_rolling_10 = sum_run(
      x = value,
      k = 10,
      idx = as.Date(date, format = "%d/%m/%Y"))
  )
df
# date value cum_rolling_10
# 1 1/01/2000 9 9
# 2 2/01/2000 1 10
# 3 5/01/2000 9 19
# 4 6/01/2000 3 22
# 5 7/01/2000 4 26
# 6 8/01/2000 3 29
# 7 13/01/2000 10 29
# 8 14/01/2000 9 38
# 9 18/01/2000 2 21
# 10 19/01/2000 9 30
# 11 21/01/2000 8 38
# 12 25/01/2000 5 24
# 13 26/01/2000 1 25
# 14 30/01/2000 6 20
# 15 31/01/2000 6 18
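Enjoy!
This also works well for me, but I run into an error when I apply it to data that has an additional field indicating groups. I added group_by(id), where id is the field containing the group ID. Any ideas on how to fix this?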
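The comment does not show the failing code or the error message, so what follows is a guess. If the grouped call passes whole columns such as df$value and df$date, those still refer to every row rather than the current group's rows, which can cause a length mismatch inside mutate(). A hedged sketch that passes bare column names instead, so each group gets its own window; the id column and the sample rows are made up for illustration, and rows are assumed to be sorted by date within each group:

```r
library(dplyr)
library(runner)

# Hypothetical data: `id` marks the group; dates ascend within each group.
df <- data.frame(
  id    = c("a", "a", "a", "b", "b"),
  date  = c("1/01/2000", "5/01/2000", "13/01/2000", "2/01/2000", "3/01/2000"),
  value = c(9, 1, 10, 4, 3)
)

result <- df %>%
  group_by(id) %>%
  mutate(
    cum_rolling_10 = sum_run(
      x   = value,                              # this group's values only
      k   = 10,                                 # 10-day window
      idx = as.Date(date, format = "%d/%m/%Y")  # this group's dates only
    )
  ) %>%
  ungroup()

print(result)
```

If the real data is not already ordered by date within each group, adding arrange(id, date) before group_by() would be a reasonable precaution.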