R rolling-window function for irregular time series with duplicated data
I have the following data.frame:
grp nr yr
1: A 1.0 2009
2: A 2.0 2009
3: A 1.5 2009
4: A 1.0 2010
5: B 3.0 2009
6: B 2.0 2010
7: B NA 2011
8: C 3.0 2014
9: C 3.0 2019
10: C 3.0 2020
11: C 4.0 2021
Expected output:
grp nr yr nr_roll_period_3
1 A 1.0 2009 NA
2 A 2.0 2009 NA
3 A 1.5 2009 NA
4 A 1.0 2010 NA
5 B 3.0 2009 NA
6 B 2.0 2010 NA
7 B NA 2011 NA
8 C 3.0 2014 NA
9 C 3.0 2019 NA
10 C 3.0 2020 NA
11 C 4.0 2021 3.333333
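The logic is:
- I want to compute a rolling mean over a period of length k (say 3), where the 3 periods include the current month/year/day (per group).
- However, if there are not 3 consecutive years/months/days, nothing should be computed.
- Likewise, if the calculation column contains an NA within that period, the output should be NA.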
calculate_rolling_window <-
  function(dt, date_col, calc_col, id, k) {
    require(data.table)
    return(setDT(dt)[
      , paste(calc_col, "roll_period", k, sep = "_") :=
        sapply(get(date_col), function(x) mean(get(calc_col)[between(get(date_col), x - k + 1, x)])),
      by = mget(id)])
  }
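Is there a way to handle this? It does not have to be a data.table approach.

This can be solved by grouping in a non-equi join to aggregate over a rolling window of length k, filtering for windows that contain k consecutive years, and an update join.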
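The intermediate result mDT contains the rolling mean V2 over each period of length k and the count V1 of unique/distinct years within each period. It is created by a non-equi join of DT with a data.table of upper and lower bounds that is built on the fly: .(grp = grp, upper = yr, lower = yr - k).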
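To make the on-the-fly bounds easier to see, here is a minimal sketch that builds them as a separate table before joining. The toy table, the bounds object and the column names n_years and roll_mean are assumptions for illustration; the answer itself uses the unnamed columns V1 and V2.

```r
library(data.table)

# Toy data, assumed for illustration: one group with three consecutive years
toy <- data.table(grp = "C", nr = c(3, 3, 4), yr = c(2019L, 2020L, 2021L))
k <- 3L

# One row of bounds per input row; each window is the interval (lower, upper]
bounds <- toy[, .(grp, upper = yr, lower = yr - k)]

# For every row of bounds, aggregate the rows of toy whose yr lies in the window
res <- toy[bounds, on = .(grp, yr <= upper, yr > lower),
           .(n_years = uniqueN(x.yr), roll_mean = mean(nr)), by = .EACHI]
res
```

Here n_years counts the distinct years found in each window (1, 2 and 3 for the three rows), so only the last row covers a full window of k years.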
mDT is then filtered for the rows that cover exactly k distinct years (mDT[V1 == k]).
Finally, the filtered result is joined with DT to append the new column to DT.
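The update join in this last step can be illustrated in isolation. The sketch below uses two hypothetical toy tables (target and lookup, with a made-up value column) rather than the data from the question.

```r
library(data.table)

# Hypothetical toy tables, not the data from the question
target <- data.table(grp = c("A", "A", "B"), yr = c(2020L, 2021L, 2021L))
lookup <- data.table(grp = "A", yr = 2021L, value = 1.5)

# Update join: rows of target that match lookup on (grp, yr) receive i.value,
# all other rows get NA in the new column; target is modified by reference
target[lookup, on = .(grp, yr), new_col := i.value]
target
```

Only the row with grp "A" and yr 2021 is filled; the unmatched rows stay NA, which is what leaves the incomplete windows empty in the final result.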
Note that mean() returns NA by default if the input data contain an NA.
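A short illustration of this default in base R, using made-up values:

```r
x <- c(3, 2, NA)

mean(x)                # NA: the missing value propagates into the result
mean(x, na.rm = TRUE)  # 2.5: only the non-missing values are averaged
```

Since the question wants NA whenever the window contains a missing value, the default (na.rm = FALSE) is the behaviour needed here.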
Data

This may help:
Thanks, but I have already tried those approaches and they do not meet my current requirements.
grp nr yr nr_roll_period_3
1: A 1.0 2009 1.500000
2: A 2.0 2009 1.500000
3: A 1.5 2009 1.500000
4: A 1.0 2010 1.375000
5: B 3.0 2009 NA
6: B 2.0 2010 NA
7: B NA 2011 NA
8: C 3.0 2014 NA
9: C 3.0 2019 NA
10: C 3.0 2020 NA
11: C 4.0 2021 3.333333
library(data.table)
k <- 3L
# group by join parameters of a non-equi join
mDT <- setDT(DT)[.(grp = grp, upper = yr, lower = yr - k),
                 on = .(grp, yr <= upper, yr > lower),
                 .(uniqueN(x.yr), mean(nr)), by = .EACHI]
# update join with filtered intermediate result
DT[mDT[V1 == k], on = .(grp, yr), paste0("nr_roll_period_", k) := V2]
DT
grp nr yr nr_roll_period_3
1: A 1.0 2009 NA
2: A 2.0 2009 NA
3: A 1.5 2009 NA
4: A 1.0 2010 NA
5: B 3.0 2009 NA
6: B 2.0 2010 NA
7: B NA 2011 NA
8: C 3.0 2014 NA
9: C 3.0 2019 NA
10: C 3.0 2020 NA
11: C 4.0 2021 3.333333
mDT
grp yr yr V1 V2
1: A 2009 2006 1 1.500000
2: A 2009 2006 1 1.500000
3: A 2009 2006 1 1.500000
4: A 2010 2007 2 1.375000
5: B 2009 2006 1 3.000000
6: B 2010 2007 2 2.500000
7: B 2011 2008 3 NA
8: C 2014 2011 1 3.000000
9: C 2019 2016 1 3.000000
10: C 2020 2017 2 3.000000
11: C 2021 2018 3 3.333333
mDT[V1 == k]
grp yr yr V1 V2
1: B 2011 2008 3 NA
2: C 2021 2018 3 3.333333
library(data.table)
DT <- fread(text = "rn grp nr yr
1: A 1.0 2009
2: A 2.0 2009
3: A 1.5 2009
4: A 1.0 2010
5: B 3.0 2009
6: B 2.0 2010
7: B NA 2011
8: C 3.0 2014
9: C 3.0 2019
10: C 3.0 2020
11: C 4.0 2021", drop = 1L)
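
Since the question asks for a reusable function, the three steps could be wrapped up roughly as follows. This is only a sketch: the function name roll_mean_consecutive, the fixed column names grp, yr and nr, and the intermediate names bounds, agg and hits are assumptions for illustration and not part of the original answer.

```r
library(data.table)

# Hypothetical helper: assumes columns named grp, yr (integer) and nr
roll_mean_consecutive <- function(dt, k) {
  k <- as.integer(k)
  dt <- copy(as.data.table(dt))   # work on a copy, leave the input untouched
  new_col <- paste0("nr_roll_period_", k)
  dt[, (new_col) := NA_real_]     # default when no complete window exists

  # Step 1: aggregate over the window (yr - k, yr] within each group
  bounds <- dt[, .(grp, upper = yr, lower = yr - k)]
  agg <- dt[bounds, on = .(grp, yr <= upper, yr > lower),
            .(n_years = uniqueN(x.yr), roll_mean = mean(nr)), by = .EACHI]
  setnames(agg, c("grp", "yr", "lower", "n_years", "roll_mean"))

  # Step 2: keep only windows with k distinct (and therefore consecutive) years
  hits <- unique(agg[n_years == k, .(grp, yr, roll_mean)])

  # Step 3: update join back onto the copy
  dt[hits, on = .(grp, yr), (new_col) := i.roll_mean]
  dt[]
}
```

Called as roll_mean_consecutive(DT, 3L) on the data above, this should fill nr_roll_period_3 only for the last row of group C and leave every other row as NA.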