Time series imputation with dplyr
Given the dataset:
Category Date a b
aa 2017-01-01 5 1
aa 2017-01-03 1 3
bb 2017-01-01 2 4
bb 2017-01-02 3 5
bb 2017-01-03 2 3
cc 2017-01-03 3 3
...
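I need to impute the missing observations for every category in this dataset. For column a I need to impute 0; for column b, the last observation. For this example, I have to obtain the following: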
Category Date a b
aa 2017-01-01 5 1
aa 2017-01-02 0 1
aa 2017-01-03 1 3
bb 2017-01-01 2 4
bb 2017-01-02 3 5
bb 2017-01-03 2 3
cc 2017-01-01 0 0 # start date for cc category, so '0'
cc 2017-01-02 0 0
cc 2017-01-03 3 3
...
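This is by no means an elegant solution, but you can create a separate data frame containing all the rows you want to end up with (just use rep and seq).
Then left-join the old data frame onto it and use the lag window function. The lag step has to be applied twice, because a category can have two consecutive missing dates (as cc does here).
Hope this helps a bit. I just remembered that coalesce is in dplyr, so the last step can also be written as:
left_join(df1, df2) %>% group_by(Category) %>% mutate(a = coalesce(a, 0), b = coalesce(b, lag(b, n = 1, default = 0), lag(b, n = 2, default = 0)))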
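library(dplyr)
library(lubridate)  # loaded in the original answer; as.Date() below is base R

# df1: the full grid of rows we want to end up with (every Category x Date)
df1 <- data.frame(
  Category = rep(c("aa", "bb", "cc"), each = 3),
  Date = rep(seq(as.Date("2017-01-01"), as.Date("2017-01-03"), by = "day"), 3)
)

# df2: the observed data from the question
df2 <- data.frame(
  Category = c("aa", "aa", "bb", "bb", "bb", "cc"),
  Date = as.Date(c("2017-01-01", "2017-01-03", "2017-01-01",
                   "2017-01-02", "2017-01-03", "2017-01-03")),
  a = c(5, 1, 2, 3, 2, 3),
  b = c(1, 3, 4, 5, 3, 3)
)

# Join the observations onto the grid; dates with no observation get NA in a and b
left_join(df1, df2, by = c("Category", "Date")) %>%
  group_by(Category) %>%
  mutate(
    a = ifelse(is.na(a), 0, a),
    # first pass: take the previous row's b (0 at the start of a group)
    b = ifelse(is.na(b), dplyr::lag(b, n = 1, default = 0), b),
    # second pass: fill whatever is still NA after the first pass
    b = ifelse(is.na(b), dplyr::lag(b, n = 1, default = 0), b)
  )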
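The two lag passes above only cover gaps of up to two rows. As a possible alternative, here is a minimal sketch that assumes the tidyr package is available (the answer above does not use it): complete() builds the missing Category/Date rows, and fill() carries the last observed b forward within each category regardless of gap length. The names df and result are only illustrative.

```r
library(dplyr)
library(tidyr)  # assumption: tidyr is installed; not part of the original answer

# The observed data from the question
df <- data.frame(
  Category = c("aa", "aa", "bb", "bb", "bb", "cc"),
  Date = as.Date(c("2017-01-01", "2017-01-03", "2017-01-01",
                   "2017-01-02", "2017-01-03", "2017-01-03")),
  a = c(5, 1, 2, 3, 2, 3),
  b = c(1, 3, 4, 5, 3, 3)
)

result <- df %>%
  # add a row for every Category x day between the first and last observed date
  complete(Category, Date = seq(min(Date), max(Date), by = "day")) %>%
  group_by(Category) %>%
  arrange(Date, .by_group = TRUE) %>%
  # carry the last observed b forward within each category
  fill(b, .direction = "down") %>%
  ungroup() %>%
  # a: missing becomes 0; b: still missing only before the first observation, so 0
  mutate(a = coalesce(a, 0), b = coalesce(b, 0))

print(result)
```

The date range here is taken from the whole dataset, so every category is padded to the same start and end dates; if each category should keep its own range, the sequence would have to be built per group instead.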