R: add missing rows, but not when the date changes
I am trying to add missing rows to a data frame (within each value of NO_REF), linearly interpolating some columns and carrying the last non-NA value forward in others. I can't figure out how to prevent the missing dates from being inserted when the DATE_X value after the gap is later than the last DATE value before the gap.

Here is my data frame:
df = data.frame(DATE = as.Date(c("2016-01-31","2016-03-31","2016-05-31","2016-08-31","2016-12-31","2016-02-29","2016-04-30","2016-06-30","2016-08-31","2016-10-31","2016-12-31","2015-01-31","2015-02-28","2015-06-30","2015-10-31","2015-12-31")),
DATE_X = as.Date(c("2010-01-31","2010-01-31","2016-04-30","2015-03-31","2015-03-31","2010-10-31","2010-10-31","2016-05-31","2016-05-31","2015-07-31","2015-07-31","2013-01-31","2013-01-31","2013-01-31","2015-09-30","2015-09-30")),
NO_REF = c("P1","P1","P1","P2","P2","O1","O1","O1","O1","R1","R2","Q1","Q1","Q1","Q1","Q1"),
KAP = as.double(15:30),
DIV =c("PI","PI","PI","PI","PI","OP","OP","OP","OP","PR","PR","OP","OP","OP","OP","OP"))
Here is my code:
library(dplyr)
library(multidplyr)
library(zoo)
cluster <- create_cluster(3)
cluster_eval(cluster,library(dplyr))
cluster_eval(cluster,library(zoo))
result = df %>% partition(NO_REF,cluster=cluster) %>%
group_by(NO_REF) %>%
do(left_join(data.frame(NO_REF = .$NO_REF[1], DATE = seq(min(.$DATE)+1, max(.$DATE)+1, by="1 month")-1), .,
by=c("NO_REF","DATE"))) %>% mutate(DATE_X=na.locf(DATE_X, na.rm=FALSE),
DIV=na.locf(DIV, na.rm=FALSE), KAP=na.approx(KAP)) %>% collect()
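In the table below, the blue rows should not appear in the final result.
Expected result:
Thanks in advance for your help.

This may not be the most efficient approach, but I think it does what you want: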
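library(dplyr)
library(multidplyr)
library(zoo)
library(lubridate)  # assumed source of days(), which the filter() step below calls
cluster <- create_cluster(3)
cluster_eval(cluster,library(dplyr))
cluster_eval(cluster,library(zoo))
cluster_eval(cluster,library(lubridate))  # the workers need days() as well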
result = df %>% partition(NO_REF,cluster=cluster) %>%
group_by(NO_REF) %>%
do(left_join(data.frame(NO_REF = .$NO_REF[1], DATE = seq(min(.$DATE)+1, max(.$DATE)+1, by="1 month")-1), .,
by=c("NO_REF","DATE"))) %>%
filter(!(is.na(DATE_X) &
na.locf(DATE_X, fromLast=TRUE, na.rm=FALSE)>
na.locf(DATE+days(ifelse(is.na(DATE_X), NA, 0)), na.rm=FALSE))) %>%
mutate(DATE_X=na.locf(DATE_X, na.rm=FALSE),
DIV=na.locf(DIV, na.rm=FALSE),
KAP=na.approx(KAP)) %>%
collect()
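
To make the filter() step easier to follow, here is a minimal sketch that traces it by hand on a single group. It is not part of the original answer: the data frame p1 is a hand-built copy of the P1 group as it would look after the monthly left_join, the variable names (inserted, next_date_x, last_date, kept) are illustrative, and replace() is used in place of the days(ifelse(...)) trick to blank out DATE on inserted rows, which should be equivalent here.

```r
library(zoo)  # for na.locf()

# One NO_REF group (P1 from the question) after the monthly left_join:
# the rows for 2016-02-29 and 2016-04-30 were inserted, so DATE_X is NA there.
p1 <- data.frame(
  DATE   = as.Date(c("2016-01-31", "2016-02-29", "2016-03-31",
                     "2016-04-30", "2016-05-31")),
  DATE_X = as.Date(c("2010-01-31", NA, "2010-01-31", NA, "2016-04-30"))
)

inserted <- is.na(p1$DATE_X)

# DATE_X of the next observed row (filled backward).
next_date_x <- na.locf(p1$DATE_X, fromLast = TRUE, na.rm = FALSE)

# DATE of the last observed row: blank out inserted rows, then fill forward.
last_date <- na.locf(replace(p1$DATE, inserted, NA), na.rm = FALSE)

# Drop an inserted row when the gap it sits in ends with a DATE_X
# that is later than the last observed DATE before the gap.
kept <- p1[!(inserted & next_date_x > last_date), ]
print(kept)
```

In this example the inserted 2016-04-30 row is dropped, because the next DATE_X (2016-04-30) is later than the last observed DATE (2016-03-31), while the inserted 2016-02-29 row is kept.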
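In short, the DATE column is set to NA wherever DATE_X is missing and then carried forward, DATE_X is carried backward, and rows with a missing DATE_X are removed when the latter is greater than the former.

I think you should use bind_rows.

This is exactly what I wanted. Thank you very much.

Glad to help :) If you think the answer could also help others facing the same problem, feel free to accept it.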