How can I duplicate rows based on other columns in R to backfill data to a specific date?


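I have a data frame with 7 variables and millions of rows. I want to create rows that backfill the data to a specific point in time, based on the instances that are already coded.

Instances are counted by Year, ID, Var1, Var2, and Number. Note that the date of the first instance differs between these groups. For groups whose first instance is not 1/1/2015, I need to backfill their data to 1/1/2015.

Here is the initial data frame: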

Date <- c("4/1/2015", "5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015","3/1/2015","4/1/2015","5/1/2015")
Year <- 2015
ID <- c("123456", "123456", "234567", "234567", "234567", "234567", "234567", "123456", "123456", "123456")
Var1 <- c(1,1,2,2,2,2,2,1,1,1)
Var2 <- c(10,10,10,10,10,10,10,11,11,11)
Number <- c("0001", "0001", "0001","0001","0001","0001","0001","0002","0002","0002")
Instance <- c(1,2,1,2,3,4,5,1,2,3)
df <- data.frame(Date, Year, ID, Var1, Var2, Number, Instance)

Here is my expected output:

Date <- c("1/1/2015","2/1/2015","3/1/2015","4/1/2015", "5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015")
Year <- 2015
ID <- c("123456","123456","123456","123456", "123456", "234567", "234567", "234567", "234567", "234567", "123456","123456","123456", "123456", "123456")
Var1 <- c(1,1,1,1,1,2,2,2,2,2,1,1,1,1,1)
Var2 <- c(10,10,10,10,10,10,10,10,10,10,11,11,11,11,11)
Number <- c("0001","0001","0001","0001", "0001", "0001","0001","0001","0001","0001","0002","0002","0002","0002","0002")
Instance <- c(0,0,0,1,2,1,2,3,4,5,0,0,1,2,3)
df <- data.frame(Date, Year, ID, Var1, Var2, Number, Instance)
One option is complete from tidyr, applied after grouping by the columns of interest:

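library(tidyverse)
library(lubridate)
df %>% 
  # dmy() reads the strings as day/month/year, so "4/1/2015" becomes 2015-01-04;
  # use mdy() instead if the dates are month/day/year
  mutate(Date = dmy(Date)) %>% 
  group_by(Year, ID, Var1, Var2, Number) %>% 
  # within each group, add every missing day from the first of the month up to
  # the group's last date, and set Instance to 0 on the added rows
  complete(Date = seq(floor_date(Date, 'month')[1], max(Date), 
        by = '1 day'), fill = list(Instance = 0)) %>%
  # restore the original column order
  select(names(df))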
# A tibble: 15 x 7
# Groups:   Year, ID, Var1, Var2, Number [6]
#   Date        Year ID      Var1  Var2 Number Instance
#   <date>     <dbl> <fct>  <dbl> <dbl> <fct>     <dbl>
# 1 2015-01-01  2015 123456     1    10 0001          0
# 2 2015-01-02  2015 123456     1    10 0001          0
# 3 2015-01-03  2015 123456     1    10 0001          0
# 4 2015-01-04  2015 123456     1    10 0001          1
# 5 2015-01-05  2015 123456     1    10 0001          2
# 6 2015-01-01  2015 123456     1    11 0002          0
# 7 2015-01-02  2015 123456     1    11 0002          0
# 8 2015-01-03  2015 123456     1    11 0002          1
# 9 2015-01-04  2015 123456     1    11 0002          2
#10 2015-01-05  2015 123456     1    11 0002          3
#11 2015-01-01  2015 234567     2    10 0001          1
#12 2015-01-02  2015 234567     2    10 0001          2
#13 2015-01-03  2015 234567     2    10 0001          3
#14 2015-01-04  2015 234567     2    10 0001          4
#15 2015-01-05  2015 234567     2    10 0001          5
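
The answer above parses the strings with dmy() and steps by day. The question's wording (backfilling to January 1, 2015, with real data running through 7/1/2019) suggests the dates may instead be month/day/year with one row per month. Under that assumption, which the thread does not confirm, a variant might look like the sketch below. The fixed start date of 2015-01-01 and the name backfilled are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Sample data from the question, with strings kept as character
df <- data.frame(
  Date = c("4/1/2015", "5/1/2015", "1/1/2015", "2/1/2015", "3/1/2015",
           "4/1/2015", "5/1/2015", "3/1/2015", "4/1/2015", "5/1/2015"),
  Year = 2015,
  ID = c("123456", "123456", "234567", "234567", "234567",
         "234567", "234567", "123456", "123456", "123456"),
  Var1 = c(1, 1, 2, 2, 2, 2, 2, 1, 1, 1),
  Var2 = c(10, 10, 10, 10, 10, 10, 10, 11, 11, 11),
  Number = c("0001", "0001", "0001", "0001", "0001",
             "0001", "0001", "0002", "0002", "0002"),
  Instance = c(1, 2, 1, 2, 3, 4, 5, 1, 2, 3),
  stringsAsFactors = FALSE
)

backfilled <- df %>%
  # ASSUMPTION: dates are month/day/year, so "4/1/2015" is April 1, 2015
  mutate(Date = mdy(Date)) %>%
  group_by(Year, ID, Var1, Var2, Number) %>%
  # add one row per missing month from the assumed start date up to each
  # group's last date; added rows get Instance = 0
  complete(
    Date = seq(as.Date("2015-01-01"), max(Date), by = "1 month"),
    fill = list(Instance = 0)
  ) %>%
  ungroup() %>%
  # restore the original column order
  select(all_of(names(df)))
```

On the sample data this should give five monthly rows per group. On the real data the same call would need a start date that is no later than any group's last date, otherwise seq() fails for that group.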

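I get an error: Error in seq.int(0, to0 - from, by) : 'to' must be a finite number.

@MelissaDureiko I guess it works on your example, right? So the problem must be in the original data, where there may be missing values. Or can you check whether the Date column is of class Date?

It works on this example, but my actual dataset runs through 7/1/2019. I checked, and the Date column is in fact of class Date. In the console, the message reads: All formats failed to parse. No formats found.

@MelissaDureiko Is your date format %m/%d/%Y or %d/%m/%Y? Do you have multiple formats in your Date column?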
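
The two messages in the comments fit together if parsing produced missing values: seq() cannot build a sequence when its end point is NA. One possible cause, which the thread does not confirm, is running a lubridate parser on a column that is already of class Date. A minimal sketch of a guard follows; to_date is a hypothetical helper, not a lubridate function, and mdy as the default parser is an assumption about the data.

```r
library(lubridate)

# Hypothetical helper: parse only when the input is not already a Date,
# and stop with the offending values if anything fails to parse
to_date <- function(x, parser = mdy) {
  if (inherits(x, "Date")) {
    return(x)  # already a Date, nothing to parse
  }
  x <- as.character(x)                 # also handles factor columns
  parsed <- parser(x, quiet = TRUE)    # quiet = TRUE suppresses the parse warning
  bad <- is.na(parsed) & !is.na(x)     # values that were present but did not parse
  if (any(bad)) {
    stop("Could not parse: ", paste(unique(x[bad]), collapse = ", "))
  }
  parsed
}

# Usage: Date input passes through, strings are parsed
to_date(as.Date("2019-07-01"))
to_date("7/1/2019")
```

Calling anyNA() on the parsed column before complete() is a quick way to check whether this is what triggers the seq() error.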