How can I duplicate rows based on other columns in R to backfill data to a specific date?
I have a data frame with 7 variables and millions of rows. I want to create rows that backfill data to a specific point in time, based on the instances that are already coded. Instances are counted by Year, ID, Var1, Var2, and Number. You will notice that the date of the first instance differs between these groups. For groups whose first instance is not 1/1/2015, I need to backfill their data back to 1/1/2015. Here is the initial data frame:
Date <- c("4/1/2015", "5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015","3/1/2015","4/1/2015","5/1/2015")
Year <- 2015
ID <- c("123456", "123456", "234567", "234567", "234567", "234567", "234567", "123456", "123456", "123456")
Var1 <- c(1,1,2,2,2,2,2,1,1,1)
Var2 <- c(10,10,10,10,10,10,10,11,11,11)
Number <- c("0001", "0001", "0001","0001","0001","0001","0001","0002","0002","0002")
Instance <- c(1,2,1,2,3,4,5,1,2,3)
df <- data.frame(Date, Year, ID, Var1, Var2, Number, Instance)
Here is my expected output:
Date <- c("1/1/2015","2/1/2015","3/1/2015","4/1/2015", "5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015","1/1/2015","2/1/2015","3/1/2015","4/1/2015","5/1/2015")
Year <- 2015
ID <- c("123456","123456","123456","123456", "123456", "234567", "234567", "234567", "234567", "234567", "123456","123456","123456", "123456", "123456")
Var1 <- c(1,1,1,1,1,2,2,2,2,2,1,1,1,1,1)
Var2 <- c(10,10,10,10,10,10,10,10,10,10,11,11,11,11,11)
Number <- c("0001","0001","0001","0001", "0001", "0001","0001","0001","0001","0001","0002","0002","0002","0002","0002")
Instance <- c(0,0,0,1,2,1,2,3,4,5,0,0,1,2,3)
df <- data.frame(Date, Year, ID, Var1, Var2, Number, Instance)
One option is complete, applied after grouping by the columns of interest:
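library(tidyverse)
library(lubridate)
df %>%
  # dmy() reads "4/1/2015" as 4 January 2015 (day/month/year)
  mutate(Date = dmy(Date)) %>%
  group_by(Year, ID, Var1, Var2, Number) %>%
  # per group: add every day from the first of the month of the group's
  # first row up to the group's latest date; new rows get Instance = 0
  complete(Date = seq(floor_date(Date, 'month')[1], max(Date),
                      by = '1 day'), fill = list(Instance = 0)) %>%
  select(names(df))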
# A tibble: 15 x 7
# Groups: Year, ID, Var1, Var2, Number [6]
# Date Year ID Var1 Var2 Number Instance
# <date> <dbl> <fct> <dbl> <dbl> <fct> <dbl>
# 1 2015-01-01 2015 123456 1 10 0001 0
# 2 2015-01-02 2015 123456 1 10 0001 0
# 3 2015-01-03 2015 123456 1 10 0001 0
# 4 2015-01-04 2015 123456 1 10 0001 1
# 5 2015-01-05 2015 123456 1 10 0001 2
# 6 2015-01-01 2015 123456 1 11 0002 0
# 7 2015-01-02 2015 123456 1 11 0002 0
# 8 2015-01-03 2015 123456 1 11 0002 1
# 9 2015-01-04 2015 123456 1 11 0002 2
#10 2015-01-05 2015 123456 1 11 0002 3
#11 2015-01-01 2015 234567 2 10 0001 1
#12 2015-01-02 2015 234567 2 10 0001 2
#13 2015-01-03 2015 234567 2 10 0001 3
#14 2015-01-04 2015 234567 2 10 0001 4
#15 2015-01-05 2015 234567 2 10 0001 5
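The output above treats the dates as day/month/year with daily steps. If the dates are actually month/day/year with one row per month (an assumption, since the question does not state the format), a variant along the following lines may be closer to the expected output. This is only a sketch: the fixed start date 2015-01-01 comes from the question, while the name backfilled and the choice of mdy() with a monthly step are illustrative assumptions.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Example data from the question
df <- data.frame(
  Date = c("4/1/2015", "5/1/2015", "1/1/2015", "2/1/2015", "3/1/2015",
           "4/1/2015", "5/1/2015", "3/1/2015", "4/1/2015", "5/1/2015"),
  Year = 2015,
  ID = c("123456", "123456", "234567", "234567", "234567",
         "234567", "234567", "123456", "123456", "123456"),
  Var1 = c(1, 1, 2, 2, 2, 2, 2, 1, 1, 1),
  Var2 = c(10, 10, 10, 10, 10, 10, 10, 11, 11, 11),
  Number = c("0001", "0001", "0001", "0001", "0001",
             "0001", "0001", "0002", "0002", "0002"),
  Instance = c(1, 2, 1, 2, 3, 4, 5, 1, 2, 3)
)

backfilled <- df %>%
  # Assumption: dates are month/day/year, so "4/1/2015" is 1 April 2015
  mutate(Date = mdy(Date)) %>%
  group_by(Year, ID, Var1, Var2, Number) %>%
  # Per group: one row per month from the fixed start date to the group's
  # latest date; rows created by the backfill get Instance = 0
  complete(Date = seq(as.Date("2015-01-01"), max(Date), by = "1 month"),
           fill = list(Instance = 0)) %>%
  ungroup()

print(backfilled)
```

Anchoring the sequence at a fixed date instead of floor_date() means a group that starts in April is still filled back to January.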
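I get an error: Error in seq.int(0, to0 - from, by) : 'to' must be a finite number.
@MelissaDureiko. I guess it works on your example, right? So it must be related to the original data, where there may be missing values. Or can you check whether the Date column is of class Date?
It works on the example, but my actual dataset runs through 7/1/2019. I checked, and the Date column is in fact of class Date. In the console, the error reads: All formats failed to parse. No formats found.
@MelissaDureiko. Is your date format %m/%d/%Y or %d/%m/%Y? Do you have multiple formats in your Date column?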
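One way to follow up on the suggestions in these comments is to parse the raw strings with a single explicit format and look at what fails. The sketch below uses base R only. The vector raw_dates is made-up example data, not the asker's, and the assumption is that the intended format is %m/%d/%Y. A missing or unparsable date becomes NA, and an NA passed to seq() as the end point would be consistent with the "'to' must be a finite number" error.

```r
# Hypothetical raw values: two valid month/day/year dates,
# one value in a different format, and one missing value
raw_dates <- c("4/1/2015", "5/1/2015", "2015-06-01", NA)

# Parse with one explicit format; anything that does not match becomes NA
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Values that failed to parse (includes entries that were already missing)
bad <- raw_dates[is.na(parsed)]
print(bad)

# max() returns NA if any element is NA unless na.rm = TRUE is set
last_date <- max(parsed, na.rm = TRUE)
print(last_date)
```

If bad is not empty on the real data, those rows probably need to be fixed or dropped before calling complete.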