R 如何添加行并通过多个变量推断数据?
我试图为“日”添加缺失的行,并为“值”推断数据。在我的数据中,每个受试者(“id”)有2个时段(时段1和时段2)和连续天数的值 我的数据示例如下所示:R 如何添加行并通过多个变量推断数据?,r,R,我试图为“日”添加缺失的行,并为“值”推断数据。在我的数据中,每个受试者(“id”)有2个时段(时段1和时段2)和连续天数的值 我的数据示例如下所示: df <- data.frame( id = c(1,1,1,1, 1,1,1,1, 2,2,2,2, 2,2,2,2, 3,3,3,3, 3,3,3,3), period = c(1,1,1,1, 2,2,2,2, 1,1,1,1, 2,2,2,2, 1,1,1,1, 2,2,2,2), day=
df <- data.frame(
id = c(1,1,1,1, 1,1,1,1, 2,2,2,2, 2,2,2,2, 3,3,3,3, 3,3,3,3),
period = c(1,1,1,1, 2,2,2,2, 1,1,1,1, 2,2,2,2, 1,1,1,1, 2,2,2,2),
day= c(1,2,4,5, 1,3,4,5, 2,3,4,5, 1,2,3,5, 2,3,4,5, 1,2,3,4),
value =c(10,12,15,16, 11,14,15,17, 13,14,15,16, 15,16,18,20, 16,17,19,29, 14,16,18,20))
df我们可以使用complete
library(dplyr)
library(tidyr)
library(forecast)
df %>%
group_by(id, period) %>%
complete(day =1:7)%>%
mutate(value = as.numeric(na.interp(value)))
@akrun的答案很好,只要你不介意使用线性插值。但是,如果您确实想使用线性模型,可以尝试这种data.table方法
library(data.table)
model <- lm(value ~ day + period + id,data=df)
dt <- as.data.table(df)[,.SD[,.(day = 1:7,value = value[match(1:7,day)])],by=.(id,period)]
dt[is.na(value), value := predict(model,.SD),]
dt
id period day value
1: 1 1 1 10.00000
2: 1 1 2 12.00000
3: 1 1 3 12.86714
4: 1 1 4 15.00000
5: 1 1 5 16.00000
6: 1 1 6 18.13725
7: 1 1 7 19.89396
8: 1 2 1 11.00000
9: 1 2 2 12.15545
10: 1 2 3 14.00000
11: 1 2 4 15.00000
12: 1 2 5 17.00000
13: 1 2 6 19.18227
14: 1 2 7 20.93898
15: 2 1 1 11.90102
16: 2 1 2 13.00000
17: 2 1 3 14.00000
18: 2 1 4 15.00000
19: 2 1 5 16.00000
20: 2 1 6 20.68455
21: 2 1 7 22.44125
22: 2 2 1 15.00000
23: 2 2 2 16.00000
24: 2 2 3 18.00000
25: 2 2 4 18.21616
26: 2 2 5 20.00000
27: 2 2 6 21.72957
28: 2 2 7 23.48627
29: 3 1 1 14.44831
30: 3 1 2 16.00000
31: 3 1 3 17.00000
32: 3 1 4 19.00000
33: 3 1 5 29.00000
34: 3 1 6 23.23184
35: 3 1 7 24.98855
36: 3 2 1 14.00000
37: 3 2 2 16.00000
38: 3 2 3 18.00000
39: 3 2 4 20.00000
40: 3 2 5 22.52016
41: 3 2 6 24.27686
42: 3 2 7 26.03357
id period day value
库(data.table)
该模型解决了第一部分,但无法使用:df3@Murat推断na。interp
正在进行线性插值。我第一眼就意识到您的编辑,抱歉。伟大的解决方案!非常感谢你!实际上我还有一个问题,我意识到我不需要外推的其他列(不包括在样本数据中)没有外推,我很难完成扩展。添加的行中应继续使用相同的值。我不确定我是否应该提出另一个问题,或者是否能够就这个问题发表评论?
library(data.table)
model <- lm(value ~ day + period + id,data=df)
dt <- as.data.table(df)[,.SD[,.(day = 1:7,value = value[match(1:7,day)])],by=.(id,period)]
dt[is.na(value), value := predict(model,.SD),]
dt
id period day value
1: 1 1 1 10.00000
2: 1 1 2 12.00000
3: 1 1 3 12.86714
4: 1 1 4 15.00000
5: 1 1 5 16.00000
6: 1 1 6 18.13725
7: 1 1 7 19.89396
8: 1 2 1 11.00000
9: 1 2 2 12.15545
10: 1 2 3 14.00000
11: 1 2 4 15.00000
12: 1 2 5 17.00000
13: 1 2 6 19.18227
14: 1 2 7 20.93898
15: 2 1 1 11.90102
16: 2 1 2 13.00000
17: 2 1 3 14.00000
18: 2 1 4 15.00000
19: 2 1 5 16.00000
20: 2 1 6 20.68455
21: 2 1 7 22.44125
22: 2 2 1 15.00000
23: 2 2 2 16.00000
24: 2 2 3 18.00000
25: 2 2 4 18.21616
26: 2 2 5 20.00000
27: 2 2 6 21.72957
28: 2 2 7 23.48627
29: 3 1 1 14.44831
30: 3 1 2 16.00000
31: 3 1 3 17.00000
32: 3 1 4 19.00000
33: 3 1 5 29.00000
34: 3 1 6 23.23184
35: 3 1 7 24.98855
36: 3 2 1 14.00000
37: 3 2 2 16.00000
38: 3 2 3 18.00000
39: 3 2 4 20.00000
40: 3 2 5 22.52016
41: 3 2 6 24.27686
42: 3 2 7 26.03357
id period day value