R: Include an ID variable in the imputed data frames
I am using library(mice) to impute missing data. I want a way to tell mice that the ID variable should be included in the imputed datasets but not used for imputation.

For example:
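# make a toy data frame with missing data
library(tidyverse)
library(magrittr)
library(mice)

set.seed(123) # make the random data reproducible

d1 <- data.frame(
  id = str_c(
    letters[1:20] %>% rep(each = 5),  # a, a, a, a, a, b, ...
    1:5 %>% rep(times = 20)           # 1, 2, 3, 4, 5, 1, ...
  ),                                  # -> "a1", "a2", ..., "t5" (100 unique ids)
  v1 = runif(100),
  v2 = runif(100),
  v3 = runif(100)
)

# set 5 randomly chosen values to NA in each of v1, v2, v3 (id stays complete)
d1[, -1] %<>%
  map(
    function(i) {
      i[sample(1:100, 5, replace = FALSE)] <- NA
      i
    }
  )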
How can I include d1$id as a variable in each of the imputed data frames?

There are two ways. First, simply append id to the imputed datasets:
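m1 <- mice(d1[, -1], printFlag = FALSE)      # impute v1-v3 only, leaving id out of the model
d2 <- complete(m1, "long", include = TRUE)   # imputed datasets in long format (including the original)
d3 <- cbind(id = d1$id, d2)                  # rows keep the original order in every dataset, so d1$id is simply recycled
m2 <- as.mids(d3)                            # and transform back to a mids object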
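The first approach relies on the row order of the long format: complete(m1, "long", include = TRUE) stacks the original data (.imp = 0) and then each imputed copy, every copy in the original row order, so a length-n id vector lines up when recycled. A minimal base-R sketch of that reasoning, using a small hand-made mock of the long format (the values, n, m and ids are hypothetical, and mice is not needed to run it):

```r
# Mock of the structure returned by complete(m1, "long", include = TRUE):
# the original data (.imp = 0) followed by m imputed copies, each in the original row order.
n <- 4                                   # rows in the (hypothetical) original data
m <- 2                                   # number of imputations
ids <- c("a1", "a2", "a3", "a4")         # hypothetical ids, playing the role of d1$id
long <- data.frame(
  .imp = rep(0:m, each = n),             # 0 = original data, 1..m = imputed datasets
  .id  = rep(seq_len(n), times = m + 1), # row number within each dataset
  v1   = c(0.1, NA, 0.3, 0.4,            # original, one missing value
           0.1, 0.25, 0.3, 0.4,          # imputation 1
           0.1, 0.22, 0.3, 0.4)          # imputation 2
)

# cbind() recycles the length-n id vector over the (m + 1) * n stacked rows
with_id <- cbind(id = ids, long)

# Recycling gives the same column as an explicit rep(), and every id lines up with its .id row
explicit <- rep(ids, times = m + 1)
print(identical(as.character(with_id$id), explicit))
print(all(with_id$id == ids[with_id$.id]))
```

If you prefer not to rely on recycling, assigning d2$id <- rep(d1$id, times = m1$m + 1) before calling as.mids() should give the same column explicitly.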
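Second, you can pass id to mice together with the other variables and stop it from being used as a predictor by setting its column of the predictor matrix to zero. You could also prevent mice from imputing the variable, but since it contains no missing values this is unnecessary (mice skips it automatically):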
ini <- mice(d1, maxit = 0)                 # dry run without iterations to get the default predictor matrix
pred1 <- ini$predictorMatrix               # this is your predictor matrix
pred1[, "id"] <- 0                         # set the whole id column to zero so id is never used as a predictor
m1 <- mice(d1, predictorMatrix = pred1)    # use the new matrix in mice; id is still kept in every completed dataset
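To check the second approach end to end, here is a minimal sketch. It assumes mice 3.x and R 4.0 or later, rebuilds data shaped like d1 with base R (instead of tidyverse/magrittr) so it runs on its own, and the seed and m = 5 are arbitrary choices rather than part of the answer. It shows that the dry run already assigns an empty method to id (it has no missing values), that id is excluded as a predictor, and that id comes through unchanged while v1-v3 are filled in:

```r
library(mice)

set.seed(42)

# Toy data shaped like the question's d1, built with base R only:
# a complete character id plus three numeric columns with 5 NAs each
d1 <- data.frame(
  id = paste0(rep(letters[1:20], each = 5), rep(1:5, times = 20)),
  v1 = runif(100),
  v2 = runif(100),
  v3 = runif(100)
)
for (v in c("v1", "v2", "v3")) {
  d1[sample(100, 5), v] <- NA
}

# Dry run: maxit = 0 only builds the default setup (methods, predictor matrix)
ini <- mice(d1, maxit = 0, printFlag = FALSE)
print(ini$method)               # id gets "" because it has no missing values

# Exclude id as a predictor for every other variable
pred1 <- ini$predictorMatrix
pred1[, "id"] <- 0

m1 <- mice(d1, m = 5, predictorMatrix = pred1, printFlag = FALSE)

# id is carried through unchanged and v1-v3 are filled in
c1 <- complete(m1, 1)
print(all(as.character(c1$id) == d1$id))
print(anyNA(c1[, c("v1", "v2", "v3")]))
```

mice may report logged events for the character id column during setup; that does not affect the id values, which are never imputed here.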