Creating a training dataset from a .mids object in R

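I have data with missing values, so I ran the mice algorithm (from package mice). The function returns a .mids object, which I want to split into a training dataset and a test dataset to evaluate model fit. I would like the training and test data to be in .mids format as well, so that they can be used with various other functions (such as pool) that adjust the standard errors according to Rubin's rules.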

Below is my attempt, where I simply remove rows from the data to get a training set:

library(mice)
data <- mice(nhanes,m=2,maxit=5,seed=1)

set.seed(2)
rand <- (1:nrow(nhanes))*rbinom(nrow(nhanes),size=1,prob=0.7)
train <- data
train$data <- train$data[rand,]

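I get an error saying it is trying to replace 9 rows with 7 (probably because I reduced the number of rows in train$data without adjusting anything else).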
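A mids object stores more than the data element. Among other components, where is a logical matrix with one row per row of the original data (TRUE marks an imputed cell), and imp holds the imputed values for each variable, one row per missing cell. Shrinking train$data on its own therefore leaves these components out of step with it. The minimal sketch below only prints those sizes for nhanes; it assumes the component names used by mice 3.x.

```r
library(mice)

# Impute nhanes as in the question; printFlag = FALSE silences the iteration log
imp <- mice(nhanes, m = 2, maxit = 5, seed = 1, printFlag = FALSE)

# The observed data and the "where" matrix share the same number of rows
nrow(imp$data)
nrow(imp$where)

# Imputed values are stored per variable, one row per missing cell
# (component layout assumed from mice 3.x)
sapply(imp$imp, nrow)
```

Any subsetting of a mids object has to keep data, where and imp consistent with each other, which is why editing train$data alone fails.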

Any help would be appreciated.

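One way is to loop over the completed datasets (returned by complete()), fit the model to each one, and then assign the mira class to the resulting list, which allows pooling with pool(). (This is what mice:::with.mids does.)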

Example without sampling

library(mice)

imp <- mice(nhanes,m=2, maxit=5, seed=1)

# With in-built pooling
pool(with(imp, lm(bmi ~ chl + age)))

# Pooled coefficients:
# (Intercept)         chl         age 
# 21.38496144  0.05975537 -3.40773396 
# 
# Fraction of information about the coefficients missing due to nonresponse: 
# (Intercept)         chl         age 
#   0.6186312   0.1060668   0.7380962 

# looping manually
mod <- list(analyses=vector("list", imp$m))

for(i in 1:imp$m){
  mod$analyses[[i]] <- lm(bmi ~ chl + age, data=complete(imp, i))
}

class(mod) <- c("mira", "matrix")
pool(mod)

# Pooled coefficients:
# (Intercept)         chl         age 
# 21.38496144  0.05975537 -3.40773396 
# 
# Fraction of information about the coefficients missing due to nonresponse: 
# (Intercept)         chl         age 
#   0.6186312   0.1060668   0.7380962
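Instead of setting the class by hand, mice 3.x also provides as.mira(), which takes a plain list of fitted models and returns a mira object that pool() accepts. A minimal sketch of the same manual loop written that way, assuming a mice version that exports as.mira():

```r
library(mice)

imp <- mice(nhanes, m = 2, maxit = 5, seed = 1, printFlag = FALSE)

# Fit the model separately on each completed dataset
fits <- lapply(seq_len(imp$m), function(i) {
  lm(bmi ~ chl + age, data = complete(imp, i))
})

# Wrap the list of fits as a mira object (as.mira() assumed available in mice 3.x)
mod <- as.mira(fits)

# Combine the m fits with Rubin's rules
pooled <- pool(mod)
summary(pooled)
```

Because the fits are identical to those produced by with(imp, lm(...)), the pooled estimates should match the built-in route.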


Example with sampling

mod <- list(analyses=vector("list", imp$m))

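set.seed(1)
for(i in 1:imp$m){
  # Keep each row with probability 0.7; dropped rows get index 0,
  # which R silently ignores when subsetting
  rand <- (1:nrow(nhanes))*rbinom(nrow(nhanes),size=1,prob=0.7)
  # Fit on the sampled rows of the i-th completed dataset
  # (a new random subset is drawn for every imputation)
  mod$analyses[[i]] <- lm(bmi ~ chl + age, data=complete(imp, i)[rand,])
}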

class(mod) <- c("mira", "matrix")
pool(mod)

# Pooled coefficients:
# (Intercept)         chl         age 
# 21.72382272  0.06468044 -4.23387415 
# 
# Fraction of information about the coefficients missing due to nonresponse: 
# (Intercept)         chl         age 
#   0.1496987   0.4497024   0.6101340
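The loop above draws a new random subset for every imputation, so the training rows differ between imputations and no matching test set is kept. If the goal is a single train/test split that stays in .mids format, as the question asks, one possible sketch is to draw one logical split and subset the mids object itself. This assumes a mice version recent enough to provide the filter() method for mids objects (see ?filter.mids); it is called as dplyr::filter() so that stats::filter() cannot mask it. The names train_rows, imp_train, imp_test and the RMSE check are illustrative and not part of the original answer.

```r
library(mice)

imp <- mice(nhanes, m = 2, maxit = 5, seed = 1, printFlag = FALSE)

# One logical split shared by all imputations (about 70% of rows for training)
set.seed(2)
train_rows <- runif(nrow(nhanes)) < 0.7

# Subset the mids object itself, so both parts are still mids objects
# (assumes mice registers a filter() method for mids)
imp_train <- dplyr::filter(imp, train_rows)
imp_test  <- dplyr::filter(imp, !train_rows)

# Fit on each training imputation and pool with Rubin's rules
fit <- with(imp_train, lm(bmi ~ chl + age))
summary(pool(fit))

# Out-of-sample RMSE of each imputation's model on the matching test imputation
rmse <- sapply(seq_len(imp$m), function(i) {
  test_i <- complete(imp_test, i)
  pred <- predict(fit$analyses[[i]], newdata = test_i)
  sqrt(mean((test_i$bmi - pred)^2))
})
rmse
mean(rmse)
```

Two caveats apply to this sketch as written: the imputation model was fitted on all 25 rows, so the test rows influenced the training imputations, and test-set bmi values that were originally missing are themselves imputed.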