Prediction order of merTools predictInterval()

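I'm running into a problem with predictInterval() from merTools. Compared with the data and with the midpoint predictions from the standard predict() method for lme4, the predictions appear to be out of order. I can't reproduce the problem with simulated data, so the best I can do is show the lmerMod object and some of my data: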

> # display input data to the model
> head(inputData)
             id     y     x   z
1 calibration19 1.336 0.531 001
2 calibration20 1.336 0.433 001
3 calibration22 0.042 0.432 001
4 calibration23 0.042 0.423 001
5 calibration16 3.300 0.491 001
6 calibration17 3.300 0.465 001
> sapply(inputData, class)
       id         y         x         z 
 "factor" "numeric" "numeric"  "factor" 
> 
> # fit mixed effects regression with random intercept on z
> lmeFit = lmer(y ~ x + (1 | z), inputData)
> 
> # display lmerMod object
> lmeFit
Linear mixed model fit by REML ['lmerMod']
Formula: y ~ x + (1 | z)
   Data: inputData
REML criterion at convergence: 444.245
Random effects:
 Groups   Name        Std.Dev.
 z        (Intercept) 0.3097  
 Residual             0.9682  
Number of obs: 157, groups:  z, 17
Fixed Effects:
(Intercept)            x  
    -0.4291       5.5638  
> 
> # display new data to predict in
> head(predData)
           id     x   z
1 29999900108 0.343 001
2 29999900207 0.315 001
3 29999900306 0.336 001
4 29999900405 0.408 001
5 29999900504 0.369 001
6 29999900603 0.282 001
> sapply(predData, class)
       id         x         z 
 "factor" "numeric"  "factor" 
> 
> # estimate fitted values using predict()
> set.seed(1)
> preds_mid = predict(lmeFit, newdata=predData)
> 
> # estimate fitted values using predictInterval()
> set.seed(1)
> preds_interval = predictInterval(lmeFit, newdata=predData, n.sims=1000) # wrong order
> 
> # estimate fitted values just for the first observation to confirm that it should be similar to preds_mid
> set.seed(1)
> preds_interval_first_row = predictInterval(lmeFit, newdata=predData[1,], n.sims=1000)
> 
> # display results
> head(preds_mid) # correct prediction
       1        2        3        4        5        6 
1.256860 1.101074 1.217913 1.618505 1.401518 0.917470 
> head(preds_interval) # incorrect order
       fit      upr          lwr
1 1.512410 2.694813  0.133571198
2 1.273143 2.521899  0.009878347
3 1.398273 2.785358  0.232501376
4 1.878165 3.188086  0.625161201
5 1.605049 2.813737  0.379167003
6 1.147415 2.417980 -0.108547846
> preds_interval_first_row # correct prediction
       fit      upr         lwr
1 1.244366 2.537451 -0.04911808
> preds_interval[round(preds_interval$fit,3)==round(preds_interval_first_row$fit,3),] # the correct prediction ends up as observation 1033
          fit      upr           lwr
1033 1.244261 2.457012 -0.0001299777
>

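In other words, according to the predict() method the first observation in my data frame predData should be around 1.25, but with the predictInterval() method it comes out around 1.5. This does not seem to be just a difference between the two prediction methods, because if I restrict the newdata argument to the first row of predData, I get a fitted value of about 1.25, as expected.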
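
The single-row comparison above can be extended to every row to find all misaligned observations at once. The following is a minimal base-R sketch, assuming a prediction function that takes a data frame and returns one numeric value per row; the helper name `check_row_order` is hypothetical and not part of merTools or lme4. It is demonstrated with `lm()` so that it runs without merTools and the predictions are exact.

```r
# Hypothetical helper: compare batch predictions with row-by-row predictions.
# pred_fun must take a data frame and return one numeric prediction per row.
check_row_order <- function(pred_fun, newdata, tol = 1e-8) {
  batch <- pred_fun(newdata)
  # Predict each row on its own; a one-row result cannot be reordered
  single <- vapply(seq_len(nrow(newdata)),
                   function(i) pred_fun(newdata[i, , drop = FALSE]),
                   numeric(1))
  diffs <- abs(batch - single)
  list(aligned = all(diffs <= tol),
       mismatched_rows = which(diffs > tol))
}

# Demonstration with a deterministic model (lm)
set.seed(1)
train <- data.frame(x = rnorm(50))
train$y <- 2 + 3 * train$x + rnorm(50, sd = 0.1)
fit <- lm(y ~ x, data = train)
newd <- data.frame(x = c(0.3, -1.2, 0.8))

good_fun <- function(nd) unname(predict(fit, newdata = nd))
# A deliberately faulty predictor that reverses the row order
bad_fun <- function(nd) rev(unname(predict(fit, newdata = nd)))

res_good <- check_row_order(good_fun, newd)  # aligned, no mismatches
res_bad  <- check_row_order(bad_fun, newd)   # rows 1 and 3 swapped
```

With the model from the question, `pred_fun` could be `function(nd) predictInterval(lmeFit, newdata = nd, n.sims = 1000)$fit`. Because predictInterval() is simulation-based, separate calls give slightly different fits (the question's own output shows 1.244366 versus 1.244261 for the same observation), so `tol` would need to be loose, on the order of the simulation noise rather than machine precision.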

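The fact that I can't reproduce the problem with simulated data makes me believe it has something to do with some property of my input or prediction data. I have tried converting the factor variables to character, and enforcing the row order before fitting the model and again between fitting and predicting, without success.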
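
One property of the prediction data worth checking is whether its factor levels match those seen when the model was fit. The base-R sketch below reports, for each column that is a factor in both data frames, the levels found only in the training data and those found only in the new data. The function name `compare_levels` and the toy data are hypothetical; for the question's objects the call would be `compare_levels(inputData, predData)`.

```r
# Hypothetical helper: compare factor levels of columns shared by two data frames
compare_levels <- function(train, newdata) {
  shared <- intersect(names(train), names(newdata))
  # Keep only columns that are factors in both data frames
  shared <- shared[vapply(shared, function(v)
    is.factor(train[[v]]) && is.factor(newdata[[v]]), logical(1))]
  lapply(setNames(shared, shared), function(v) list(
    only_in_train = setdiff(levels(train[[v]]), levels(newdata[[v]])),
    only_in_new   = setdiff(levels(newdata[[v]]), levels(train[[v]]))
  ))
}

# Toy data: z has levels unused by the new data, plus one unseen level
train <- data.frame(z = factor(c("001", "002", "003")), x = c(0.5, 0.4, 0.6))
newd  <- data.frame(z = factor(c("001", "001", "004")), x = c(0.3, 0.3, 0.4))
lv <- compare_levels(train, newd)
# lv$z$only_in_train is c("002", "003"); lv$z$only_in_new is "004"
```

Numeric columns such as `x` are skipped, since only factors carry a level set.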

Is this a known issue? What can I do to avoid it?

I tried to build a minimal reproducible example of the problem, but without success:

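library(lme4)      # lmer() and simulate()
library(merTools)  # predictInterval()

set.seed(42)
# Training data: 1000 rows, random-intercept grouping factor z with 25 levels
d <- data.frame(x = rnorm(1000), z = sample(1:25L, 1000, replace = TRUE),
                id = sample(LETTERS, 1000, replace = TRUE))
d$z <- as.factor(d$z)
d$id <- factor(d$id)
# Simulate a response from y ~ x + (1 | z) with known parameters
d$y <- simulate(~ x + (1 | z), family = gaussian,
                newdata = d,
                newparams = list(beta = c(2, -1.1), theta = c(.25),
                                 sigma = c(.23)), seed = 463)[[1]]
lmeFit <- lmer(y ~ x + (1 | z), data = d)

# New data to predict on, built the same way
predData <- data.frame(x = rnorm(25), z = sample(1:25L, 25, replace = TRUE),
                       id = sample(LETTERS, 25, replace = TRUE))
predData$z <- as.factor(predData$z)
predData$id <- factor(predData$id)

# Compare point predictions with predictInterval() fits, in full and for row 1
predict(lmeFit, predData)
predictInterval(lmeFit, predData)
predictInterval(lmeFit, predData[1, ])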
predictInterval(lmeFit, predData[1, ])

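I don't have an answer here, but I'm one of the developers of merTools, and this looks like a fairly serious problem. My guess is that it can happen when we try to preserve factor levels that were observed in the model but not in the prediction data. I'll see whether I can replicate this. Can you confirm that you are using the latest version of merTools from CRAN?

Thanks for the reply. According to sessionInfo(), I'm using merTools_0.2.0. However, I'm running R x64 3.2.2, and merTools reports that it was built under R 3.2.4. One thought I had is that the id variable in my predData object is a set of numbers stored as characters (not my decision). Could predictInterval() be converting it to numeric? Admittedly, I could easily test that myself first.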
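
The worry about digits stored as a factor being converted to numbers is worth a quick check, because in R `as.numeric()` applied to a factor returns the internal level codes, not the digits. The base-R sketch below uses two id values from predData to show the difference; it illustrates the general R behaviour only and does not claim that predictInterval() actually performs such a conversion.

```r
# Two ids from predData, stored as a factor as in the question
id <- factor(c("29999900207", "29999900108"))

# Levels are sorted as strings, so the integer codes come out as 2 and 1
codes <- as.numeric(id)

# Converting through character recovers the digits themselves
values <- as.numeric(as.character(id))

# Equivalent idiom that converts each level only once (useful for long factors)
values2 <- as.numeric(levels(id))[id]
```

If a numeric id were ever needed, either of the last two forms keeps the values; `as.numeric(id)` alone silently replaces them with level positions.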