Unexpected behavior of R emmeans with model specification and confidence intervals

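My data are integers containing many zeros. I want to model the zeros separately with a binomial generalized linear model. In the model statement, I specified Y>0 on the left side of the tilde, which gives me a binary (TRUE, FALSE) vector. I then analyzed the model further with the emmeans package, specifying type = "response". At that point I realized (on my real data) that the confidence intervals seemed wrong. While troubleshooting, I created a new variable in the data frame holding the TRUE and FALSE values, and this fixed the problem. Why does this happen?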

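Below is code that reproduces this behavior (although the effect is not as pronounced as in my original dataset):

Here is the second model, which uses the new variable: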

# binomial GLM using variable no0
m2 <- glm(no0 ~ X, family = binomial(), d)
summary(m2)

Call:
glm(formula = no0 ~ X, family = binomial(), data = d)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.1460  -1.1774   0.4590   0.7954   1.1774  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)  
(Intercept)   2.1972     1.0540   2.085   0.0371 *
XB           -0.8109     1.3175  -0.615   0.5382  
XC           -2.1972     1.2292  -1.788   0.0739 .
XD           -2.1972     1.2292  -1.788   0.0739 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 50.446  on 39  degrees of freedom
Residual deviance: 44.236  on 36  degrees of freedom
AIC: 52.236

Number of Fisher Scoring iterations: 4
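
As a sanity check, the two ways of writing the response can be compared directly. The sketch below uses hypothetical data, not the asker's dataset: 10 observations per level of X with 9, 8, 5 and 5 non-zero values, chosen only because those proportions (0.9, 0.8, 0.5, 0.5) are what the coefficients above imply. The specific non-zero values of Y are arbitrary.

```r
# Hypothetical data (assumption): 10 rows per level of X,
# with 9, 8, 5 and 5 non-zero counts in levels A, B, C and D.
d <- data.frame(
  X = factor(rep(c("A", "B", "C", "D"), each = 10)),
  Y = c(rep(c(3, 0), c(9, 1)), rep(c(2, 0), c(8, 2)),
        rep(c(1, 0), c(5, 5)), rep(c(4, 0), c(5, 5)))
)
d$no0 <- d$Y > 0  # precomputed TRUE/FALSE response

# Same model, written two ways
m1 <- glm(Y > 0 ~ X, family = binomial(), data = d)  # comparison inside the formula
m2 <- glm(no0 ~ X, family = binomial(), data = d)    # precomputed variable

# The fitted coefficients agree, so the fit itself is not the problem
all.equal(coef(m1), coef(m2))
```

If the coefficients agree, any difference in the intervals has to come from how the formula is interpreted after fitting, not from the GLM itself.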

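All good so far. But when I add the type = "response" argument, everything looks right except that the confidence intervals differ (compare the two outputs below):

But when I add the comparisons = T argument, the confidence intervals are now identical; however, both match the model that uses the Y>0 specification (see m3 and em3).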

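What is happening is that emmeans allows for situations where there is both a response transformation and a link function. This is very handy when, for example, you fit a model with a gamma family, inverse link, and a square-root response transformation. In this case, however, the > in Y>0 is being treated as a response transformation: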
> emm1 <- emmeans(m1, "X")

> str(emm1)
'emmGrid' object with variables:
    X = A, B, C, D
Transformation: “logit” 
Additional response transformation: “>” 

One way around this is to remove the second transformation, so that only the logit link is back-transformed:
> emm1a <- update(emm1, tran2 = NULL)
> confint(emm1a, type = "response")
 X response     SE  df asymp.LCL asymp.UCL
 A      0.9 0.0949 Inf     0.533     0.986
 B      0.8 0.1265 Inf     0.459     0.950
 C      0.5 0.1581 Inf     0.225     0.775
 D      0.5 0.1581 Inf     0.225     0.775

Confidence level used: 0.95 
Intervals are back-transformed from the logit scale 

I will consider whether a change can be made that reliably determines when a response transformation is ambiguous.
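Another issue: in plots p3 and p4, the confidence intervals do not match the output of em3 and em4. Instead, the p3 and p4 confidence intervals match the em1 output, which uses Y>0 ~ . and type = "response".

The issue with the plots is actually entirely separate. I made a design decision to simply regrid the object when comparisons are requested, because comparisons on the link scale cannot be back-transformed to comparisons on the response scale. I will consider whether that was wise and, if not, what to do about it.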
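
To see why a link-scale comparison has no direct back-transformation to a response-scale comparison, here is a small arithmetic sketch in base R. It uses the estimated probabilities 0.9 and 0.5 for levels A and C from the output above; the variable names are illustrative only.

```r
p_A <- 0.9  # estimated probability for level A
p_C <- 0.5  # estimated probability for level C

# Comparison on the link (logit) scale: a difference of log odds
diff_logit <- qlogis(p_A) - qlogis(p_C)

# Exponentiating a difference of log odds gives an odds ratio ...
odds_ratio <- exp(diff_logit)

# ... which is not the difference of probabilities on the response scale
diff_prob <- p_A - p_C

c(odds_ratio = odds_ratio, diff_prob = diff_prob)
```

The odds ratio here is 9 while the difference in probabilities is 0.4, so a difference on the response scale has to be computed from quantities that are already on that scale.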
library(ggplot2)  # provides scale_x_continuous() and ggtitle()

# em3 and em4 are the emmeans results for the Y>0 ~ . and no0 ~ . models (see the plot titles)
# Plots without comparison arrows
p1 <- plot(em3, comparisons = FALSE) + scale_x_continuous(limits = c(0, 1.1)) + ggtitle("Y>0 ~.; and comparisons = F")
p2 <- plot(em4, comparisons = FALSE) + scale_x_continuous(limits = c(0, 1.1)) + ggtitle("no0 ~.; and comparisons = F")
gridExtra::grid.arrange(p1, p2, nrow = 2)

# Plots with comparison arrows
p3 <- plot(em3, comparisons = TRUE) + scale_x_continuous(limits = c(0, 1.1)) + ggtitle("Y>0 ~.; and comparisons = T")
p4 <- plot(em4, comparisons = TRUE) + scale_x_continuous(limits = c(0, 1.1)) + ggtitle("no0 ~.; and comparisons = T")
gridExtra::grid.arrange(p3, p4, nrow = 2)

Another option is to keep both transformations in emm1 but request type = "unlink", which back-transforms only the logit link:

> confint(emm1, type = "unlink")
 X response     SE  df asymp.LCL asymp.UCL
 A      0.9 0.0949 Inf     0.533     0.986
 B      0.8 0.1265 Inf     0.459     0.950
 C      0.5 0.1581 Inf     0.225     0.775
 D      0.5 0.1581 Inf     0.225     0.775

Confidence level used: 0.95 
Intervals are back-transformed from the logit scale 

A third option is regrid() with transform = "unlink". Here the intervals are computed directly on the probability scale instead of being back-transformed from the logit scale, so they are symmetric around the estimate and can extend beyond 1:

> confint(regrid(emm1, transform = "unlink"))
 X response     SE  df asymp.LCL asymp.UCL
 A      0.9 0.0949 Inf     0.714      1.09
 B      0.8 0.1265 Inf     0.552      1.05
 C      0.5 0.1581 Inf     0.190      0.81
 D      0.5 0.1581 Inf     0.190      0.81

Results are given on the > (not the response) scale. 
Confidence level used: 0.95
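
The two kinds of intervals shown above can be reproduced by hand, which makes the difference concrete. This is a base R sketch, not emmeans' internal code: it assumes 10 observations per group (a group size that is not stated in the source but is consistent with the printed standard errors) and uses the usual Wald and delta-method formulas.

```r
p <- c(A = 0.9, B = 0.8, C = 0.5, D = 0.5)  # estimated probabilities from the output
n <- 10                                      # assumed group size (not given in the source)
z <- qnorm(0.975)                            # 95% normal quantile

# Wald interval on the logit scale, then back-transformed with plogis()
se_logit <- sqrt(1 / (n * p * (1 - p)))
ci_logit <- cbind(lower = plogis(qlogis(p) - z * se_logit),
                  upper = plogis(qlogis(p) + z * se_logit))

# Delta-method SE on the probability scale, symmetric interval around p
se_prob <- p * (1 - p) * se_logit
ci_prob <- cbind(lower = p - z * se_prob,
                 upper = p + z * se_prob)

round(ci_logit, 3)  # stays inside (0, 1)
round(ci_prob, 3)   # symmetric, can exceed 1
```

Under these assumptions the first table matches the back-transformed intervals (for example 0.533 to 0.986 for A) and the second matches the regrid() output (0.714 to about 1.09 for A), which is why the back-transformed version is usually preferable for probabilities close to 0 or 1.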