Calculation of p-values in spatial econometric models: why do summary() and texreg() disagree?

Tags: r, spatial, p-value, texreg

I have estimated some spatial econometric models that include both a spatial autoregressive term rho and a spatial error term lambda. In trying to communicate my results I have been using the texreg package, which accepts the sacsarlm models I am estimating. However, I noticed that texreg prints the same p-value for the rho and lambda parameters; texreg appears to be returning the model@LR1$p.value slot of the model object (a quick check of that hunch follows the example below).

The rho and lambda parameters differ in magnitude and have different standard errors, so they should not have identical p-values. If I call summary() on the model object I get distinct p-values for the two terms, but even after walking through every element of the str(model) output I cannot work out where those values are stored in the model object.

My question is twofold:

  • Am I right that this is a bug in the texreg (and screenreg, etc.) functions, or is my interpretation off?
  • How can I calculate the correct p-values, or find them in the model object? (I am writing a new extract function for texreg and need to locate the right values.)

Here is a minimal example that shows the problem:

    library(spdep)
    library(texreg)
    set.seed(42)
    W.ran <- matrix(rbinom(100*100, 1, .3),nrow=100)
    X <- rnorm(100)
    Y <- .2 * X + rnorm(100) + .9*(W.ran %*% X)
    
    W.test <- mat2listw(W.ran)
    model <- sacsarlm(Y~X, type = "sacmixed",
                    listw=W.test, zero.policy=TRUE)
    summary(model)
    
    Call:sacsarlm(formula = Y ~ X, listw = W.test, type = "sacmixed", 
        zero.policy = TRUE)
    
    Residuals:
          Min        1Q    Median        3Q       Max 
    -2.379283 -0.750922  0.036044  0.675951  2.577148 
    
    Type: sacmixed 
    Coefficients: (asymptotic standard errors) 
                       Estimate  Std. Error z value Pr(>|z|)
    (Intercept)      0.91037455  0.65700059  1.3857   0.1659
    X               -0.00076362  0.10330510 -0.0074   0.9941
    lag.(Intercept) -0.03193863  0.02310075 -1.3826   0.1668
    lag.X            0.89764491  0.02231353 40.2287   <2e-16
    
    Rho: -0.0028541
    Asymptotic standard error: 0.0059647
        z-value: -0.47849, p-value: 0.6323
    Lambda: -0.020578
    Asymptotic standard error: 0.020057
        z-value: -1.026, p-value: 0.3049
    
    LR test value: 288.74, p-value: < 2.22e-16
    
    Log likelihood: -145.4423 for sacmixed model
    ML residual variance (sigma squared): 1.0851, (sigma: 1.0417)
    Number of observations: 100 
    Number of parameters estimated: 7 
    AIC: 304.88, (AIC for lm: 585.63)
    
    screenreg(model)
    
    =================================
                          Model 1    
    ---------------------------------
    (Intercept)              0.91    
                            (0.66)   
    X                       -0.00    
                            (0.10)   
    lag.(Intercept)         -0.03    
                            (0.02)   
    lag.X                    0.90 ***
                            (0.02)   
    ---------------------------------
    Num. obs.              100       
    Parameters               7       
    AIC (Linear model)     585.63    
    AIC (Spatial model)    304.88    
    Log Likelihood        -145.44    
    Wald test: statistic     1.05    
    Wald test: p-value       0.90    
    Lambda: statistic       -0.02    
    Lambda: p-value          0.00    
    Rho: statistic          -0.00    
    Rho: p-value             0.00    
    =================================
    *** p < 0.001, ** p < 0.01, * p < 0.05
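
A quick check of that hunch with the model fitted above (the LR test is stored on the summary object):

    # the overall LR test reported by summary(); its tiny p-value prints as 0.00,
    # which is exactly what screenreg() shows for both Rho and Lambda
    summary(model)$LR1$p.value   # < 2.22e-16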
    
texreg author here. Thanks for catching this. As noted in my reply, texreg uses extract methods to retrieve the relevant information from any of the (currently more than 70) supported model object types. For sarlm objects, the GOF part of the method (as of texreg 1.36.13) appears to be glitchy.

To see where the correct values come from, look at how spdep itself reports them: its print.summary.sarlm function (reproduced in full at the end of this answer) first distinguishes between the different submodels (error vs. sac/sacmixed vs. everything else), then decides which standard errors to use, and then computes the p-values on the fly without storing them anywhere.
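
In other words, the p-value for rho (and analogously for lambda) is a normal-approximation test of the estimate divided by its, possibly adjusted, standard error. A minimal sketch of that computation, mirroring print.summary.sarlm and using the model from the question:

    # reproduce the rho p-value that summary() reports
    rho.se <- model$rho.se
    if (!is.null(model$adj.se)) {                # Hessian-based fits carry an adjustment factor
      rho.se <- sqrt((model$rho.se^2) * model$adj.se)
    }
    2 * (1 - pnorm(abs(model$rho / rho.se)))     # ~0.6323 for the example above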

So this is also what we need to do in the extract method in order to get the same results as the summary method in the spdep package. In addition, rho and lambda should move from the GOF block into the coefficient block of the table (see the comment thread below for the discussion). Here is my attempt at adopting their approach in the extract method:

    # extension for sarlm objects (spdep package)
    extract.sarlm <- function(model, include.nobs = TRUE, include.loglik = TRUE,
        include.aic = TRUE, include.lr = TRUE, include.wald = TRUE, ...) {
      s <- summary(model, ...)
    
      names <- rownames(s$Coef)
      cf <- s$Coef[, 1]
      se <- s$Coef[, 2]
      p <- s$Coef[, ncol(s$Coef)]
    
      if (model$type != "error") {  # include coefficient for autocorrelation term
        rho <- model$rho
        cf <- c(cf, rho)
        names <- c(names, "$\\rho$")
        if (!is.null(model$rho.se)) {
          if (!is.null(model$adj.se)) {
            rho.se <- sqrt((model$rho.se^2) * model$adj.se)   
          } else {
            rho.se <- model$rho.se
          }
          rho.pval <- 2 * (1 - pnorm(abs(rho / rho.se)))
          se <- c(se, rho.se)
          p <- c(p, rho.pval)
        } else {
          se <- c(se, NA)
          p <- c(p, NA)
        }
      }
    
      if (!is.null(model$lambda)) {
        cf <-c(cf, model$lambda)
        names <- c(names, "$\\lambda$")  
        if (!is.null(model$lambda.se)) {
          if (!is.null(model$adj.se)) {
            lambda.se <- sqrt((model$lambda.se^2) * model$adj.se)   
          } else {
            lambda.se <- model$lambda.se
          }
          lambda.pval <- 2 * (1 - pnorm(abs(model$lambda / lambda.se)))
          se <- c(se, lambda.se)
          p <- c(p, lambda.pval)
        } else {
          se <- c(se, NA)
          p <- c(p, NA)
        }
      }
    
      gof <- numeric()
      gof.names <- character()
      gof.decimal <- logical()
    
      if (include.nobs == TRUE) {
        n <- length(s$fitted.values)
        param <- s$parameters
        gof <- c(gof, n, param)
        gof.names <- c(gof.names, "Num.\\ obs.", "Parameters")
        gof.decimal <- c(gof.decimal, FALSE, FALSE)
      }
      if (include.loglik == TRUE) {
        ll <- s$LL
        gof <- c(gof, ll)
        gof.names <- c(gof.names, "Log Likelihood")
        gof.decimal <- c(gof.decimal, TRUE)
      }
      if (include.aic == TRUE) {
        aic <- AIC(model)
        aiclm <- s$AIC_lm.model
        gof <- c(gof, aiclm, aic)
        gof.names <- c(gof.names, "AIC (Linear model)", "AIC (Spatial model)")
        gof.decimal <- c(gof.decimal, TRUE, TRUE)
      }
      if (include.lr == TRUE && !is.null(s$LR1)) {
        gof <- c(gof, s$LR1$statistic[[1]], s$LR1$p.value[[1]])
        gof.names <- c(gof.names, "LR test: statistic", "LR test: p-value")
        gof.decimal <- c(gof.decimal, TRUE, TRUE)
      }
      if (include.wald == TRUE && !is.null(model$Wald1)) {
        waldstat <- model$Wald1$statistic
        waldp <- model$Wald1$p.value
        gof <- c(gof, waldstat, waldp)
        gof.names <- c(gof.names, "Wald test: statistic", "Wald test: p-value")
        gof.decimal <- c(gof.decimal, TRUE, TRUE)
      }
    
      tr <- createTexreg(
          coef.names = names, 
          coef = cf, 
          se = se, 
          pvalues = p, 
          gof.names = gof.names, 
          gof = gof, 
          gof.decimal = gof.decimal
      )
      return(tr)
    }
    
    setMethod("extract", signature = className("sarlm", "spdep"), 
        definition = extract.sarlm)
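
Once the method is registered, texreg's extract() generic dispatches to it for sarlm objects. A quick check, using the model from the question, that the new p-values match the summary() output:

    tr <- extract(model)
    tr@coef.names           # ends with "$\rho$" and "$\lambda$"
    round(tr@pvalues, 4)    # the last two entries are ~0.6323 and ~0.3049, as in summary()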
    

Comments:

  • Yes, R can be a bit opaque here: when you call summary(), the values you are interested in are often not pulled out of the model object at all but computed on the spot by the summary function. Not a complete answer, but you should have a look at the broom package, which returns the important model information (usually including p-values and the like) in an easy-to-use data.frame.
  • Why not just replace the values in the texreg output by hand? It spits out LaTeX or HTML, after all…
  • I will look at the broom package, thanks @jakub. I am generating the texreg LaTeX output inside a Sweave document that gets knitted into the final product, so hand-editing the LaTeX is not realistic. Judging from its list of supported classes, broom does not cover these objects yet anyway.
  • I like the idea behind broom, but I think texreg's generic extract() function follows the same concept; the main difference is that the extracted information is handed to several functions that build regression tables (in texreg) rather than data frames (in broom).
  • This is a very thorough answer, thanks! I had in fact already written a (less complete) extract function for the sacsarlm models I am estimating (I wanted a few custom changes, including moving rho and lambda up into the coefficients) and computed the p-values in the usual way. Your idea of looking up exactly what summary() does in the spdep source, rather than reverse-engineering it by guesswork, makes much more sense. I also think extending texreg with extract functions is a great idea, and I appreciate you writing it up this way!
  • Glad it did the trick. What do you think would be the ideal order of the items in the GOF block?
  • I actually moved Rho and Lambda up into the coefficient column at the top. Rho is the main coefficient of interest in my research, because I use it to capture peer effects in a network analysis. Using these models that way is not uncommon, and in that case you certainly want the spatial autocorrelation terms displayed as coefficients. That is what extract.lnam() does if you use the lnam version of this kind of spatial model from the sna package (which is much slower, which is why I use spdep). I am not saying the extract function has to change to work that way, though.
  • I suspect some researchers fit spatial models mainly to control for spatial endogeneity while they are interested in modelling other relationships, and for them reporting these terms as GOF may make more sense. Still, treating rho and lambda as coefficients seems more flexible: if you think of them as GOF, it is no big deal when they show up as coefficients, but if you care about their size, it is annoying when they show up as GOF measures. Just my opinion.
  • That makes sense. I have moved rho and lambda to the coefficient block and updated the answer. Interesting, by the way, that you are using spdep for network analysis. You may also want to have a look at my tnam package for R, which implements temporal network autocorrelation models; it extends the kind of model you are using to arbitrary types of network dependence as well as panel data (if needed), in addition to spatial dependence.

For reference, here is the print.summary.sarlm function from spdep, where summary() computes the rho and lambda p-values on the fly:

    print.summary.sarlm <- function(x, digits = max(5, .Options$digits - 3),
        signif.stars = FALSE, ...)
    {
        cat("\nCall:", deparse(x$call), sep = "", fill=TRUE)
           if (x$type == "error") if (isTRUE(all.equal(x$lambda, x$interval[1])) ||
                isTRUE(all.equal(x$lambda, x$interval[2]))) 
                warning("lambda on interval bound - results should not be used")
           if (x$type == "lag" || x$type == "mixed")
                if (isTRUE(all.equal(x$rho, x$interval[1])) ||
                isTRUE(all.equal(x$rho, x$interval[2]))) 
                warning("rho on interval bound - results should not be used")
        cat("\nResiduals:\n")
        resid <- residuals(x)
        nam <- c("Min", "1Q", "Median", "3Q", "Max")
        rq <- if (length(dim(resid)) == 2L) 
            structure(apply(t(resid), 1, quantile), dimnames = list(nam, 
                dimnames(resid)[[2]]))
        else structure(quantile(resid), names = nam)
        print(rq, digits = digits, ...)
        cat("\nType:", x$type, "\n")
        if (x$zero.policy) {
            zero.regs <- attr(x, "zero.regs")
            if (!is.null(zero.regs))
                cat("Regions with no neighbours included:\n",
                zero.regs, "\n")
        }
            if (!is.null(x$coeftitle)) {
            cat("Coefficients:", x$coeftitle, "\n")
            coefs <- x$Coef
            if (!is.null(aliased <- x$aliased) && any(x$aliased)){
            cat("    (", table(aliased)["TRUE"], 
                " not defined because of singularities)\n", sep = "")
            cn <- names(aliased)
            coefs <- matrix(NA, length(aliased), 4, dimnames = list(cn, 
                        colnames(x$Coef)))
                    coefs[!aliased, ] <- x$Coef
            }
            printCoefmat(coefs, signif.stars=signif.stars, digits=digits,
            na.print="NA")
        }
    #   res <- LR.sarlm(x, x$lm.model)
        res <- x$LR1
            pref <- ifelse(x$ase, "Asymptotic", "Approximate (numerical Hessian)")
        if (x$type == "error") {
            cat("\nLambda: ", format(signif(x$lambda, digits)),
                ", LR test value: ", format(signif(res$statistic,
                            digits)), ", p-value: ", format.pval(res$p.value,
                            digits), "\n", sep="")
            if (!is.null(x$lambda.se)) {
                        if (!is.null(x$adj.se)) {
                            x$lambda.se <- sqrt((x$lambda.se^2)*x$adj.se)   
                        }
                cat(pref, " standard error: ", 
                    format(signif(x$lambda.se, digits)),
                ifelse(is.null(x$adj.se), "\n    z-value: ",
                                   "\n    t-value: "), format(signif((x$lambda/
                    x$lambda.se), digits)),
                ", p-value: ", format.pval(2*(1-pnorm(abs(x$lambda/
                    x$lambda.se))), digits), "\n", sep="")
                cat("Wald statistic: ", format(signif(x$Wald1$statistic, 
                digits)), ", p-value: ", format.pval(x$Wald1$p.value, 
                digits), "\n", sep="")
            }
        } else if (x$type == "sac" || x$type == "sacmixed") {
            cat("\nRho: ", format(signif(x$rho, digits)), "\n",
                        sep="")
                    if (!is.null(x$rho.se)) {
                        if (!is.null(x$adj.se)) {
                            x$rho.se <- sqrt((x$rho.se^2)*x$adj.se)   
                        }
              cat(pref, " standard error: ", 
                format(signif(x$rho.se, digits)), 
                            ifelse(is.null(x$adj.se), "\n    z-value: ",
                                   "\n    t-value: "), 
                format(signif((x$rho/x$rho.se), digits)),
                ", p-value: ", format.pval(2 * (1 - pnorm(abs(x$rho/
                    x$rho.se))), digits), "\n", sep="")
                    }
            cat("Lambda: ", format(signif(x$lambda, digits)), "\n", sep="")
            if (!is.null(x$lambda.se)) {
                        pref <- ifelse(x$ase, "Asymptotic",
                            "Approximate (numerical Hessian)")
                        if (!is.null(x$adj.se)) {
                            x$lambda.se <- sqrt((x$lambda.se^2)*x$adj.se)   
                        }
                cat(pref, " standard error: ", 
                    format(signif(x$lambda.se, digits)),
                ifelse(is.null(x$adj.se), "\n    z-value: ",
                                   "\n    t-value: "), format(signif((x$lambda/
                    x$lambda.se), digits)),
                ", p-value: ", format.pval(2*(1-pnorm(abs(x$lambda/
                    x$lambda.se))), digits), "\n", sep="")
                    }
                    cat("\nLR test value: ", format(signif(res$statistic, digits)),
                ", p-value: ", format.pval(res$p.value, digits), "\n",
                        sep="")
            } else {
            cat("\nRho: ", format(signif(x$rho, digits)), 
                        ", LR test value: ", format(signif(res$statistic, digits)),
                ", p-value: ", format.pval(res$p.value, digits), "\n",
                        sep="")
                    if (!is.null(x$rho.se)) {
                        if (!is.null(x$adj.se)) {
                            x$rho.se <- sqrt((x$rho.se^2)*x$adj.se)   
                        }
              cat(pref, " standard error: ", 
                format(signif(x$rho.se, digits)), 
                            ifelse(is.null(x$adj.se), "\n    z-value: ",
                                   "\n    t-value: "), 
                format(signif((x$rho/x$rho.se), digits)),
                ", p-value: ", format.pval(2 * (1 - pnorm(abs(x$rho/
                    x$rho.se))), digits), "\n", sep="")
                    }
            if (!is.null(x$Wald1)) {
                cat("Wald statistic: ", format(signif(x$Wald1$statistic, 
                digits)), ", p-value: ", format.pval(x$Wald1$p.value, 
                digits), "\n", sep="")
            }
    
        }
        cat("\nLog likelihood:", logLik(x), "for", x$type, "model\n")
        cat("ML residual variance (sigma squared): ", 
            format(signif(x$s2, digits)), ", (sigma: ", 
            format(signif(sqrt(x$s2), digits)), ")\n", sep="")
            if (!is.null(x$NK)) cat("Nagelkerke pseudo-R-squared:",
                format(signif(x$NK, digits)), "\n")
        cat("Number of observations:", length(x$residuals), "\n")
        cat("Number of parameters estimated:", x$parameters, "\n")
        cat("AIC: ", format(signif(AIC(x), digits)), ", (AIC for lm: ",
            format(signif(x$AIC_lm.model, digits)), ")\n", sep="")
        if (x$type == "error") {
            if (!is.null(x$Haus)) {
                cat("Hausman test: ", format(signif(x$Haus$statistic, 
                digits)), ", df: ", format(x$Haus$parameter),
                            ", p-value: ", format.pval(x$Haus$p.value, digits),
                            "\n", sep="")
            }
            }
        if ((x$type == "lag" || x$type ==  "mixed") && x$ase) {
            cat("LM test for residual autocorrelation\n")
            cat("test value: ", format(signif(x$LMtest, digits)),
                ", p-value: ", format.pval((1 - pchisq(x$LMtest, 1)), 
                digits), "\n", sep="")
        }
            if (x$type != "error" && !is.null(x$LLCoef)) {
            cat("\nCoefficients: (log likelihood/likelihood ratio)\n")
            printCoefmat(x$LLCoef, signif.stars=signif.stars,
                digits=digits, na.print="NA")
            }
            correl <- x$correlation
            if (!is.null(correl)) {
                p <- NCOL(correl)
                if (p > 1) {
                        cat("\n", x$correltext, "\n")
                        correl <- format(round(correl, 2), nsmall = 2, 
                        digits = digits)
                        correl[!lower.tri(correl)] <- ""
                        print(correl[-1, -p, drop = FALSE], quote = FALSE)
                    }
            }
            cat("\n")
            invisible(x)
    }
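
For the example model above, you can check directly which branch of this function applies (the fields are the ones referenced in the code):

    s <- summary(model)
    s$type      # "sacmixed" -> the sac/sacmixed branch is taken
    s$ase       # TRUE -> "Asymptotic" standard errors, as in the printout above
    s$adj.se    # NULL for this fit, so rho.se and lambda.se are used unadjusted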
    
With the new extract method registered, screenreg(model) now reports rho and lambda in the coefficient block with their own p-values:

    =======================================
                         Model 1           
    ---------------------------------------
    (Intercept)             0.91 (0.66)    
    X                      -0.00 (0.10)    
    lag.(Intercept)        -0.03 (0.02)    
    lag.X                   0.90 (0.02) ***
    rho                    -0.00 (0.01)    
    lambda                 -0.02 (0.02)    
    ---------------------------------------
    Num. obs.             100              
    Parameters              7              
    Log Likelihood       -145.44           
    AIC (Linear model)    585.63           
    AIC (Spatial model)   304.88           
    LR test: statistic    288.74           
    LR test: p-value        0.00           
    =======================================
    *** p < 0.001, ** p < 0.01, * p < 0.05