How to get regression coefficients and model fit using a correlation or covariance matrix instead of a data frame in R?

Tags: r, regression, linear-regression, lm

I want to be able to obtain regression coefficients from a multiple linear regression by supplying a correlation or covariance matrix instead of a data.frame. I realise you lose some information relevant to determining the intercept and so on, but even a correlation matrix should be sufficient to get standardised coefficients and variance estimates.

For example, if you had the following data:

# get some data
library(MASS)
data("Cars93")
x <- Cars93[,c("EngineSize", "Horsepower", "RPM")]
But what if you don't have the data, only a correlation or covariance matrix:

corx <- cor(x)
covx <- cov(x)
Note that the answer over on Stats.SE explains in theory why this is possible, and gives some examples of custom R code for computing the coefficients.

Recall that:

$\hat{\beta} = (X'X)^{-1}X'Y$

Try:
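This formula can be checked directly against lm() on the Cars93 data (a sketch using the raw data, just to verify the algebra; solve() is used on the normal equations rather than forming the inverse explicitly):

```r
# Verify beta = (X'X)^{-1} X'Y against lm() on the Cars93 data
library(MASS)
data("Cars93")
x <- Cars93[, c("EngineSize", "Horsepower", "RPM")]

# Design matrix with an explicit intercept column
X <- cbind(Intercept = 1, as.matrix(x[, c("Horsepower", "RPM")]))
Y <- x$EngineSize

# Solve the normal equations (X'X) beta = X'Y
beta <- solve(crossprod(X), crossprod(X, Y))
beta
# Matches coef(lm(EngineSize ~ Horsepower + RPM, data = x))
```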


Using lavaan, you can do the following:

library(MASS)
data("Cars93")
x <- Cars93[,c("EngineSize", "Horsepower", "RPM")]

lav.input<- cov(x)
lav.mean <- colMeans(x)

library(lavaan)
m1 <- 'EngineSize ~ Horsepower+RPM'
fit <- sem(m1, sample.cov = lav.input,sample.nobs = nrow(x), meanstructure = TRUE, sample.mean = lav.mean)
summary(fit, standardize=TRUE)

I think lavaan sounds like a good option, and I note that @Philip pointed me in the right direction. I'm just documenting here how to extract some of the additional model features you might want from lavaan (in particular, r-squared and adjusted r-squared).

For the most recent version, see:


Another fun solution is to generate a data set with the same variance-covariance matrix as the original data. You can do this with mvrnorm() in the MASS package. Using lm() on this new data set will yield the same parameter estimates and standard errors as you would get from the original data set (except for the intercept, which is not recoverable unless you also have the mean of each variable). Here is an example of what this would look like:

# Assuming the variance-covariance matrix is called VC, with row/column
# names that include the outcome variable "Y"
library(MASS)
n <- 100  # sample size
nvar <- ncol(VC)
# empirical = TRUE reproduces VC exactly in the simulated sample;
# note the argument is Sigma (capital S), and lm() needs a data.frame
fake.data <- as.data.frame(
  mvrnorm(n, mu = rep(0, nvar), Sigma = VC, empirical = TRUE))
lm(Y ~ ., data = fake.data)
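To make this concrete with the Cars93 covariances from above (with EngineSize playing the role of the outcome), the slopes from the fake data match those from the real data:

```r
library(MASS)
data("Cars93")
x <- Cars93[, c("EngineSize", "Horsepower", "RPM")]
covx <- cov(x)

# empirical = TRUE makes the sample covariance of the fake data exactly covx;
# column names are taken from the dimnames of covx
fake <- as.data.frame(
  mvrnorm(n = nrow(x), mu = rep(0, ncol(covx)), Sigma = covx,
          empirical = TRUE))
fit.fake <- lm(EngineSize ~ Horsepower + RPM, data = fake)
fit.real <- lm(EngineSize ~ Horsepower + RPM, data = x)

# Slopes (and their standard errors) agree; only the intercept differs
coef(fit.fake)[-1]
coef(fit.real)[-1]
```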
Does this help? — Thanks, I've incorporated some of the points from the Stats.SE question into this one. It looks like that post could be adapted to get the coefficients. I've edited my question; I think what I'm hoping for is a function like lm that simply accepts a covariance matrix instead of data, so that it's then easy to get things like model fit. — You can use lavaan: it takes a correlation/covariance matrix as input. I'd missed that you wanted the R-squared value, so
summary(fit, standardized = TRUE, rsquare = TRUE)
will give you what you want. Most of the other features associated with lm will also work, including predict, anova, etc. Plus you get all the benefits of lavaan, e.g. := can be used to define new parameters in the model, instead of using deltaMethod from the car package after fitting.
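As an aside, here is a sketch of the := syntax mentioned above (the labels b1/b2 and the derived parameter diff are names I've made up for illustration):

```r
library(MASS)
library(lavaan)
data("Cars93")
x <- Cars93[, c("EngineSize", "Horsepower", "RPM")]

# Label the two slopes, then define a new parameter from them;
# lavaan reports an estimate and standard error for 'diff' as well
m2 <- '
  EngineSize ~ b1*Horsepower + b2*RPM
  diff := b1 - b2
'
fit2 <- sem(m2, sample.cov = cov(x), sample.nobs = nrow(x))
parameterEstimates(fit2)
```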
# slopes from the covariance matrix alone: solve Sxx %*% b = Sxy
(bs <- solve(covx[-1, -1], covx[-1, 1]))

 Horsepower         RPM 
 0.01491908 -0.00100051 

# the intercept additionally requires the variable means
ms <- colMeans(x)
(b0 <- ms[1] - bs %*% ms[-1])

         [,1]
[1,] 5.805301
library(MASS)
data("Cars93")
x <- Cars93[,c("EngineSize", "Horsepower", "RPM")]

lav.input<- cov(x)
lav.mean <- colMeans(x)

library(lavaan)
m1 <- 'EngineSize ~ Horsepower+RPM'
fit <- sem(m1, sample.cov = lav.input,sample.nobs = nrow(x), meanstructure = TRUE, sample.mean = lav.mean)
summary(fit, standardize=TRUE)
Regressions:
                   Estimate    Std.Err  Z-value  P(>|z|)   Std.lv    Std.all
  EngineSize ~                                                              
    Horsepower          0.015    0.001   19.889    0.000      0.015    0.753
    RPM                -0.001    0.000  -15.197    0.000     -0.001   -0.576

Intercepts:
                  Estimate    Std.Err  Z-value  P(>|z|)   Std.lv    Std.all
   EngineSize          5.805    0.362   16.022    0.000      5.805    5.627

Variances:
                  Estimate    Std.Err  Z-value  P(>|z|)   Std.lv    Std.all
    EngineSize          0.142    0.021    6.819    0.000      0.142    0.133
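Since the question also asks about correlation matrices: as a sketch (my own illustration, not part of the answer above), feeding the correlation matrix in as sample.cov yields the standardised slopes directly, matching the Std.all column in the output above:

```r
library(MASS)
library(lavaan)
data("Cars93")
x <- Cars93[, c("EngineSize", "Horsepower", "RPM")]

# With a correlation matrix as input, all variables have unit variance,
# so the estimated slopes are already on the standardised scale
corx <- cor(x)
fit.std <- sem('EngineSize ~ Horsepower + RPM',
               sample.cov = corx, sample.nobs = nrow(x))
coef(fit.std)  # ~0.753 for Horsepower, ~-0.576 for RPM
```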
# get data
library(MASS)
data("Cars93")
x <- Cars93[,c("EngineSize", "Horsepower", "RPM")]

# define sample statistics 
covx <- cov(x)
n <- nrow(x)
means <- sapply(x, mean) # this is optional


fit <- lavaan::sem("EngineSize ~ Horsepower + RPM", sample.cov = covx,
                   sample.mean = means,
                    sample.nobs = n)

coef(fit) # unstandardised coefficients
standardizedSolution(fit) # Standardised coefficients
inspect(fit, 'r2') # r-squared

# adjusted r-squared
adjr2 <- function(rsquared, n, p) 1 - (1-rsquared)  * ((n-1)/(n-p-1))
# update p below with number of predictor variables
adjr2(inspect(fit, 'r2'), n = inspect(fit, "nobs"), p = 2) 
covlm <- function(dv, ivs, n, cov) {
    # Assumes lavaan package
    # library(lavaan)
    # dv: character vector of length 1 with name of outcome variable
    # ivs: character vector of names of predictors
    # n: numeric vector of length 1: sample size
    # cov: covariance matrix where row and column names 
    #       correspond to dv and ivs
    # Return
    #      list with lavaan model fit
    #      and various other features of the model

    results <- list()
    eq <- paste(dv, "~", paste(ivs, collapse = " + "))
    results$fit <- lavaan::sem(eq, sample.cov = cov,
                       sample.nobs = n)

    # coefficients
    ufit <- parameterestimates(results$fit) 
    ufit <- ufit[ufit$op == "~", ]
    results$coef <- ufit$est
    names(results$coef) <- ufit$rhs

    sfit <- standardizedsolution(results$fit) 
    sfit <- sfit[sfit$op == "~", ]
    results$standardizedcoef <- sfit$est.std
    names(results$standardizedcoef) <- sfit$rhs

    # use unclass to not limit r2 to 3 decimals
     results$r.squared <- unclass(inspect(results$fit, 'r2')) # r-squared

    # adjusted r-squared
      adjr2 <- function(rsquared, n, p) 1 - (1-rsquared)  * ((n-1)/(n-p-1))
    results$adj.r.squared <- adjr2(unclass(inspect(results$fit, 'r2')), 
                                n = n, p = length(ivs)) 
    results

}
x <- Cars93[,c("EngineSize", "Horsepower", "RPM")]
covlm(dv = "EngineSize", ivs = c("Horsepower", "RPM"),
      n = nrow(x), cov = cov(x))
$fit
lavaan (0.5-20) converged normally after  27 iterations

  Number of observations                            93

  Estimator                                         ML
  Minimum Function Test Statistic                0.000
  Degrees of freedom                                 0
  Minimum Function Value               0.0000000000000

$coef
 Horsepower         RPM 
 0.01491908 -0.00100051 

$standardizedcoef
Horsepower        RPM 
 0.7532350 -0.5755326 

$r.squared
EngineSize 
     0.867 

$adj.r.squared
EngineSize 
     0.864 
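As a sanity check (assuming the same Cars93 data), these values can be compared with what lm() reports when the raw data are available:

```r
library(MASS)
data("Cars93")
x <- Cars93[, c("EngineSize", "Horsepower", "RPM")]

s <- summary(lm(EngineSize ~ Horsepower + RPM, data = x))
s$r.squared      # ~0.867, matching the lavaan-based r-squared above
s$adj.r.squared  # ~0.864, matching adjr2() above
```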