R: Comparing GLM models using predictions


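Suppose I call glm() on the same data but with different formulas and/or families, which gives me two models. Now I want to compare which model is better by predicting on unseen data. Something like this: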

mod1 <- glm(formula1, family1, data)
mod2 <- glm(formula2, family2, data)
mu1 <- predict(mod1, newdata, type = "response")
mu2 <- predict(mod2, newdata, type = "response")

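One part of this question is easier to answer than the other.
It usually makes more sense to choose a family on substantive grounds than to rely heavily on goodness of fit. For example, if your response is a count (non-negative integer) with no obvious upper bound, then your only real choice that is strictly in the exponential family is the Poisson.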
## Simulate training data: Gamma response (shape 3) with a log-linear mean
set.seed(101)
x <- runif(1000)
mu <- exp(1+2*x)
y <- rgamma(1000,shape=3,scale=mu/3)
d <- data.frame(x,y)

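On the first question: the summary function reports the deviance, which is -2 times the log-likelihood of the data under the model, up to a constant that comes from the saturated model. You need to say what you mean by the "log-likelihood of the predictions", because the expected predictions agree exactly with the model, i.e. the LL is 0.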
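To make the deviance remark concrete, here is a minimal sketch. It assumes a Poisson model fitted to R's built-in warpbreaks data, which is my choice for illustration and not part of the original question; the names fit_pois, ll_model and ll_sat are likewise made up for this example. It checks that the deviance equals twice the gap between the saturated-model and fitted-model log-likelihoods, so it differs from -2 * log-likelihood by a constant.

```r
## Illustrative only: a Poisson GLM on the built-in warpbreaks data
fit_pois <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

## Log-likelihood of the fitted model
ll_model <- as.numeric(logLik(fit_pois))

## Log-likelihood of the saturated model (each fitted mean equals its observation)
ll_sat <- sum(dpois(warpbreaks$breaks, lambda = warpbreaks$breaks, log = TRUE))

## Deviance is twice the gap between the two, not -2 * ll_model itself
all.equal(deviance(fit_pois), 2 * (ll_sat - ll_model))

## The constant offset between the deviance and -2 * ll_model is 2 * ll_sat
deviance(fit_pois) - (-2 * ll_model)
2 * ll_sat
```

Because that offset depends on the family, deviances from models with different families are not directly comparable, which is one reason to evaluate each model's own density on held-out data instead.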
## New (held-out) data drawn from the same Gamma model
nd <- data.frame(x=runif(100))
nd$y <- rgamma(100,shape=3,scale=exp(1+2*nd$x)/3)

## Fit Gamma and Gaussian models, both with a log link
mod1 <- glm(y~x,family=Gamma(link="log"),data=d)
mod2 <- glm(y~x,family=gaussian(link="log"),data=d)

## Predicted means on the new data
mu1 <- predict(mod1, newdata=nd, type="response")
mu2 <- predict(mod2, newdata=nd, type="response")

## Estimated nuisance parameters: Gaussian sd and Gamma shape
sigma <- sqrt(summary(mod2)$dispersion)
shape <- MASS::gamma.shape(mod1)$alpha

## Root mean squared error of the predictions on the new data
rmse <- function(x1,x2) sqrt(mean((x1-x2)^2))
rmse(mu1,nd$y)  ## 5.845
rmse(mu2,nd$y)  ## 5.842

## Negative log-likelihood of the new data under each fitted model (lower is better)
-sum(dgamma(nd$y,shape=shape,scale=mu1/shape,log=TRUE))  ## 276.84
-sum(dnorm(nd$y,mean=mu2,sd=sigma,log=TRUE))  ## 318.4