R: does adding multiple explanatory variables to a logistic regression (glm) model cause an error?
I have tried to fit the following model:
ad.glm.all <- glm(WinLoss ~ Score + Margin + Opposition + Venue + Disposals +
                    Marks + Goals + Behinds + Hitouts + Tackles + Rebound50s +
                    Inside50s + Clearances + Clangers + FreesFor +
                    ContendedPossessions + ContestedMarks + MarksInside50 +
                    OnePercenters + Bounces + GoalAssists,
                  data = ad.train, family = binomial)
When I look at the summary of this regression model, I get:
Call:
glm(formula = WinLoss ~ Score + Margin + Disposals + Marks +
Goals + Behinds + Hitouts + Tackles + Rebound50s + Inside50s +
Clearances + Clangers + FreesFor + ContendedPossessions +
ContestedMarks + MarksInside50 + OnePercenters + Bounces +
GoalAssists, family = binomial, data = ad.train)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.980e-05 -2.100e-08 2.100e-08 2.100e-08 3.569e-05
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -8.578e+00 2.502e+06 0.000 1
Score 4.194e+00 5.165e+04 0.000 1
Margin 2.187e+00 3.742e+03 0.001 1
Disposals 8.946e-02 3.549e+03 0.000 1
Marks 1.427e-01 1.938e+03 0.000 1
Goals -2.288e+01 3.082e+05 0.000 1
Behinds -7.034e+00 5.482e+04 0.000 1
Hitouts 3.640e-02 5.167e+03 0.000 1
Tackles 8.939e-01 7.075e+03 0.000 1
Rebound50s -2.064e-01 8.497e+03 0.000 1
Inside50s 5.645e-01 8.133e+03 0.000 1
Clearances -1.930e-01 1.525e+04 0.000 1
Clangers -2.040e-01 1.056e+04 0.000 1
FreesFor -7.699e-01 1.762e+04 0.000 1
ContendedPossessions -5.752e-01 7.424e+03 0.000 1
ContestedMarks -1.869e+00 1.069e+04 0.000 1
MarksInside50 6.742e-01 1.676e+04 0.000 1
OnePercenters 1.616e-01 6.888e+03 0.000 1
Bounces -8.763e-01 7.669e+03 0.000 1
GoalAssists 7.570e-01 3.299e+04 0.000 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1.2540e+02 on 91 degrees of freedom
Residual deviance: 7.1154e-09 on 72 degrees of freedom
AIC: 40
Number of Fisher Scoring iterations: 25
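Clearly something has gone badly wrong here, right? The p-values can't all be 1 and the z values all 0 for every variable, can they?
I googled it, and the best I could find was a suggestion that the error might be caused by having too many variables (which makes sense, given how many I have). So I started removing them one at a time, and I kept getting the error on every attempt until I was left with a single variable (x ~ y); only then did the error go away.
Can someone explain to me what this error means? Why are all my p-values 1 and all my z values 0?
Thanks in advance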
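- Troy

Comments:

This looks to me like an extreme example of overfitting. You might want to try the lasso or elastic net. Also, have you checked whether one of your predictors already produces perfect separation?

@Roland I'll look into the lasso/elastic net; I'm guessing they are just packages you can add to R to deal with this sort of thing? Since this is sports data and I'm looking at win/loss, I guess the "Margin" variable might be a perfect separator; any suggestions for how to find out for sure?

And what about Score? Cross-tabulate your dependent variable against the predictor in a contingency table.

Update: I think I've worked out why glm was behaving so strangely: not enough data. When I split the data by team, I had 20 rows in my training set and 5 rows in my test set (per team). Combining all the teams into one big data frame gives 18 times as many rows. I'm pretty sure that's why I no longer get the weird errors. Thanks for the help, guys.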
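
The contingency-table check suggested in the comments could be sketched in base R roughly as follows. The data frame `toy` and its values are made up for illustration; only the column names `WinLoss`, `Margin` and `Tackles` are borrowed from the question, and the assumption that a positive `Margin` means a win is a guess about the data, not something the question confirms. A residual deviance of essentially zero together with 25 Fisher scoring iterations (the default `maxit` of `glm.control`) is the usual signature of this situation.

```r
# Hypothetical toy data: every positive Margin is a win, so Margin alone
# separates the two outcomes perfectly (an assumption for illustration).
toy <- data.frame(
  Margin  = c(-30, -12, -5, -1, 2, 8, 15, 40),
  Tackles = c(60, 55, 70, 62, 58, 66, 59, 61)
)
toy$WinLoss <- as.integer(toy$Margin > 0)

# Cross-tabulate the outcome against the sign of the predictor.
# Empty off-diagonal cells mean the predictor separates the classes.
sep_table <- table(WinLoss = toy$WinLoss, MarginPositive = toy$Margin > 0)
print(sep_table)

# For a numeric predictor, compare its range within each outcome class.
# Row 1 holds the minimum, row 2 the maximum; columns are the classes.
rng <- sapply(split(toy$Margin, toy$WinLoss), range)
separated <- rng[2, "0"] < rng[1, "1"] || rng[2, "1"] < rng[1, "0"]
print(separated)

# Fitting on separated data triggers glm's warnings about fitted
# probabilities of 0 or 1; they are suppressed here to inspect the table.
fit <- suppressWarnings(glm(WinLoss ~ Margin, data = toy, family = binomial))
print(summary(fit)$coefficients)
```

Running the same two checks on `ad.train` for `Margin` and `Score` would show whether either one predicts `WinLoss` perfectly; if so, dropping that predictor or using a penalised fit such as the lasso is the usual way forward.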