R's lm() function is dropping a factor level (other than the intercept level) without reporting an error or warning

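I am trying to fit a linear model with interaction terms to a fairly large, unbalanced data set, using forward stepwise ANOVA with AIC as the selection criterion. There are 13,072 observations. Here is how it is set up:

The response variable, dayseclosion, is continuous numeric.

The explanatory variables are all categorical:

- host (4 levels)
- site (25 levels)
- year (5 levels)
- winter treatment (4 levels; monoverwinter in the model output)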

The final step of the forward selection:

    ## Step:  AIC=61561.31
    ## dayseclose ~ host + site + monoverwinter + year + host:site +
    ##     host:monoverwinter + site:monoverwinter + site:year
    ##
    ##                           Df Sum of Sq     RSS   AIC
    ## <none>                                 1433157 61561
    ## + host:year                1     8.059 1433149 61563
    ## + host:monoverwinter:site  3   242.885 1432914 61565

The problem is that when I inspect the summary() and anova() tables, one of the factor levels of year is being mysteriously dropped (year2020). Note that this is not the level used to estimate the intercept (that is year2016). There were 3,581 observations that year, yet in the summary(model.aic.forward) table the coefficients look like this (only partially shown):

    ##              Estimate Std. Error t value Pr(>|t|)
    ## year2017      6.36787    2.44775   2.602 0.009292 **
    ## year2018     -0.13757    1.85568  -0.074 0.940906
    ## year2019    -10.56667    3.45693  -3.057 0.002243 **
    ## year2020           NA         NA      NA       NA

Not shown here, but all interactions with year2020 are also NA. Oddly, judging by the F-statistic degrees of freedom, all observations, including those from 2020, seem to have been used to fit the model (79 + 12992 = 13071):

    ## Residual standard error: 10.5 on 12992 degrees of freedom
    ## Multiple R-squared:  0.3105, Adjusted R-squared:  0.3063
    ## F-statistic: 74.06 on 79 and 12992 DF,  p-value: < 2.2e-16

The ANOVA table for the same model:

    d = anova(model.aic.forward)
    as_tibble(d, rownames = "Predictors")
    ## # A tibble: 9 x 6
    ##   Predictors            Df `Sum Sq` `Mean Sq` `F value`  `Pr(>F)`
    ##   <chr>              <int>    <dbl>     <dbl>     <dbl>     <dbl>
    ## 1 host                   3  497707.   165902.   1504.    0.
    ## 2 site                  24   43276.     1803.     16.3   3.77e-67
    ## 3 monoverwinter          3   34395.    11465.    104.    1.73e-66
    ## 4 year                   3    2422.      807.      7.32  6.71e- 5
    ## 5 host:site             16   43445.     2715.     24.6   1.08e-72
    ## 6 host:monoverwinter     3    7319.     2440.     22.1   2.80e-14
    ## 7 site:monoverwinter    12    9229.      769.      6.97  9.13e-13
    ## 8 site:year             15    7646.      510.      4.62  6.30e- 9
    ## 9 Residuals          12992 1433157.      110.     NA    NA

Is my interpretation wrong? What happened to 2020? Could the imbalance in the data cause this? I can't think of a way to provide a minimal reproducible example, because it is probably the volume and complexity of the data that causes the problem. Thank you for reading and for any help with this puzzle.

Yes, missing values could produce your result. Consider the following reproducible example:

    x1 <- sample(1:5, 1000, replace = TRUE)
    x2 <- sample(1:3, 1000, replace = TRUE)
    y <- 2 * x1 + 3 * x2 + rnorm(1000)

    # no missing values (everything works fine)
    lm(y ~ as.factor(x1) + as.factor(x2))
    lm(y ~ -1 + as.factor(x1) + as.factor(x2))  # no intercept

    # with missing values
    x1[x2 == 2] <- NA  # create a specific missingness pattern
    table(x1, useNA = "always")
    table(x2, useNA = "always")
    table(x1, x2, useNA = "always")  # shows the missing pattern
    lm(y ~ as.factor(x1) + as.factor(x2))
    lm(y ~ -1 + as.factor(x1) + as.factor(x2))

To get a full-rank design matrix, and because your model has an intercept, one factor level has to be dropped as a consequence of dummy coding the categorical variable. If you remove the intercept, you will get the "missing" factor level back.

Thanks for your reply. I considered your suggestion, but a different year is already being used to estimate the intercept (2016). In addition, the df in the ANOVA table indicates that two years have been dropped. Do you see what I mean?

OK, I see what you mean. I think your guess is right and this is related to the unbalanced design. Can you check for which hosts and sites you have year=2020 measurements? I think we would need the actual data to understand what is going on.

NA coefficients usually mean that the parameter cannot be estimated. lm removes columns from the design matrix to obtain a non-singular design matrix. Try running lm with singular.ok=FALSE and inspect the model matrix.

Sigh. I have accepted that I cannot use lm() given the structure of the data. It seems I would need at least one observation for every combination of levels (that is, all hosts at all sites in all years under all winter treatments), as every response has pointed out (thank you all). So it may have to be a linear mixed model or a Bayesian model. Time to crack open some books.
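The checks suggested in the comments (cross-tabulating the factors, running lm with singular.ok=FALSE, inspecting the model matrix) can be sketched on toy data. This is only an illustration under stated assumptions: the data frame `d`, its three sites and three years, and the fact that site C is sampled only in 2018 are all invented here and are not the asker's data. In this made-up design the year2018 dummy column is identical to the siteC column, so lm() cannot estimate both and reports NA for the later one.

```r
set.seed(1)

# Hypothetical toy data: sites A and B are sampled in 2016 and 2017,
# site C is sampled only in 2018, so year 2018 is confounded with site C.
d <- data.frame(
  site = factor(c(rep(c("A", "B"), each = 20), rep("C", 10))),
  year = factor(c(rep(rep(c(2016, 2017), each = 10), 2), rep(2018, 10)))
)
d$y <- rnorm(nrow(d))

# 1. Cross-tabulate the factors to spot empty cells.
print(xtabs(~ site + year, data = d))

# 2. Fit as usual: the aliased coefficient is silently reported as NA.
fit <- lm(y ~ site + year, data = d)
print(coef(fit))

# 3. alias() lists which coefficients are linear combinations of others.
print(alias(fit))

# 4. Compare the rank of the model matrix with its number of columns.
X <- model.matrix(fit)
cat("rank:", qr(X)$rank, " columns:", ncol(X), "\n")

# 5. With singular.ok = FALSE, lm() stops instead of returning NA.
strict <- tryCatch(
  lm(y ~ site + year, data = d, singular.ok = FALSE),
  error = function(e) conditionMessage(e)
)
print(strict)
```

Whether the same kind of confounding explains year2020 in the question can only be confirmed by cross-tabulating the real data. One hint in the posted ANOVA table is that site:year has 15 degrees of freedom, whereas a fully crossed 25-site by 5-year design would give 24 × 4 = 96, which would be consistent with many site-by-year combinations having no observations.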