broom/dplyr error with glance() when using lm instead of biglm


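I'm using the dplyr/broom packages to run linear regressions for several sensors. When I use lm() inside the do() statement, the glance() function from broom doesn't work, but if I use biglm() it does. This wouldn't be a problem, except that I want the r^2, F-statistic and p-value, which glance() returns very nicely for a traditional lm().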

I have looked elsewhere and found no similar case of this error:

Error in data.frame(r.squared = r.squared, adj.r.squared = adj.r.squared,  : 
 object 'fstatistic' not found

Possible hunch:

?Anova 
"The comparison between two or more models will only be valid if they are 
fitted to the same dataset. This may be a problem if there are missing
values and R's default of na.action = na.omit is used."

Code below:

library(tidyr)
library(broom)
library(biglm) # if not install.packages("biglm")
library(dplyr)
regressionBig <- tidied_rm_outliers %>%
group_by(sensor_name, Lot.Tool, Lot.Module, Recipe, Step, Stage, MEAS_TYPE) %>%
do(fit = biglm(MEAS_AVG ~ value, data = .)) #note biglm is used

regressionBig 

#extract the r^2 from the complex list type from the data frame we just stored

glances <- regressionBig %>% glance(fit)
glances %>% 
  ungroup() %>%
  arrange(desc(r.squared))
#Biglm works but if i try the same thing with regular lm It errors on glance() 

ErrorDf <- tidied_rm_outliers %>%
  group_by(sensor_name, Lot.Tool, Lot.Module, Recipe, Step, Stage, MEAS_TYPE) %>% 
  do(fit = lm(MEAS_AVG ~ value, data = .)) #note lm is normal
ErrorDf %>% glance(fit)

#Error in data.frame(r.squared = r.squared, adj.r.squared = adj.r.squared,  : 
#object 'fstatistic' not found


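I hate to upload an entire data frame, as I know that is generally not acceptable on S/O, but I'm not sure I can create a reproducible example without doing so.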

Session info is on pastebin if you would like it!

It looks like there is a bad model in ErrorDf. I diagnosed it by running a for loop:

for (i in 1:nrow(ErrorDf)){
  print(i)
  glance(ErrorDf$fit[[i]])
}

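For model 94, it appears that the coefficient for value could not be estimated. I didn't investigate any further, but it raises an interesting question about how broom should handle this.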
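As a rough sketch of the same diagnosis without printing every index, one can check each fitted model for NA coefficients. The toy data and the names toy, fits and bad are made up for illustration; only MEAS_AVG, value and sensor_name come from the question.

```r
# Hypothetical toy data: sensor "b" has a constant predictor, so its slope
# cannot be estimated (this mimics the suspected problem with model 94).
set.seed(1)
toy <- data.frame(
  sensor_name = rep(c("a", "b", "c"), each = 5),
  value       = c(rnorm(5), rep(2, 5), rnorm(5))
)
toy$MEAS_AVG <- 1 + 0.5 * toy$value + rnorm(15, sd = 0.1)

# Fit one lm per group with base R (the question uses dplyr::do() instead)
fits <- lapply(split(toy, toy$sensor_name),
               function(g) lm(MEAS_AVG ~ value, data = g))

# Flag models in which at least one coefficient is NA
bad <- names(fits)[vapply(fits, function(m) anyNA(coef(m)), logical(1))]
bad
```

On the data frame from the question, the analogous check would presumably be which(vapply(ErrorDf$fit, function(m) anyNA(coef(m)), logical(1))), assuming fit is the list column created by do().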
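I found this post after running into the same problem. If lm() fails because some groups have too few cases, you can work around it by pre-filtering the data to remove those groups before running the do() loop. The generic code below shows how to filter out groups with 30 or fewer data points.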
require(dplyr)
require(broom)

data_grp = ( data 
    %>% group_by(factor_a, factor_b)
    %>% mutate(grp_cnt=n())
    %>% filter(grp_cnt>30)
)
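Group size is not the only thing that can make a coefficient inestimable: a group whose predictor never varies has the same effect. A minimal sketch of pre-filtering on that condition follows, assuming the predictor column is called value as in the question. The toy data and the names toy and toy_ok are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: group "b" has a constant predictor
toy <- data.frame(
  factor_a = rep(c("a", "b", "c"), each = 4),
  value    = c(1, 2, 3, 4,  5, 5, 5, 5,  2, 4, 6, 8),
  y        = c(1.1, 2.0, 2.9, 4.2,  3.0, 3.1, 2.9, 3.0,  1.0, 2.1, 2.9, 4.0)
)

toy_ok <- toy %>%
  group_by(factor_a) %>%
  filter(n_distinct(value) > 1) %>%  # keep only groups where the predictor varies
  ungroup()

unique(toy_ok$factor_a)
```

The same filter() call can be combined with the group-size condition above, for example filter(n() > 30, n_distinct(value) > 1).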

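After finding this post while troubleshooting, I wrote a function to handle the problem. The package maintainers may (will) have a smarter solution, but I think it should work for most cases. Thanks to @Benjamin for the loop inspiration.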
collect_glance <- function(mdldF) {
    # mdldF should be a data frame from dplyr/broom with the list column 'mdl'
    # holding the model objects; all other columns are treated as group info
    mdldF <- ungroup(mdldF)                      # drop rowwise/grouped structure
    meta_cols <- setdiff(colnames(mdldF), "mdl") # names of the group columns
    mdlglance <- tibble()                        # initialize empty data frame
    for (i in seq_len(nrow(mdldF))) {
        # group info for this modeling iteration
        metadF <- mdldF[i, meta_cols, drop = FALSE]
        # attempt glance(). if successful, bind to the group info.
        # if not, fall back to the group info alone
        gtmp <- tryCatch(
            bind_cols(metadF, glance(mdldF$mdl[[i]])),
            error = function(e) metadF
        )
        # bind_rows() fills the missing glance columns with NA for failed models
        mdlglance <- bind_rows(mdlglance, gtmp)
    }
    return(mdlglance)
}

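Because of a singularity in the model, at least one coefficient is not defined, so no F-statistic is returned for the lm object, and glance() therefore genuinely cannot find fstatistic. I can reproduce this situation.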
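A minimal sketch of such a situation, using made-up data rather than the data from the question: when the predictor is constant, its slope is NA and summary() on the fit contains no fstatistic element.

```r
# Hypothetical toy data: 'value' is constant, so the model matrix is singular
d <- data.frame(MEAS_AVG = c(1.2, 0.8, 1.5, 1.1), value = rep(3, 4))

fit <- lm(MEAS_AVG ~ value, data = d)

coef(fit)                         # the slope for 'value' is NA
is.null(summary(fit)$fstatistic)  # TRUE: summary() computes no F-statistic here
```

Only the intercept is estimated in this fit, so summary.lm() skips the F-statistic altogether, which is consistent with the "object 'fstatistic' not found" error reported above.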