R: proportions of factors and dummies


I have a dataset full of factors and dummies, and I want the proportion of each level within dplyr::group_by(cyl). The same goes for the am column.

Here's a first crack:

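library(dplyr)   ## group_by(), mutate(), summarise()
library(tidyr)   ## pivot_longer(), pivot_wider()

(df 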
    %>% pivot_longer(-cyl)       ## spread out variables (vs, am)
    %>% group_by(cyl,name)   
    %>% mutate(n=n())            ## obs per cyl/var combo
    %>% group_by(cyl,name,value) 
    %>% summarise(prop=n()/n)    ## proportion of 0/1 per cyl/var  
    %>% unique()                 ## not sure why I need this?
    %>% pivot_wider(id_cols=c(cyl,name),names_from=value,values_from=prop)
)
Result:

   cyl name     `0`    `1`
  <dbl> <chr>  <dbl>  <dbl>
1     4 am    0.273   0.727
2     4 vs    0.0909  0.909
3     6 am    0.571   0.429
...
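The unique() call is needed because n is still a per-row column when summarise() runs, so n()/n returns one (identical) value per original row rather than one per group; recent dplyr versions (1.1.0 and later) also warn about summarise() returning more than one row per group. A minimal sketch of an alternative, assuming df is mtcars restricted to cyl, vs and am (the numbers above match that data set), counts each value directly and divides by the group total, so no deduplication is required:

```r
library(dplyr)
library(tidyr)

# Assumption: the data are mtcars' cyl, vs and am columns
df <- mtcars %>% select(cyl, vs, am)

props <- df %>%
  pivot_longer(-cyl) %>%              # one row per car and variable (vs, am)
  count(cyl, name, value) %>%         # number of 0s and 1s per cyl/variable combo
  group_by(cyl, name) %>%
  mutate(prop = n / sum(n)) %>%       # proportion within each cyl/variable combo
  ungroup() %>%
  select(-n) %>%                      # drop counts so only proportions are spread
  pivot_wider(names_from = value, values_from = prop)

props
```

The 8-cylinder vs row still gets NA in the `1` column, because no 8-cylinder car in mtcars has vs == 1.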

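Could you clarify, with an example, what the desired output a should look like? At the moment it looks like a list of five factors, each with the proportion of each level.
Hi, I added it.
Like prop.table(table()). One issue is that I think it produces NAs when a particular outcome is zero; there are lots of Qs on StackOverflow (I think) explaining how to deal with this, e.g.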
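On the NA issue: in the pivot_wider() version, NAs appear wherever a cyl/variable/value combination never occurs, and adding values_fill = 0 to pivot_wider() fills those cells with 0 instead. The prop.table(table()) route mentioned above avoids the problem, because table() keeps empty cells as zero counts. A sketch of that base R approach with lapply, assuming the data are mtcars' cyl, vs and am columns (prop_list is a hypothetical name):

```r
# Assumption: the data are mtcars' cyl, vs and am columns
df <- mtcars[, c("cyl", "vs", "am")]

# For each dummy, a cyl-by-value table of row proportions;
# margin = 1 makes each cyl row sum to 1, and empty cells stay 0 rather than NA
prop_list <- lapply(df[c("vs", "am")], function(x) {
  prop.table(table(df$cyl, x), margin = 1)
})

prop_list$vs
prop_list$am
```

The result is a list with one table per variable, rather than one long data frame.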