R: proportions of factors and dummy variables
I have a dataset full of factors and dummy variables, and I want to ...
Tags: r, dplyr, lapply, tidyr, data-manipulation
dplyr::group_by(cyl)
The same applies to the am column.

Here is a first crack:
library(dplyr)
library(tidyr)

(df
    %>% pivot_longer(-cyl)            ## spread out variables (vs, am)
    %>% group_by(cyl, name)
    %>% mutate(n = n())               ## obs per cyl/var combo
    %>% group_by(cyl, name, value)
    %>% summarise(prop = n()/n)       ## proportion of 0/1 per cyl/var
    %>% unique()                      ## needed: n is a per-row column, so summarise() returns one identical row per observation
    %>% pivot_wider(id_cols = c(cyl, name), names_from = value, values_from = prop)
)
Result:
    cyl name     `0`   `1`
  <dbl> <chr>  <dbl> <dbl>
1     4 am    0.273  0.727
2     4 vs    0.0909 0.909
3     6 am    0.571  0.429
...
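The unique() step can be avoided by counting each cyl/variable/value combination once and then dividing by the group total. The following is only a sketch of that idea, not the original answer's code. It assumes df holds the cyl, vs and am columns of the built-in mtcars data, which is consistent with the proportions shown above but is not stated in the post; the name props is made up for illustration.

```r
library(dplyr)
library(tidyr)

## Assumption: df is the cyl, vs and am columns of the built-in mtcars data
df <- mtcars %>% select(cyl, vs, am)

props <- df %>%
    pivot_longer(-cyl) %>%            ## one row per car and variable (vs, am)
    count(cyl, name, value) %>%       ## n = obs per cyl/var/value combo
    group_by(cyl, name) %>%
    mutate(prop = n / sum(n)) %>%     ## proportion within each cyl/var
    ungroup() %>%
    select(-n) %>%
    pivot_wider(names_from = value, values_from = prop)

props
```

Because count() already returns exactly one row per combination, there are no duplicate rows to remove before pivot_wider().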
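Could you clarify with an example what the desired output should look like? At the moment it looks like a list of five factors, each with the proportions of its levels.
Hi, I have added it. Something like prop.table(table()).
One problem is that I think this will produce NAs when a particular outcome has a count of zero; there are (I think) plenty of questions on Stack Overflow that explain how to handle this.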
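One possible way to handle the zero-count case is the values_fill argument of pivot_wider(), which fills cells that have no matching row. The sketch below again assumes that df is the cyl, vs and am columns of mtcars (an assumption, not something the post states); filled and vs_tab are illustrative names. In mtcars no 8-cylinder car has vs equal to 1, so that cell would otherwise be NA.

```r
library(dplyr)
library(tidyr)

## Assumption: df is the cyl, vs and am columns of the built-in mtcars data
df <- mtcars %>% select(cyl, vs, am)

filled <- df %>%
    pivot_longer(-cyl) %>%
    count(cyl, name, value) %>%       ## combos with zero obs produce no row
    group_by(cyl, name) %>%
    mutate(prop = n / sum(n)) %>%
    ungroup() %>%
    pivot_wider(id_cols = c(cyl, name), names_from = value,
                values_from = prop,
                values_fill = 0)      ## missing combos become 0 instead of NA

## Base R cross-check for a single variable: table() keeps zero counts,
## so prop.table() returns 0 rather than NA for absent combinations
vs_tab <- prop.table(table(df$cyl, df$vs), margin = 1)
```

The prop.table(table()) form handles one variable at a time, whereas the pipeline covers vs and am together; which is more convenient depends on how many columns need summarising.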