Finding the proportion of grouped observations using dplyr in R
I often use the functions group_by() and summarize() from the dplyr package in R (note: if the summary statistic is sum(), this is equivalent to count() with the wt argument). Here is an example of how:
library(dplyr)
data <- data.frame(
  group = sample(rep(c("Group A", "Group B", "Group C", "Group D"), 4), 16, replace = FALSE),
  factor = sample(rep(c("Factor 1", "Factor 2"), 8), 16, replace = FALSE),
  var1 = sample(1:16)
)
This produces the desired output, and I can check that factor_prop_sum_var1 sums to 1 within each factor:
out_df
Source: local data frame [8 x 4]
Groups: group [4]
group factor sum_var1 factor_prop_sum_var1
<fctr> <fctr> <int> <dbl>
1 Group A Factor 1 26 0.3170732
2 Group B Factor 1 17 0.2073171
3 Group C Factor 1 19 0.2317073
4 Group D Factor 1 18 0.2195122
5 Group A Factor 2 8 0.1481481
6 Group B Factor 2 19 0.3518519
7 Group C Factor 2 7 0.1296296
8 Group D Factor 2 22 0.4074074
out_df %>% group_by(factor) %>% summarize(checking = sum(factor_prop_sum_var1))
# A tibble: 2 × 2
factor checking
<fctr> <dbl>
1 Factor 1 1
2 Factor 2 1
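This works, but it is clunky at best. Is there a way to do this more elegantly (preferably within a dplyr "pipe")?

To get proportions within a group, group by the column within which the proportions should add up to 100%. In this case, after computing the sum for each combination of group and factor, call group_by again, but this time group only by factor, and then calculate the percentage: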
library(dplyr)
set.seed(100)
data <- data.frame(
  group = sample(rep(c("Group A", "Group B", "Group C", "Group D"), 4), 16, replace = FALSE),
  factor = sample(rep(c("Factor 1", "Factor 2"), 8), 16, replace = FALSE),
  var1 = sample(1:16)
)
data %>%
  group_by(group, factor) %>%
  summarize(sum_var1 = sum(var1)) %>%
  group_by(factor) %>%
  mutate(percent = sum_var1 / sum(sum_var1)) %>%
  arrange(factor)
group factor sum_var1 percent
1 Group A Factor 1 13 0.25000000
2 Group B Factor 1 8 0.15384615
3 Group C Factor 1 21 0.40384615
4 Group D Factor 1 10 0.19230769
5 Group A Factor 2 20 0.23809524
6 Group B Factor 2 27 0.32142857
7 Group C Factor 2 2 0.02380952
8 Group D Factor 2 35 0.41666667
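Since the question mentions count(), here is a minimal sketch of the same idea with count() and its wt argument standing in for the group_by() plus summarize() step. It assumes a dplyr version recent enough that count() accepts the name argument; the object names props and check are only illustrative and do not come from the original posts. The exact numbers depend on your R version's random number generator, so no output is shown.

```r
library(dplyr)

set.seed(100)
data <- data.frame(
  group  = sample(rep(c("Group A", "Group B", "Group C", "Group D"), 4), 16, replace = FALSE),
  factor = sample(rep(c("Factor 1", "Factor 2"), 8), 16, replace = FALSE),
  var1   = sample(1:16)
)

# count() with wt = var1 sums var1 within each group/factor combination,
# which replaces group_by(group, factor) %>% summarize(sum_var1 = sum(var1)).
# `props` is a hypothetical name used only for this sketch.
props <- data %>%
  count(group, factor, wt = var1, name = "sum_var1") %>%
  group_by(factor) %>%                                # proportions should sum to 1 within factor
  mutate(percent = sum_var1 / sum(sum_var1)) %>%
  ungroup() %>%
  arrange(factor, group)

# Sanity check, mirroring the check in the question:
# the proportions within each factor should add up to 1.
check <- props %>%
  group_by(factor) %>%
  summarize(checking = sum(percent))

print(props)
print(check)
```

The ungroup() call is optional; it just avoids carrying the factor grouping into later steps by accident.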