Finding proportions of grouped observations in R using dplyr


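I often use the functions group_by() and summarize() for this (note: if the summary statistic is sum(), this is the same as what the count() function in the dplyr package in R does when given a wt argument).

Here is an example of how I do it: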

library(dplyr)

data <- data.frame(
  group = sample(rep(c("Group A", "Group B", "Group C", "Group D"), 4), 16, replace = FALSE),
  factor = sample(rep(c("Factor 1", "Factor 2"), 8), 16, replace = FALSE),
  var1 = sample(1:16)
)

This produces the desired output, and I can check that factor_prop_sum_var1 sums to 1 within each factor:

out_df

Source: local data frame [8 x 4]
Groups: group [4]

    group   factor sum_var1 factor_prop_sum_var1
   <fctr>   <fctr>    <int>                <dbl>
1 Group A Factor 1       26            0.3170732
2 Group B Factor 1       17            0.2073171
3 Group C Factor 1       19            0.2317073
4 Group D Factor 1       18            0.2195122
5 Group A Factor 2        8            0.1481481
6 Group B Factor 2       19            0.3518519
7 Group C Factor 2        7            0.1296296
8 Group D Factor 2       22            0.4074074

out_df %>% group_by(factor) %>% summarize(checking = sum(factor_prop_sum_var1))

# A tibble: 2 × 2
    factor checking
    <fctr>    <dbl>
1 Factor 1        1
2 Factor 2        1

This works, but it is clunky at best. Is there a more elegant way to do this (preferably within a dplyr pipe)?

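To get proportions within a group, just group by the column within which the proportions should add up to 100%. So in this case, after computing the sum for each combination of group and factor, use group_by again, but this time group only by factor, and then compute the percentages.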
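As a hand-checkable illustration of that regrouping step, here is a minimal sketch on a made-up data frame. The object names `toy` and `toy_prop` and all values in them are hypothetical and are not taken from the question.

```r
library(dplyr)

# Hypothetical toy data (not from the question): two groups, two factors.
toy <- data.frame(
  group  = c("A", "A", "B", "A", "B", "B"),
  factor = c("F1", "F1", "F1", "F2", "F2", "F2"),
  var1   = c(1, 2, 1, 1, 2, 1)
)

toy_prop <- toy %>%
  group_by(group, factor) %>%               # step 1: sum per group/factor pair
  summarize(sum_var1 = sum(var1)) %>%
  group_by(factor) %>%                      # step 2: regroup by factor only
  mutate(percent = sum_var1 / sum(sum_var1)) %>%
  ungroup()

# By hand: the sums are (A, F1) = 3, (B, F1) = 1, (A, F2) = 1, (B, F2) = 3,
# so within F1 the shares are 3/4 and 1/4, and within F2 they are 1/4 and 3/4.
print(toy_prop)
```

Because the second group_by() replaces the earlier grouping, sum(sum_var1) inside mutate() is the total for one factor level rather than for the whole table.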

library(dplyr)

set.seed(100)  # make the random example reproducible
data <- data.frame(
  group = sample(rep(c("Group A", "Group B", "Group C", "Group D"), 4), 16, replace = FALSE),
  factor = sample(rep(c("Factor 1", "Factor 2"), 8), 16, replace = FALSE),
  var1 = sample(1:16)
)

data %>%
  group_by(group, factor) %>%                     # one row per group/factor pair
  summarize(sum_var1 = sum(var1)) %>%
  group_by(factor) %>%                            # regroup so proportions sum to 1 within each factor
  mutate(percent = sum_var1 / sum(sum_var1)) %>%
  arrange(factor)

    group   factor sum_var1    percent
1 Group A Factor 1       13 0.25000000
2 Group B Factor 1        8 0.15384615
3 Group C Factor 1       21 0.40384615
4 Group D Factor 1       10 0.19230769
5 Group A Factor 2       20 0.23809524
6 Group B Factor 2       27 0.32142857
7 Group C Factor 2        2 0.02380952
8 Group D Factor 2       35 0.41666667
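
Following up on the note in the question that a summed summary matches count(), the first two steps could probably be collapsed into a single count() call with a wt argument. This is only a sketch: it assumes dplyr 0.8.0 or later (for the `name` argument of count()), and the object names `props` and `check` are introduced here for illustration. It also repeats the question's check that the proportions sum to 1 within each factor.

```r
library(dplyr)

set.seed(100)
data <- data.frame(
  group = sample(rep(c("Group A", "Group B", "Group C", "Group D"), 4), 16, replace = FALSE),
  factor = sample(rep(c("Factor 1", "Factor 2"), 8), 16, replace = FALSE),
  var1 = sample(1:16)
)

props <- data %>%
  count(group, factor, wt = var1, name = "sum_var1") %>%  # weighted count = sum of var1 per pair
  group_by(factor) %>%                                     # proportions within each factor
  mutate(percent = sum_var1 / sum(sum_var1)) %>%
  ungroup()

# Sanity check: the proportions should add up to 1 for every factor level.
check <- props %>%
  group_by(factor) %>%
  summarize(checking = sum(percent))

print(check)
```

The exact numbers depend on the random seed and the R version, so only the per-factor totals of 1 should be expected to match across machines.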