Summarizing by subgroup percentage in R

I have a dataset like this:

df = data.frame(group = c(rep('A',4), rep('B',3)),
                subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                value = c(1,4,2,1,1,2,3))


group | subgroup | value
------------------------
  A   |    a     |  1
  A   |    b     |  4
  A   |    c     |  2
  A   |    d     |  1
  B   |    a     |  1
  B   |    b     |  2
  B   |    c     |  3

What I want is each subgroup's value as a share of its group's total, i.e. the output should be:

group | subgroup | percent
------------------------
  A   |    a     |  0.125
  A   |    b     |  0.500
  A   |    c     |  0.250
  A   |    d     |  0.125
  B   |    a     |  0.167
  B   |    b     |  0.333
  B   |    c     |  0.500

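For example, for group A, subgroup a: the value is 1 and the sum over the whole of group A is 8 (a=1, b=4, c=2, d=1), so 1/8 = 0.125.

So far I have only found fairly simple aggregations, and I can't figure out how to do the "divide by the sum within each group" part.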

Per your comment, if the subgroups are unique within each group, you can do:

library(dplyr)
group_by(df, group) %>% mutate(percent = value/sum(value))
#   group subgroup value   percent
# 1     A        a     1 0.1250000
# 2     A        b     4 0.5000000
# 3     A        c     2 0.2500000
# 4     A        d     1 0.1250000
# 5     B        a     1 0.1666667
# 6     B        b     2 0.3333333
# 7     B        c     3 0.5000000

Or, to drop the value column while adding the percent column, use transmute:

group_by(df, group) %>% transmute(subgroup, percent = value/sum(value))
#   group subgroup   percent
# 1     A        a 0.1250000
# 2     A        b 0.5000000
# 3     A        c 0.2500000
# 4     A        d 0.1250000
# 5     B        a 0.1666667
# 6     B        b 0.3333333
# 7     B        c 0.5000000
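The answer above assumes each subgroup appears only once per group. If that does not hold, one possible approach is to sum the values per group and subgroup first and then divide by the group total. The sketch below uses a hypothetical data frame df2 (not from the question) in which subgroup a appears twice in group A; the .groups argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical data: subgroup 'a' appears twice in group A
df2 <- data.frame(group = c('A', 'A', 'A', 'B', 'B'),
                  subgroup = c('a', 'a', 'b', 'a', 'b'),
                  value = c(1, 3, 4, 1, 3))

res <- df2 %>%
  group_by(group, subgroup) %>%
  # Collapse duplicate subgroups; keep the result grouped by 'group' only
  summarise(value = sum(value), .groups = "drop_last") %>%
  # Divide each subgroup total by its group total
  mutate(percent = value / sum(value)) %>%
  ungroup()

res
```

Here group A sums to 8 with a = 4 and b = 4, so both get 0.5; group B sums to 4, giving 0.25 and 0.75.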

We can use prop.table to compute the proportions.

Base R:

transform(df, percent = ave(value, group, FUN = prop.table))

#  group subgroup value percent
#1     A        a     1   0.125
#2     A        b     4   0.500
#3     A        c     2   0.250
#4     A        d     1   0.125
#5     B        a     1   0.167
#6     B        b     2   0.333
#7     B        c     3   0.500
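To see why this works: on a plain numeric vector, prop.table(x) gives the same result as x / sum(x), and ave applies the function within each level of the grouping vector while keeping the original row order. A small sketch using the values from the question (the variable names x, g, v and pct are only for illustration):

```r
# Values of group A from the question
x <- c(1, 4, 2, 1)

# prop.table on a plain vector divides each element by the total
p1 <- prop.table(x)
p2 <- x / sum(x)
all.equal(p1, p2)   # TRUE

# ave() applies FUN separately within each level of g
g <- c('A', 'A', 'A', 'A', 'B', 'B', 'B')
v <- c(1, 4, 2, 1, 1, 2, 3)
pct <- ave(v, g, FUN = prop.table)
round(pct, 3)
# [1] 0.125 0.500 0.250 0.125 0.167 0.333 0.500
```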

dplyr:

library(dplyr)
df %>% group_by(group) %>% mutate(percent = prop.table(value))

data.table:

library(data.table)
setDT(df)[, percent := prop.table(value), group]
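Note that setDT converts df to a data.table by reference and := adds the column in place, so the line above prints nothing by itself. A self-contained sketch that also displays the result, with a trailing [] added to trigger printing:

```r
library(data.table)

df <- data.frame(group = c(rep('A', 4), rep('B', 3)),
                 subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                 value = c(1, 4, 2, 1, 1, 2, 3))

# setDT() modifies df in place; := adds 'percent' per group by reference.
# The trailing [] makes the updated table print.
setDT(df)[, percent := prop.table(value), by = group][]
```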