Aggregate by percentage of subgroup in R
I have a dataset like this:
df = data.frame(group = c(rep('A',4), rep('B',3)),
                subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                value = c(1,4,2,1,1,2,3))
group | subgroup | value
------------------------
A | a | 1
A | b | 4
A | c | 2
A | d | 1
B | a | 1
B | b | 2
B | c | 3
What I want is the percentage of each subgroup's value within its group, i.e. the output should be:
group | subgroup | percent
------------------------
A | a | 0.125
A | b | 0.500
A | c | 0.250
A | d | 0.125
B | a | 0.167
B | b | 0.333
B | c | 0.500
For example, group A, subgroup a: the value is 1, and the sum over the whole of group A is 8 (a=1, b=4, c=2, d=1), so 1/8 = 0.125.
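The arithmetic in that example can be checked directly in base R. This is a minimal sketch that recreates df from above and only looks at group A, subgroup a; the names total_A and share_Aa are just illustrative:

```r
# Recreate the example data from the question
df <- data.frame(group = c(rep('A', 4), rep('B', 3)),
                 subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                 value = c(1, 4, 2, 1, 1, 2, 3))

# Total of group A: 1 + 4 + 2 + 1 = 8
total_A <- sum(df$value[df$group == "A"])

# Share of subgroup a within group A: 1 / 8
share_Aa <- df$value[df$group == "A" & df$subgroup == "a"] / total_A

print(total_A)   # 8
print(share_Aa)  # 0.125
```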
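So far I have only found rather simple aggregations (such as …), but I can't figure out how to do the "divide by the sum within the group" part.
Based on your comment, if the subgroups are unique within each group, you can do this: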
library(dplyr)
group_by(df, group) %>% mutate(percent = value/sum(value))
# group subgroup value percent
# 1 A a 1 0.1250000
# 2 A b 4 0.5000000
# 3 A c 2 0.2500000
# 4 A d 1 0.1250000
# 5 B a 1 0.1666667
# 6 B b 2 0.3333333
# 7 B c 3 0.5000000
Or, to drop the value column while adding the percent column, use transmute:
group_by(df, group) %>% transmute(subgroup, percent = value/sum(value))
# group subgroup percent
# 1 A a 0.1250000
# 2 A b 0.5000000
# 3 A c 0.2500000
# 4 A d 0.1250000
# 5 B a 0.1666667
# 6 B b 0.3333333
# 7 B c 0.5000000
We can use prop.table to calculate the percentages/proportions.
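Called on a plain vector with no margin argument, prop.table(x) simply returns x / sum(x). A quick check on group A's values (a_values is an illustrative name):

```r
# Values of group A from the example
a_values <- c(1, 4, 2, 1)

# prop.table() on a plain vector returns x / sum(x)
shares <- prop.table(a_values)
print(shares)  # 0.125 0.500 0.250 0.125

# Same result computed by hand
stopifnot(isTRUE(all.equal(shares, a_values / sum(a_values))))
```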
Base R:
transform(df, percent = ave(value, group, FUN = prop.table))
# group subgroup value percent
#1 A a 1 0.125
#2 A b 4 0.500
#3 A c 2 0.250
#4 A d 1 0.125
#5 B a 1 0.167
#6 B b 2 0.333
#7 B c 3 0.500
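The key piece is ave(): it splits value by group, applies FUN to each piece, and returns a vector in the original row order. A sketch showing that dividing by the per-row group sum gives the same result as FUN = prop.table (group_total, percent_manual and percent_prop are illustrative names):

```r
df <- data.frame(group = c(rep('A', 4), rep('B', 3)),
                 subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                 value = c(1, 4, 2, 1, 1, 2, 3))

# Each row receives the total of its own group, aligned with the original rows
group_total <- ave(df$value, df$group, FUN = sum)
print(group_total)  # 8 8 8 8 6 6 6

# Dividing by that total gives the same shares as FUN = prop.table
percent_manual <- df$value / group_total
percent_prop   <- ave(df$value, df$group, FUN = prop.table)
stopifnot(isTRUE(all.equal(percent_manual, percent_prop)))
```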
dplyr:
library(dplyr)
df %>% group_by(group) %>% mutate(percent = prop.table(value))
data.table:
library(data.table)
setDT(df)[, percent := prop.table(value), group]
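To reproduce exactly the table from the question (only group, subgroup and percent), here is a base-R sketch that uses tapply() to look up each row's group total by name. Rounding to three decimals is an assumption based on how the expected output is displayed; totals, pct and out are illustrative names:

```r
df <- data.frame(group = c(rep('A', 4), rep('B', 3)),
                 subgroup = c('a', 'b', 'c', 'd', 'a', 'b', 'c'),
                 value = c(1, 4, 2, 1, 1, 2, 3))

# Per-group totals as a named vector: A = 8, B = 6
totals <- tapply(df$value, df$group, sum)

# Look up each row's group total by name, divide, and drop names/dims
pct <- as.vector(df$value / totals[as.character(df$group)])

# Keep only the requested columns; round for display like the question's table
out <- data.frame(group = df$group,
                  subgroup = df$subgroup,
                  percent = round(pct, 3))
print(out)
```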