Adding a baseline / grand total with group_by() in dplyr
When I group data by some attributes, I want to add a "grand total" row that gives a baseline for comparison. Let's group mtcars by cylinders and carburetors, for example:
by_cyl_carb <- mtcars %>%
  group_by(cyl, carb) %>%
  summarize(median_mpg = median(mpg),
            avg_mpg = mean(mpg),
            count = n())
I would like the result to also contain total rows, something like this:
11 ttl 1 13.8 13.2 6
12 ttl 2 15 15 1
13 ttl 3 19.3 20.4 32
14 ... etc ...
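The real example I am working with is median home sale prices by geography and year. I want to report the median sale price for each geography-year I am interested in, but I also want a baseline to compare each of them against.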
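Edit: solved, with two solutions
@camille pointed to an approach that solves the problem, and @MKR also provided a solution. Here is one version of the code that works: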
by_cyl_carb <- mtcars %>%
  mutate_at(vars(c(cyl,carb)), funs(as.character(.))) %>%
  bind_rows(mutate(., cyl = "All cylinders")) %>%
  bind_rows(mutate(., carb = "All carburetors")) %>%
  group_by(cyl, carb) %>%
  summarize(median_mpg = median(mpg),
            avg_mpg = mean(mpg),
            count = n())
> by_cyl_carb
# A tibble: 19 x 5
# Groups: cyl [?]
cyl carb median_mpg avg_mpg count
<chr> <chr> <dbl> <dbl> <int>
1 4 1 27.3 27.6 5
2 4 2 25.2 25.9 6
3 4 All carburetors 26 26.7 11
4 6 1 19.8 19.8 2
5 6 4 20.1 19.8 4
6 6 6 19.7 19.7 1
7 6 All carburetors 19.7 19.7 7
8 8 2 17.1 17.2 4
9 8 3 16.4 16.3 3
10 8 4 13.8 13.2 6
11 8 8 15 15 1
12 8 All carburetors 15.2 15.1 14
13 All cylinders 1 22.8 25.3 7
14 All cylinders 2 22.1 22.4 10
15 All cylinders 3 16.4 16.3 3
16 All cylinders 4 15.2 15.8 10
17 All cylinders 6 19.7 19.7 1
18 All cylinders 8 15 15 1
19 All cylinders All carburetors 19.2 20.1 32
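The code above relies on mutate_at() and funs(); funs() has been deprecated since dplyr 0.8.0. As a rough sketch of the same idea for newer versions, assuming dplyr 1.0.0 or later (for across() and the .groups argument), something like the following should give the same 19 rows. The comments describe why the two bind_rows() calls produce every subtotal.

```r
library(dplyr)

by_cyl_carb <- mtcars %>%
  # cyl and carb must be character so they can hold the "All ..." labels
  mutate(across(c(cyl, carb), as.character)) %>%
  # append a copy of every row, relabelled as belonging to all cylinders
  bind_rows(mutate(., cyl = "All cylinders")) %>%
  # append a copy of the doubled data, relabelled as all carburetors;
  # this also creates the "All cylinders" / "All carburetors" grand total
  bind_rows(mutate(., carb = "All carburetors")) %>%
  group_by(cyl, carb) %>%
  summarize(median_mpg = median(mpg),
            avg_mpg = mean(mpg),
            count = n(),
            .groups = "drop")  # return an ungrouped tibble
```

Each original row ends up in four groups (its own cyl/carb pair, its cyl subtotal, its carb subtotal, and the grand total), so the data is quadrupled before summarising. That is fine for mtcars but may matter for large tables.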
A solution using dplyr::bind_rows and mutate_at can be implemented as:
library(tidyverse)
mtcars %>%
  group_by(cyl, carb) %>%
  summarize(median_mpg = median(mpg),
            avg_mpg = mean(mpg),
            count = n()) %>%
  ungroup() %>%
  mutate_at(vars(cyl:carb), funs(as.character(.))) %>%
  bind_rows(summarise(mtcars, cyl = "ttl", carb = "ttl",
                      median_mpg = median(mpg),
                      avg_mpg = mean(mpg),
                      count = n()))
# # A tibble: 10 x 5
# cyl carb median_mpg avg_mpg count
# <chr> <chr> <dbl> <dbl> <int>
# 1 4 1 27.3 27.6 5
# 2 4 2 25.2 25.9 6
# 3 6 1 19.8 19.8 2
# 4 6 4 20.1 19.8 4
# 5 6 6 19.7 19.7 1
# 6 8 2 17.1 17.2 4
# 7 8 3 16.4 16.3 3
# 8 8 4 13.8 13.2 6
# 9 8 8 15.0 15.0 1
#10 ttl ttl 19.2 20.1 32
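For the housing case described in the question, only one grouping variable needs a baseline: the geography, within each year. A minimal sketch of how that might look is below. The `sales` data frame, its column names (`geo`, `year`, `price`), and all values are invented purely for illustration, and dplyr 1.0.0 or later is assumed for the .groups argument.

```r
library(dplyr)

# Hypothetical sales data; names and numbers are made up for illustration
sales <- tibble(
  geo   = c("North", "North", "South", "South", "North", "South"),
  year  = c(2016, 2016, 2016, 2016, 2017, 2017),
  price = c(100, 200, 300, 500, 150, 450)
)

by_geo_year <- sales %>%
  # append a copy of every sale under a catch-all geography label,
  # so each year gets an extra "All geographies" baseline group
  bind_rows(mutate(., geo = "All geographies")) %>%
  group_by(year, geo) %>%
  summarize(median_price = median(price),
            count = n(),
            .groups = "drop")
```

Because `geo` is already a character column here, no type conversion is needed before relabelling. The year is left alone, so every year keeps its own baseline row rather than one overall total.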
Not a duplicate, but I recently posted a question about a similar problem. My question was more about pipe syntax, but it may help here! Thanks a lot :)