R: ddply - aggregating data by specifying a string as the variable name


I have a large dataset with many columns. For example:

set.seed(1)
x <- 1:15
y <- letters[1:3][sample(1:3, 15, replace = TRUE)]    # values a, b, c
z <- letters[10:13][sample(1:3, 15, replace = TRUE)]  # values j, k, l
r <- letters[20:24][sample(1:3, 15, replace = TRUE)]  # values t, u, v
df <- data.frame("Number" = x, "Section" = y, "Chapter" = z, "Rating" = r)
dput(df)

structure(list(Number = 1:15, Area = structure(c(1L, 2L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 1L, 1L, 1L, 3L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"), Section = structure(c(2L, 3L, 3L, 2L, 3L, 3L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 2L), .Label = c("j", "k", "l"), class = "factor"), Rating = structure(c(2L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 2L), .Label = c("A", "B", "C"), class = "factor")), class = "data.frame", row.names = c(NA,-15L))

The problem now is that "get(Category)" is treated as a new column:

    get.Category. Number Area Section Rating freq  rel_freq
1              k      4    c       k      A    1 0.5000000
2              k      8    b       k      A    1 0.5000000
3              j     10    a       j      B    1 0.1428571
4              j     12    a       j      B    1 0.1428571
5              k      1    a       k      B    1 0.1428571
6              k     15    c       k      B    1 0.1428571
7              l      2    b       l      B    1 0.1428571

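Furthermore, the Number column should be summed and the other categories (here: Area) should be dropped, so that, for example, there is only one row for Section "k" with Rating "A".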
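For comparison, here is a minimal base-R sketch of the result described above, swapping ddply for base aggregate() (this is not the answer's approach). It rebuilds the data from the dput() output and assumes the grouping column is held as a string in Category <- "Section". Selecting the grouping columns by name keeps their original names, so no "get.Category." column appears, and Area is dropped because it is not listed.

```r
# Data reconstructed from the dput() output in the question.
df <- data.frame(
  Number  = 1:15,
  Area    = c("a","b","b","c","a","c","c","b","b","a","a","a","c","b","c"),
  Section = c("k","l","l","k","l","l","j","k","j","j","k","j","k","l","k"),
  Rating  = c("B","B","B","A","C","C","C","A","C","B","C","B","C","B","B")
)

# Assumed: the grouping column name is stored as a string.
Category <- "Section"

# df[c(Category, "Rating")] is a named list of grouping columns, so the
# output keeps the names "Section" and "Rating". Number is summed within
# each Section x Rating combination; Area is not used and therefore dropped.
res <- aggregate(df["Number"], by = df[c(Category, "Rating")], FUN = sum)
print(res)
```

With this data, Section "k" with Rating "A" collapses into a single row whose Number is 4 + 8.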

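We can use count to get the frequencies of the column "Section" by converting the object identifier "Category" to a symbol (sym) and evaluating it (!!). In ggplot syntax, aes also accepts symbols, which can be evaluated the same way.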

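library(tidyverse)   # attaches dplyr and ggplot2
library(scales)      # percent_format()

Category <- "Section"   # column name held as a string

df %>% 
    # convert the string to a symbol and splice it in with !!
    count(!! rlang::sym(Category), Rating) %>%
    # relative frequency of each Section within a Rating
    group_by(Rating) %>% 
    mutate(rel_freq = n/sum(n)) %>%
    ggplot(., aes(x = Rating, y = rel_freq, fill = !! rlang::sym(Category))) + 
    geom_bar(position = "fill", stat = "identity", color = "black") + 
    scale_y_continuous(labels = percent_format()) + 
    labs(x = "Rating", y = "Relative Frequency")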
[Figure: output plot]
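To check the proportions this pipeline plots, the same within-Rating relative frequencies can be computed in base R with table() and prop.table(). This sketch uses the data from the question's dput() output and assumes Category <- "Section". One difference: table() keeps Section/Rating combinations with zero counts, whereas count() omits them.

```r
# Data reconstructed from the dput() output in the question.
df <- data.frame(
  Number  = 1:15,
  Area    = c("a","b","b","c","a","c","c","b","b","a","a","a","c","b","c"),
  Section = c("k","l","l","k","l","l","j","k","j","j","k","j","k","l","k"),
  Rating  = c("B","B","B","A","C","C","C","A","C","B","C","B","C","B","B")
)

Category <- "Section"   # assumed, as in the answer

# Counts of each Section within each Rating, with dimension names
# taken from the string so the table is labelled "Section" x "Rating".
tab <- table(df[[Category]], df$Rating, dnn = c(Category, "Rating"))

# margin = 2 divides each count by its column (Rating) total, mirroring
# group_by(Rating) %>% mutate(rel_freq = n / sum(n)).
rel <- prop.table(tab, margin = 2)
print(round(rel, 3))
```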


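Please check whether you used the same data to create the figure shown, because the "Rating" column created with sample has different levels (t, u, v instead of the A, B, C in the dput() output). With base R's as.name instead of rlang::sym:
count(!!as.name(Category), Rating)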
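A short base-R illustration of what as.name() produces, assuming Category <- "Section" and a small hypothetical data frame: the string becomes a symbol (the same kind of object rlang::sym() returns), and evaluating that symbol inside the data frame looks up the column it names. This is what !! splices into count() and aes().

```r
Category <- "Section"   # assumed column name held as a string

# as.name() turns the string into a symbol (class "name").
s <- as.name(Category)
print(class(s))   # [1] "name"

# Hypothetical small data frame for illustration only.
df <- data.frame(Section = c("k", "l", "k"), Rating = c("B", "B", "A"))

# Evaluating the symbol with the data frame as environment returns the column.
vals <- eval(s, df)
print(vals)
```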