R:ddply-通过将字符串指定为变量名来聚合数据
我得到了一个包含多个列的大数据集。例如R:ddply-通过将字符串指定为变量名来聚合数据,r,ggplot2,dplyr,aggregate,R,Ggplot2,Dplyr,Aggregate,我得到了一个包含多个列的大数据集。例如 set.seed(1) x <- 1:15 y <- letters[1:3][sample(1:3, 15, replace = T)] z <- letters[10:13][sample(1:3, 15, replace = T)] r <- letters[20:24][sample(1:3, 15, replace = T)] df <- data.frame("Number"=x, "Section"=y,"Cha
set.seed(1)
x <- 1:15
y <- letters[1:3][sample(1:3, 15, replace = T)]
z <- letters[10:13][sample(1:3, 15, replace = T)]
r <- letters[20:24][sample(1:3, 15, replace = T)]
df <- data.frame("Number"=x, "Section"=y,"Chapter"=z,"Rating"=r)
dput(df)
structure(list(Number = 1:15, Area = structure(c(1L, 2L, 2L, 3L, 1L, 3L, 3L, 2L, 2L, 1L, 1L, 1L, 3L, 2L, 3L), .Label = c("a", "b", "c"), class = "factor"), Section = structure(c(2L, 3L, 3L, 2L, 3L, 3L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 3L, 2L), .Label = c("j", "k", "l"), class = "factor"), Rating = structure(c(2L, 2L, 2L, 1L, 3L, 3L, 3L, 1L, 3L, 2L, 3L, 2L, 3L, 2L, 2L), .Label = c("A", "B", "C"), class = "factor")), class = "data.frame", row.names = c(NA,-15L))
现在的问题是,“get(Category)”现在被视为一个新列
get.Category. Number Area Section Rating freq rel_freq
1 k 4 c k A 1 0.5000000
2 k 8 b k A 1 0.5000000
3 j 10 a j B 1 0.1428571
4 j 12 a j B 1 0.1428571
5 k 1 a k B 1 0.1428571
6 k 15 c k B 1 0.1428571
7 l 2 b l B 1 0.1428571
此外,数字列应加总,例如,其他类别(此处:区域)应删除,我们应在“k”部分中仅有一行,评级为“A” 我们可以使用
count
通过在转换为符号(sym
)并计算(!!
)之后计算对象标识符“Category”,来获得列“Section”的频率。在ggplot
语法中,aes
也可以采用符号,并且可以像前面一样进行计算
library(tidyverse)
library(scales)
library(ggplot2)
df %>%
count(!! rlang::sym(Category), Rating) %>%
group_by(Rating) %>%
mutate(rel_freq = n/sum(n)) %>%
ggplot(., aes(x =Rating, y = rel_freq, fill = !! rlang::sym(Category))) +
geom_bar(position = "fill",stat = "identity",color="black") +
scale_y_continuous(labels = percent_format())+
labs(x = "Rating", y="Relative Frequency")
-输出
请检查您是否使用了相同的数据来创建显示的图形,因为带有
样本的“评级”列有不同的级别,使用baseR:计数(!!as.name(Category),Rating)
get.Category. Number Area Section Rating freq rel_freq
1 k 4 c k A 1 0.5000000
2 k 8 b k A 1 0.5000000
3 j 10 a j B 1 0.1428571
4 j 12 a j B 1 0.1428571
5 k 1 a k B 1 0.1428571
6 k 15 c k B 1 0.1428571
7 l 2 b l B 1 0.1428571
library(tidyverse)
library(scales)
library(ggplot2)
df %>%
count(!! rlang::sym(Category), Rating) %>%
group_by(Rating) %>%
mutate(rel_freq = n/sum(n)) %>%
ggplot(., aes(x =Rating, y = rel_freq, fill = !! rlang::sym(Category))) +
geom_bar(position = "fill",stat = "identity",color="black") +
scale_y_continuous(labels = percent_format())+
labs(x = "Rating", y="Relative Frequency")