In a for loop with dplyr, summarise returns different results than group_by
When I apply a for loop around a dplyr summarise call, I get strange results, and I'm not sure why or how to fix it:
library(dplyr)
test <- data.frame(title = c("a", "b", "c", "a", "b", "c", "a", "b", "c", "a", "b", "c"),
                   category = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
                   sex = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "f", "f", "f"),
                   salary = c(50, 70, 90, 40, 60, 85, 220, 270, 350, 180, 200, 330))
category_list <- unique(test$category)
tmp = list()
for (category in category_list) {
  # Create an average salary line for the category
  tmp[category] <- test %>%
    filter(category == category) %>%
    summarise(mean(salary))
  print(tmp)
}
By contrast, the group_by() function returns the expected results:
test %>% group_by(category) %>% summarise(mean(salary))
# A tibble: 2 x 2
  category `mean(salary)`
  <fct>             <dbl>
1 A                  65.8
2 B                 258.
So maybe something is wrong with the category_list object?
Surprisingly, when I use the first element of category_list, I also get the correct answer:
test %>%
  filter(category == category_list[1]) %>%
  summarise(mean(salary))
  mean(salary)
1     65.83333
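The reason I want to figure this out (rather than just using group_by) is that I'm trying to write a script that creates many ggplot objects, which will then be combined using the gridExtra library.
Maybe I'm wrong and group_by can be used after all, but the only approach I can think of is the following pseudocode: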
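- 1) Create a list of means by category to use in the geom_hline() argument
- 2) Subset the data frame object by category; each subset will be used in a ggplot together with its geom_hline()
- 3) Create a list of plot objects, one for each category
- 4) Use grid.arrange() from the gridExtra library, outside of the for loop, to combine the plots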
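Here is my current code (which doesn't work). The full listing is the block further down that ends with grid.arrange(grobs = p, nrow = 1).
The problem in your for loop is the statement filter(category == category). It is always TRUE, because both sides of the comparison take category from the data: inside filter(), a column name masks the loop variable of the same name. If you really need a for loop, just rename the iterator in the for loop.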
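A minimal sketch of that renaming fix, using the test data from the question. The iterator name cat_name and the result list category_means are illustrative choices, not names from the original post; the only point is that the loop variable no longer shares a name with a column.

```r
library(dplyr)

# Sample data from the question
test <- data.frame(
  title    = rep(c("a", "b", "c"), times = 4),
  category = rep(c("A", "B"), each = 6),
  sex      = rep(rep(c("m", "f"), each = 3), times = 2),
  salary   = c(50, 70, 90, 40, 60, 85, 220, 270, 350, 180, 200, 330)
)

# as.character() keeps the loop values plain strings even if the column is a factor
category_list <- as.character(unique(test$category))

category_means <- list()
for (cat_name in category_list) {
  # cat_name does not clash with any column name, so this compares the
  # `category` column against the current loop value
  category_means[[cat_name]] <- test %>%
    filter(category == cat_name) %>%
    summarise(mean_salary = mean(salary)) %>%
    pull(mean_salary)
}

print(category_means)
```

With the iterator renamed, the per-category values should agree with the group_by() result shown in the question (roughly 65.8 for A and 258 for B).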
However, you don't need grid.arrange at all. facet_wrap gives you what you want (you may need to reformat the facet labels a little; they are controlled by the theme elements whose names begin with strip). The code is the final listing on this page, the one that defines category_means.
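As a rough illustration of the remark about facet labels, the sketch below styles the strips of a faceted plot through strip.background and strip.text. The object name p_strip and the particular colours are assumptions made for this example; they are not part of the answer's code.

```r
library(ggplot2)

# Sample data from the question
test <- data.frame(
  title    = rep(c("a", "b", "c"), times = 4),
  category = rep(c("A", "B"), each = 6),
  sex      = rep(rep(c("m", "f"), each = 3), times = 2),
  salary   = c(50, 70, 90, 40, 60, 85, 220, 270, 350, 180, 200, 330)
)

p_strip <- ggplot(test, aes(x = title, y = salary, color = sex)) +
  geom_point() +
  facet_wrap(~ category, nrow = 1, scales = "free_y") +
  theme(
    # Theme elements starting with "strip" control the facet label boxes
    strip.background = element_rect(fill = "#242B47", color = "#242B47"),
    strip.text = element_text(color = "white", face = "bold")
  )

p_strip
```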
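Couldn't you do something like yintercept = mean(category_data[[category]]$salary) instead of going to the trouble of creating a new dataset? Honestly, I find this kind of task easiest if I split things into a list of data frames via split and then loop over it to make the plots with lapply or purrr::map. Here's a split-map example if you're going to take a different tack:
What's going on with this filter? filter(category == category). You're comparing category with itself, so of course the answer is always the same.
For comparison, filtering on a literal value returns the correct mean for category A:
test %>%
  filter(category == "A") %>%
  summarise(mean(salary))
  mean(salary)
1     65.83333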
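The split-and-loop idea from the comments could look roughly like the sketch below. It uses base split() and lapply() rather than purrr::map, and the names by_category and plots are made up for this example. Each horizontal line is computed from the subset itself, so no separate list of means is needed.

```r
library(ggplot2)
library(gridExtra)

# Sample data from the question
test <- data.frame(
  title    = rep(c("a", "b", "c"), times = 4),
  category = rep(c("A", "B"), each = 6),
  sex      = rep(rep(c("m", "f"), each = 3), times = 2),
  salary   = c(50, 70, 90, 40, 60, 85, 220, 270, 350, 180, 200, 330)
)

# One data frame per category, named by category
by_category <- split(test, test$category)

# Build one plot per subset; the mean line comes from that subset's salaries
plots <- lapply(by_category, function(d) {
  ggplot(d, aes(x = title, y = salary)) +
    geom_point(aes(color = sex)) +
    geom_hline(yintercept = mean(d$salary)) +
    ggtitle(as.character(d$category[1]))
})

# Combine the plots side by side
grid.arrange(grobs = plots, nrow = 1)
```

Because the anonymous function receives each subset as d, there is no loop variable that could be masked by a column name.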
The question's full attempt, as posted (this is the code that does not work):
library(gridExtra)
p = list()
avg_line = list()
tmp = list()
category_data = data.frame()
for (category in category_list) {
  # Create an average salary line for the category
  tmp[[category]] <- test %>%
    filter(category == category) %>%
    summarise(mean(salary))
  avg_line[[category]] <- tmp[[2]]
  # Subset data frame on category
  category_data[[category]] <- test %>% filter(category == category)
  # Make plots for each category
  p[[category]] <-
    ggplot(category_data[[category]], aes(x = title, y = salary)) +
    geom_line(color = "white") +
    geom_point(aes(color = sex)) +
    scale_color_manual(values = c("#F49171", "#81C19C")) +
    geom_hline(yintercept = avg_line[[category]], color = "white", alpha = 0.6, size = 1) +
    theme(legend.position = "none",
          panel.background = element_rect(color = "#242B47", fill = "#242B47"),
          plot.background = element_rect(color = "#242B47", fill = "#242B47"),
          axis.line = element_line(color = "grey48", size = 0.05, linetype = "dotted"),
          axis.text = element_text(family = "Georgia", color = "white"),
          axis.text.x = element_text(angle = 90),
          # Get rid of the y- and x-axis titles
          axis.title.y = element_blank(),
          axis.title.x = element_blank(),
          panel.grid.major.y = element_line(color = "grey48", size = 0.05),
          panel.grid.minor.y = element_blank(),
          panel.grid.major.x = element_blank())
}
grid.arrange(grobs = p, nrow = 1)
The facet_wrap version from the answer:
library(dplyr)
library(ggplot2)
# Per-category mean salary, used for the horizontal line in each facet
category_means <- test %>%
  group_by(category) %>%
  summarize_at(vars(salary), mean)
p <- test %>%
  # group_by(category) %>%
  ggplot(aes(x = title, y = salary, color = sex)) +
  facet_wrap(~ category, nrow = 1, scales = "free_y") +
  geom_line(color = 'white') +
  geom_point() +
  scale_color_manual(values = c("#F49171", "#81C19C")) +
  geom_hline(data = category_means, aes(yintercept = salary), color = 'white', alpha = 0.6, size = 1) +
  theme(legend.position = "none",
        panel.background = element_rect(color = "#242B47", fill = "#242B47"),
        plot.background = element_rect(color = "#242B47", fill = "#242B47"),
        axis.line = element_line(color = "grey48", size = 0.05, linetype = "dotted"),
        axis.text = element_text(family = "Georgia", color = "white"),
        axis.text.x = element_text(angle = 90),
        # Get rid of the y- and x-axis titles
        axis.title.y = element_blank(),
        axis.title.x = element_blank(),
        panel.grid.major.y = element_line(color = "grey48", size = 0.05),
        panel.grid.minor.y = element_blank(),
        panel.grid.major.x = element_blank())
p