dplyr summarise in a for loop returns different results than group_by

Tags: r, for-loop, ggplot2, dplyr

When I apply a for loop to a dplyr `summarise` call, I get strange results, and I'm not sure why or how to fix it:

test <- data.frame(title = c("a", "b", "c","a","b","c", "a", "b", "c","a","b","c"),
                       category = c("A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
                       sex = c("m", "m", "m", "f", "f", "f", "m", "m", "m", "f", "f", "f"),
                       salary = c(50,70,90,40,60,85, 220,270,350,180,200,330))

category_list <- unique(test$category)

tmp = list()

for (category in category_list) {
  # Create an average salary line for the category
  tmp[category] <- test %>% 
    filter(category == category) %>%
    summarise(mean(salary))
  print(tmp)
}
Whereas the `group_by()` function returns the proper results:

    test %>% group_by(category) %>% summarise(mean(salary))
# A tibble: 2 x 2
  category `mean(salary)`
  <fct>             <dbl>
1 A                  65.8
2 B                 258.
So maybe something is wrong with the `category_list` object? Surprisingly, when I use the first element of `category_list` directly, I also get the right answer:

test %>% 
+     filter(category == category_list[1]) %>%
+     summarise(mean(salary))
  mean(salary)
1     65.83333
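The reason I want to figure this out (rather than just use `group_by`) is that I'm trying to write a script that creates many ggplot objects, which will then be combined using the `gridExtra` library.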

Maybe I'm wrong and `group_by` would work here, but the only approach I can think of is the following pseudocode:

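  • 1) Create a list of means by `category` to use in the `geom_hline()` argument
  • 2) Subset the data frame by category; each subset is used in ggplot together with its own `geom_hline()`
  • 3) Create a list of plot objects, one per category
  • 4) Outside the `for` loop, use `grid.arrange()` from the `gridExtra` library to combine the plots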
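Here is my current code (which doesn't work):

library(gridExtra)
p = list()
avg_line = list()
tmp = list()
category_data = data.frame()
for (category in category_list) {
  # Create an average salary line for the category
  tmp[[category]] <- test %>% 
    filter(category == category) %>%
    summarise(mean(salary))
  avg_line[[category]] <- tmp[[2]]

  # Subset data frame on category 
  category_data[[category]] <- test %>% filter(category == category)

  # Make plots for each category
  p[[category]] <-
    ggplot(category_data[[category]], aes(x = title, y = salary)) +
  geom_line(color = "white") +
  geom_point(aes(color =sex)) +
  scale_color_manual(values = c("#F49171", "#81C19C")) +
  geom_hline(yintercept = avg_line[[category]], color = "white", alpha = 0.6, size = 1) +
  theme(legend.position = "none",
      panel.background = element_rect(color = "#242B47", fill = "#242B47"),
      plot.background = element_rect(color = "#242B47", fill = "#242B47"),
      axis.line = element_line(color = "grey48", size = 0.05, linetype = "dotted"),
      axis.text = element_text(family = "Georgia", color = "white"),
      axis.text.x = element_text(angle = 90),
      # Get rid of the y- and x-axis titles
      axis.title.y=element_blank(),
      axis.title.x=element_blank(),
      panel.grid.major.y = element_line(color = "grey48", size = 0.05),
      panel.grid.minor.y = element_blank(),
      panel.grid.major.x = element_blank())
}

grid.arrange(grobs = p, nrow = 1)

The problem in your for loop is the statement `filter(category == category)`. This is always TRUE, because both sides pull `category` from the data: inside `filter()`, the column takes precedence over the loop variable of the same name, so no rows are ever dropped. If you really need the for loop, just rename the iterator so that it no longer matches a column name.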
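
A minimal sketch of the renamed-iterator fix, assuming the `test` data frame from the question. The names `cat_i`, `category_means`, and `mean_salary` are illustrative and do not appear in the original post.

```r
library(dplyr)

# Same data as in the question, written compactly
test <- data.frame(
  title    = rep(c("a", "b", "c"), times = 4),
  category = rep(c("A", "B"), each = 6),
  sex      = rep(rep(c("m", "f"), each = 3), times = 2),
  salary   = c(50, 70, 90, 40, 60, 85, 220, 270, 350, 180, 200, 330),
  stringsAsFactors = FALSE
)

category_means <- list()

# `cat_i` is a hypothetical iterator name; any name that is not a column works
for (cat_i in unique(test$category)) {
  category_means[[cat_i]] <- test %>%
    filter(category == cat_i) %>%            # column `category` vs. loop value `cat_i`
    summarise(mean_salary = mean(salary)) %>%
    pull(mean_salary)                        # keep a plain number, not a data frame
}

category_means
```

With the iterator renamed, the per-category means should agree with the `group_by()` output shown in the question (about 65.8 for A and 258 for B).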

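However, you don't need `grid.arrange` at all. `facet_wrap` gives you what you want (you may need to do some reformatting of the facet labels, which are controlled with the theme elements that start with `strip`):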

category_means <- test %>% 
  group_by(category) %>%
  summarize_at(vars(salary), mean)

p <- test %>%
  # group_by(category) %>%
  ggplot(aes(x = title, y = salary, color = sex)) + 
  facet_wrap(~ category, nrow = 1, scales = "free_y") +  
  geom_line(color = 'white') + 
  geom_point() + 
  scale_color_manual(values = c("#F49171", "#81C19C")) +
  geom_hline(data = category_means, aes(yintercept = salary), color = 'white', alpha = 0.6, size = 1) + 
  theme(legend.position = "none",
    panel.background = element_rect(color = "#242B47", fill = "#242B47"),
    plot.background = element_rect(color = "#242B47", fill = "#242B47"),
    axis.line = element_line(color = "grey48", size = 0.05, linetype = "dotted"),
    axis.text = element_text(family = "Georgia", color = "white"),
    axis.text.x = element_text(angle = 90),
    # Get rid of the y- and x-axis titles
    axis.title.y=element_blank(),
    axis.title.x=element_blank(),
    panel.grid.major.y = element_line(color = "grey48", size = 0.05),
    panel.grid.minor.y = element_blank(),
    panel.grid.major.x = element_blank())
p
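
The `strip` theme elements mentioned in this answer could be set roughly as follows. This is only a sketch: the object name `p_strip` and the specific colours and font face are assumptions chosen to match the dark palette, not part of the original answer.

```r
library(ggplot2)

# Same data as in the question, written compactly
test <- data.frame(
  title    = rep(c("a", "b", "c"), times = 4),
  category = rep(c("A", "B"), each = 6),
  sex      = rep(rep(c("m", "f"), each = 3), times = 2),
  salary   = c(50, 70, 90, 40, 60, 85, 220, 270, 350, 180, 200, 330),
  stringsAsFactors = FALSE
)

p_strip <- ggplot(test, aes(x = title, y = salary, color = sex)) +
  geom_point() +
  facet_wrap(~ category, nrow = 1, scales = "free_y") +
  theme(
    # The box behind each facet label
    strip.background = element_rect(fill = "#242B47", color = NA),
    # The facet label text itself
    strip.text       = element_text(color = "white", face = "bold")
  )

p_strip
```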

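Couldn't you do something like `yintercept = mean(category_data[[category]]$salary)` instead of going to the trouble of creating a new dataset? Honestly, I find this kind of task easiest if I split things into a list of data frames via `split` and then loop over that list with `lapply` or `purrr::map` to make the plots. A `split`-`map` approach is one option if you want to take a different tack.

What's going on with that filter? `filter(category == category)`: you are comparing `category` with itself, so of course the result is always the same.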
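
The split-and-map strategy suggested in the comments might look like the sketch below. It assumes the `test` data frame from the question; `by_category`, `make_plot`, and `plots` are hypothetical names, and the theme styling from the question is left out to keep the example short.

```r
library(ggplot2)
library(gridExtra)

# Same data as in the question, written compactly
test <- data.frame(
  title    = rep(c("a", "b", "c"), times = 4),
  category = rep(c("A", "B"), each = 6),
  sex      = rep(rep(c("m", "f"), each = 3), times = 2),
  salary   = c(50, 70, 90, 40, 60, 85, 220, 270, 350, 180, 200, 330),
  stringsAsFactors = FALSE
)

# One data frame per category, returned as a named list ("A", "B")
by_category <- split(test, test$category)

# Hypothetical helper: builds the plot for a single category subset
make_plot <- function(df) {
  ggplot(df, aes(x = title, y = salary)) +
    geom_point(aes(color = sex)) +
    # The mean is computed directly from the subset, so no lookup list is needed
    geom_hline(yintercept = mean(df$salary), alpha = 0.6) +
    ggtitle(unique(df$category))
}

# purrr::map(by_category, make_plot) would do the same job here
plots <- lapply(by_category, make_plot)

grid.arrange(grobs = plots, nrow = 1)
```

Because each subset carries its own rows, there is no loop variable that could clash with a column name, which sidesteps the `filter(category == category)` problem entirely.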
test %>% 
        filter(category == "A") %>%
        summarise(mean(salary))
      mean(salary)
1     65.83333
library(gridExtra)
p = list()
avg_line = list()
tmp = list()
category_data = data.frame()
for (category in category_list) {
  # Create an average salary line for the category
  tmp[[category]] <- test %>% 
    filter(category == category) %>%
    summarise(mean(salary))
  avg_line[[category]] <- tmp[[2]]

  # Subset data frame on category 
  category_data[[category]] <- test %>% filter(category == category)

  # Make plots for each category
  p[[category]] <-
    ggplot(category_data[[category]], aes(x = title, y = salary)) +
  geom_line(color = "white") +
  geom_point(aes(color =sex)) +
  scale_color_manual(values = c("#F49171", "#81C19C")) +
  geom_hline(yintercept = avg_line[[category]], color = "white", alpha = 0.6, size = 1) +
  theme(legend.position = "none",
      panel.background = element_rect(color = "#242B47", fill = "#242B47"),
      plot.background = element_rect(color = "#242B47", fill = "#242B47"),
      axis.line = element_line(color = "grey48", size = 0.05, linetype = "dotted"),
      axis.text = element_text(family = "Georgia", color = "white"),
      axis.text.x = element_text(angle = 90),
      # Get rid of the y- and x-axis titles
      axis.title.y=element_blank(),
      axis.title.x=element_blank(),
      panel.grid.major.y = element_line(color = "grey48", size = 0.05),
      panel.grid.minor.y = element_blank(),
      panel.grid.major.x = element_blank())
}

grid.arrange(grobs = p, nrow = 1)
category_means <- test %>% 
  group_by(category) %>%
  summarize_at(vars(salary), mean)

p <- test %>%
  # group_by(category) %>%
  ggplot(aes(x = title, y = salary, color = sex)) + 
  facet_wrap(~ category, nrow = 1, scales = "free_y") +  
  geom_line(color = 'white') + 
  geom_point() + 
  scale_color_manual(values = c("#F49171", "#81C19C")) +
  geom_hline(data = category_means, aes(yintercept = salary), color = 'white', alpha = 0.6, size = 1) + 
  theme(legend.position = "none",
    panel.background = element_rect(color = "#242B47", fill = "#242B47"),
    plot.background = element_rect(color = "#242B47", fill = "#242B47"),
    axis.line = element_line(color = "grey48", size = 0.05, linetype = "dotted"),
    axis.text = element_text(family = "Georgia", color = "white"),
    axis.text.x = element_text(angle = 90),
    # Get rid of the y- and x-axis titles
    axis.title.y=element_blank(),
    axis.title.x=element_blank(),
    panel.grid.major.y = element_line(color = "grey48", size = 0.05),
    panel.grid.minor.y = element_blank(),
    panel.grid.major.x = element_blank())
p