Selecting the top n groups with dplyr, then plotting on other variables
I have a dataset where I am trying to pick the top n categories by count, but then plot using other variables in the dataset. Basically the top n is determined at one aggregation level, but I need to get back to the full data in order to plot it in ggplot.

So, in the problem below, I want the two most common examNames, and then plot their counts by year with facet_wrap.
library(dplyr)
library(tibble)
library(ggplot2)

ap <- tribble(
~year, ~examName,
2014, "Statistics",
2015, "Statistics",
2016, "Statistics",
2016, "Statistics",
2016, "Statistics",
2016, "Statistics",
2017, "Statistics",
2017, "Statistics",
2017, "Statistics",
2017, "Statistics",
2017, "Statistics",
2013, "Macroeconomics",
2013, "Macroeconomics",
2014, "Macroeconomics",
2015, "Macroeconomics",
2016, "Macroeconomics",
2016, "Macroeconomics",
2016, "Macroeconomics",
2016, "Macroeconomics",
2016, "Macroeconomics",
2017, "Macroeconomics",
2017, "Macroeconomics",
2017, "Macroeconomics",
2017, "Macroeconomics",
2017, "Macroeconomics",
2017, "Macroeconomics",
2013, "Calculus",
2014, "Calculus",
2015, "Calculus",
2016, "Calculus",
2017, "Calculus",
2017, "Psychology",
2017, "Psychology",
2017, "Psychology",
2017, "Psychology",
2017, "Psychology",
2018, "Psychology",
2018, "Psychology")
ap_top <- ap %>%
count(examName, sort = TRUE) %>%
head(2) %>%
inner_join(ap, by = "examName") %>%
select(-n)
ap_top %>%
count(examName, year) %>%
ggplot(aes(x = year, y = n, group = examName)) +
geom_line() +
facet_wrap(~ examName)
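The inner join in the pipeline above serves only as a filter. dplyr also has a filtering join, semi_join(), which keeps the rows of the first table that have a match in the second without adding any columns, so the select(-n) step is not needed. Below is a minimal sketch of that variant; `ap_small` and `top2` are hypothetical names, and `ap_small` is a made-up miniature stand-in for `ap`.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature stand-in for `ap` (not the data from the question)
ap_small <- tibble(
  year     = c(2016, 2017, 2017, 2016, 2017, 2017),
  examName = c("Statistics", "Statistics", "Statistics",
               "Macroeconomics", "Macroeconomics", "Calculus")
)

# The two most common exam names, as in the original pipeline
top2 <- ap_small %>%
  count(examName, sort = TRUE) %>%
  head(2)

# semi_join() keeps rows of ap_small with a match in top2,
# and returns only the columns of ap_small
ap_top <- semi_join(ap_small, top2, by = "examName")
```

The result can be passed to count(examName, year) and ggplot() exactly as before.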
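My idea was to get my top n and then inner_join back to the original dataset, then use that for plotting; essentially using the inner join as a filter.

I know there must be a better way to do this, and I'm hoping for a more elegant solution. I'm all ears! An example dataset is given above (sorry it's long).

You don't need inner_join(). I would simply determine the top two exams in a separate statement and then filter on them: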
top_exams <- count(ap, examName) %>%
top_n(2, n) %>% pull(examName)
ap %>%
filter(examName %in% top_exams) %>%
count(year, examName) %>%
ggplot(aes(x = year, y = n, group = examName)) +
geom_line() +
facet_wrap(~ examName)
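In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). Both keep ties by default, so more than two exams can come back when counts are equal; slice_max() has a with_ties argument to control that. The following is a sketch of the same approach with slice_max(), assuming a recent dplyr; `ap_small` and `yearly` are hypothetical names, and the data is a made-up miniature stand-in for `ap`.

```r
library(dplyr)
library(tibble)

# Hypothetical miniature stand-in for `ap` (not the data from the question)
ap_small <- tibble(
  year     = c(2016, 2017, 2017, 2016, 2017, 2017),
  examName = c("Statistics", "Statistics", "Statistics",
               "Macroeconomics", "Macroeconomics", "Calculus")
)

# Names of the two most frequent exams; with_ties = FALSE caps the result at 2 rows
top_exams <- ap_small %>%
  count(examName) %>%
  slice_max(n, n = 2, with_ties = FALSE) %>%
  pull(examName)

# Keep only those exams, then count per year for plotting
yearly <- ap_small %>%
  filter(examName %in% top_exams) %>%
  count(year, examName)
```

`yearly` has the same shape as the data fed to ggplot() in the answer above.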
Another possibility:
ap %>%
group_by(examName) %>%
mutate(temp = n()) %>%
ungroup() %>%
mutate(temp = dense_rank(desc(temp))) %>%
filter(temp %in% c(1,2)) %>%
select(-temp) %>%
count(year, examName) %>%
ggplot(aes(x = year, y = n, group = examName)) +
geom_line() +
facet_wrap(~ examName)
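It counts the cases per examName and ranks the counts. Then it keeps the cases with the largest and second-largest counts. The advantage of this solution is that you can reuse the dense_rank() result, for example in fct_reorder(), to order the plot.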
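A minimal sketch of that idea, assuming the forcats package is available: the rank column is kept long enough to reorder examName, so that facet_wrap() lays out the panels from most to least common. `ap_small`, `ranked` and `p` are hypothetical names, the data is a made-up miniature stand-in for `ap`, and add_count() is used here as a shorthand for the group_by/mutate/ungroup steps.

```r
library(dplyr)
library(tibble)
library(forcats)
library(ggplot2)

# Hypothetical miniature stand-in for `ap` (not the data from the question)
ap_small <- tibble(
  year     = c(2016, 2017, 2017, 2016, 2017, 2017),
  examName = c("Statistics", "Statistics", "Statistics",
               "Macroeconomics", "Macroeconomics", "Calculus")
)

ranked <- ap_small %>%
  add_count(examName, name = "temp") %>%          # rows per exam
  mutate(temp = dense_rank(desc(temp))) %>%       # 1 = most common exam
  filter(temp <= 2) %>%
  mutate(examName = fct_reorder(examName, temp))  # factor levels follow the rank

# Facets now appear in rank order instead of alphabetical order
p <- ranked %>%
  count(year, examName) %>%
  ggplot(aes(x = year, y = n, group = examName)) +
  geom_line() +
  facet_wrap(~ examName)
```

Without the fct_reorder() step the panels would be ordered alphabetically by exam name.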