How to group rows with duplicate names in R?


I'm very new to R and am struggling to subset a dataset. Here is where the dataset comes from and how I cleaned it:

library(splitstackshape)

board_game_original <- read.csv("https://raw.githubusercontent.com/bryandmartin/STAT302/master/docs/Projects/project1_bgdataviz/board_game_raw.csv")

# Tidy up the mechanic and category columns with cSplit():
# each comma-separated value gets its own row (long format)
mechanic <- board_game_original$mechanic
board_game_tidy <- cSplit(board_game_original, splitCols = c("mechanic", "category"),
                          sep = ",", direction = "long")
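For readers unfamiliar with cSplit(), here is a rough base-R sketch of what direction = "long" does to a single comma-separated column. The data frame games and its values are made up for illustration. Each string is split on ",", the other columns are repeated once per piece, and trimws() roughly mirrors cSplit()'s whitespace stripping. It handles only one column and is not a replacement for cSplit().

```r
# Made-up data: one row per game, categories stored as one comma-separated string
games <- data.frame(
  name     = c("Game A", "Game B"),
  category = c("Adventure, Age of Reason", "Adventure"),
  stringsAsFactors = FALSE
)

# Split each category string on "," (a list with one character vector per game)
pieces <- strsplit(games$category, ",", fixed = TRUE)

# Repeat each game's name once per piece and strip surrounding whitespace
games_long <- data.frame(
  name     = rep(games$name, lengths(pieces)),
  category = trimws(unlist(pieces)),
  stringsAsFactors = FALSE
)
print(games_long)
```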

You're very close. You need dplyr::summarise():

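library(dplyr)

complexity_top_5_category <- board_game_tidy %>%
  group_by(category) %>%
  dplyr::summarise(mean_average_complexity = mean(average_complexity, na.rm = TRUE)) %>%
  top_n(5, mean_average_complexity)
  #select(average_complexity) %>% # you don't need this
  #filter(category == c("Abstract Strategy Action / Dexterity", "Adventure", "Age of Reason","American Civil War "))
complexity_top_5_category
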
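You don't have to include dplyr:: before summarise(). However, some other commonly used packages, such as plyr, also have their own version of summarise(), so it is safer to name the package explicitly.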


You can use top_n() to select the top n categories automatically instead of using filter().
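As a dependency-free cross-check of the summarise() + top_n() pipeline, the same idea can be sketched in base R with aggregate() and order(). The data and values below are made up. One difference: aggregate()'s formula method drops rows with missing values before grouping (its default na.action is na.omit), so a category whose complexities are all NA disappears, whereas mean(..., na.rm = TRUE) inside summarise() would give that category NaN.

```r
# Made-up long-format data: one row per game/category pair
games_long <- data.frame(
  category           = c("A", "A", "B", "C", "C", "D", "E", "F", "G", "G"),
  average_complexity = c(1.0, 2.0, NA, 2.5, 3.5, 0.5, 4.0, 1.0, 2.0, NA),
  stringsAsFactors   = FALSE
)

# Mean complexity per category; rows with NA are removed first (na.omit),
# so category "B" (only NA) does not appear in the result
means <- aggregate(average_complexity ~ category, data = games_long, FUN = mean)
names(means)[2] <- "mean_average_complexity"

# Sort by the mean, highest first, and keep exactly the first five rows
top5 <- head(means[order(means$mean_average_complexity, decreasing = TRUE), ], 5)
print(top5)
```

Note that head() keeps exactly five rows, while top_n() keeps every row tied with the fifth value, so the two can return different row counts when means tie.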
Alternatively, you can filter the values of the top 5 categories, then group by category and take the mean of average_complexity:

library(dplyr)

# top_5_category must already be defined before this runs
board_game_tidy %>%
  filter(category %in% names(top_5_category)) %>%
  group_by(category) %>%
  summarise(average_complexity = mean(average_complexity))

# category           average_complexity
#  <fct>                           <dbl>
#1 Abstract Strategy               0.844
#2 Action / Dexterity              0.469
#3 Adventure                       1.25 
#4 Age of Reason                   1.95 
#5 American Civil War              1.68 
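The filter-then-summarise approach can likewise be sketched in base R. Since the answer calls names(top_5_category), top_5_category is presumably a named vector (for example, counts from table()) whose names are the category labels; that is an assumption, so this sketch skips that step and uses a hypothetical character vector, chosen_categories, instead. All data values are made up.

```r
# Made-up long-format data (not from the real dataset)
board_game_tidy <- data.frame(
  category           = c("Adventure", "Adventure", "Card Game", "Dice", "Dice", "Wargame"),
  average_complexity = c(1.0, 2.0, 1.5, 0.5, 1.5, 3.0),
  stringsAsFactors   = FALSE
)

# Hypothetical set of categories to keep (stands in for names(top_5_category))
chosen_categories <- c("Adventure", "Dice")

# Keep only rows whose category appears in the set (%in% tests membership)
kept <- board_game_tidy[board_game_tidy$category %in% chosen_categories, ]

# Mean complexity per remaining category
result <- aggregate(average_complexity ~ category, data = kept, FUN = mean)
print(result)
```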

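Maybe filter(category %in% c(...))?

Hi, thanks for your answer! I tried your code and got this error:
Error: Problem with `filter()` input `..1`.
x object 'top_5_category' not found
ℹ Input `..1` is `category %in% names(top_5_category)`.

@harperzhu top_5_category appears in your post. Did you run it?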
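The %in% suggestion matters because == with a vector on the right-hand side does not test membership: it compares element by element, recycling the shorter vector. A small base-R illustration with hypothetical category values:

```r
category <- c("Adventure", "Wargame", "Wargame", "Dice")
wanted   <- c("Adventure", "Wargame")

# == pairs positions up, recycling wanted:
# 1 vs "Adventure", 2 vs "Wargame", 3 vs "Adventure", 4 vs "Wargame"
by_equals <- category == wanted
# %in% asks, for each element, whether it appears anywhere in wanted
by_in <- category %in% wanted

print(by_equals)  # TRUE  TRUE FALSE FALSE
print(by_in)      # TRUE  TRUE  TRUE FALSE
```

When the two lengths are not multiples of each other, == also emits a "longer object length is not a multiple of shorter object length" warning.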