Conditionally sum a variable by category using R

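I have a data frame that shows the number of publications per year. I am only interested in Conference and Journal publications, and I would like to sum all other categories into a single type, Other.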

Example of the data frame:

year    type                n    
1994    Conference          2    
1994    Journal             3    
1995    Conference         10    
1995    Editorship          3    
1996    Conference         20    
1996    Editorship          2    
1996    Books and Thesis    3

The desired result is:

year    type          n
1994    Conference    2    
1994    Journal       3    
1995    Conference   10    
1995    Other         3    
1996    Conference   20    
1996    Other         5

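Using dplyr, we can replace anything other than "Journal" or "Conference" with "Other", then group by year and type and sum n:

library(dplyr)
df %>%
  # negative lookahead: match any type that is not exactly Journal or Conference
  mutate(type = sub("^(?!(Journal|Conference)$).*$", "Other", type, perl = TRUE)) %>%
  group_by(year, type) %>%
  summarise(n = sum(n))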


#  year       type     n
#  <int>      <chr> <int>
#1  1994 Conference     2
#2  1994    Journal     3
#3  1995 Conference    10
#4  1995      Other     3
#5  1996 Conference    20
#6  1996      Other     5
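
For readers who want to reproduce the recode-then-sum idea without extra packages, here is a minimal base R sketch. It rebuilds the example data from the question and uses `ifelse()` with `%in%` in place of a regular expression. The names `pubs`, `keep`, and `result` are assumptions made for illustration and are not part of the original answer.

```r
# Example data from the question; `pubs` is a hypothetical name.
pubs <- data.frame(
  year = c(1994L, 1994L, 1995L, 1995L, 1996L, 1996L, 1996L),
  type = c("Conference", "Journal", "Conference", "Editorship",
           "Conference", "Editorship", "Books and Thesis"),
  n    = c(2L, 3L, 10L, 3L, 20L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Keep the two categories of interest; everything else becomes "Other".
keep <- c("Conference", "Journal")
pubs$type <- ifelse(pubs$type %in% keep, pubs$type, "Other")

# Sum n within each year/type combination.
result <- aggregate(n ~ year + type, data = pubs, FUN = sum)

# aggregate() sorts by type first, so reorder by year for readability.
result <- result[order(result$year, result$type), ]
rownames(result) <- NULL
print(result)
```

Listing the categories to keep, rather than the ones to drop, means that any new publication type appearing in the data later falls into "Other" automatically.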

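If type is a factor, another option is to merge the unwanted levels into "Other" in base R; the rows then still have to be summed by year and type:

levels(df$type)[levels(df$type) %in% c("Editorship", "Books and Thesis")] <- "Other"

We can also use data.table:
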
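library(data.table)
library(stringr)
# setDT() converts df to a data.table by reference; the grouping expression
# recodes every type other than Journal or Conference as "Other"
setDT(df)[, .(n = sum(n)), by = .(year, type = str_replace(type,
       "^(?!(Journal|Conference)$).*$", "Other"))]
#   year       type  n
#1: 1994 Conference  2
#2: 1994    Journal  3
#3: 1995 Conference 10
#4: 1995      Other  3
#5: 1996 Conference 20
#6: 1996      Other  5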
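
When type is stored as a factor, the same recoding can be done on the factor levels instead of with a regular expression. The sketch below is one possible way to do that in base R, assuming the example data from the question; `pubs`, `keep`, and `totals` are hypothetical names. Assigning the same label to several levels merges them into one level.

```r
# Example data from the question, with type stored as a factor.
pubs <- data.frame(
  year = c(1994L, 1994L, 1995L, 1995L, 1996L, 1996L, 1996L),
  type = factor(c("Conference", "Journal", "Conference", "Editorship",
                  "Conference", "Editorship", "Books and Thesis")),
  n    = c(2L, 3L, 10L, 3L, 20L, 2L, 3L)
)

# Merge every level except the two of interest into a single "Other" level.
keep <- c("Conference", "Journal")
levels(pubs$type)[!levels(pubs$type) %in% keep] <- "Other"

# Sum n per year and merged type.
totals <- aggregate(n ~ year + type, data = pubs, FUN = sum)
totals <- totals[order(totals$year, totals$type), ]
print(totals)
```

Within each year the rows follow the factor's level order rather than alphabetical order, so "Other" may be listed before "Conference".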

You didn't sum up the others, as there are two more. Do you just want to rename Editorship and Books and Thesis to Other, or do you want to sum everything up?