Aggregate if a string contains specific text in R


I have seen many posts on this topic, so I apologize if this is a duplicate, but I couldn't solve my problem.

I have

df <- data.frame(name = c('bike+ride','shoe+store','ride','mountian%20bike','ride+along'),
             count = c(2,5,8,7,6))
and I would like the final result to look like this:

Group   Count
bike      9
ride     16
Can anyone help?
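The expected totals count a row once for every term that appears in its name, so bike+ride contributes to both groups: bike = 2 + 7 = 9 and ride = 2 + 8 + 6 = 16. A quick base R check of which rows match each term; the vector `groups` is a hypothetical name used only for this sketch:

```r
df <- data.frame(name = c('bike+ride','shoe+store','ride','mountian%20bike','ride+along'),
                 count = c(2,5,8,7,6))
groups <- c('bike', 'ride')  # hypothetical: the substrings to look for

# Indices of the rows whose name contains each substring (literal match, no regex)
matches <- lapply(setNames(groups, groups),
                  function(g) which(grepl(g, df$name, fixed = TRUE)))
matches

# Sum the counts of the matching rows, one total per substring
sapply(matches, function(i) sum(df$count[i]))
```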

A basic idea:

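# assumed: the lookup table the answers use is not shown in the question;
# a factor column, as the as.character() calls suggest
group <- data.frame(group = c('ride', 'bike'), stringsAsFactors = TRUE)

# simplify = FALSE keeps one vector of matching row indices per group level
sapply(sapply(as.character(group$group), function(i) grep(i, df$name), simplify = FALSE),
       function(i) sum(df$count[i]))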


#or make it a function

aggr1 <- function(var1, grp, cnt){
  # for each group level, the indices of the rows whose name contains it
  m1 <- sapply(as.character(grp), function(i) grep(i, var1), simplify = FALSE)
  # total count per group level
  final_d <- sapply(m1, function(i) sum(cnt[i]))
  return(data.frame(Group = names(final_d), 
                    Count = as.integer(final_d), stringsAsFactors = FALSE)
         )
}

aggr1(df$name, group$group, df$count)

#  Group Count
#1  ride    16
#2  bike     9
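A caveat for the nested sapply() approach: when every search term happens to match the same number of rows, the inner sapply() collapses its result into a matrix, and the outer sapply() then loops over single cells instead of one index vector per term. Passing `simplify = FALSE` to the inner call (or using lapply()) keeps a list. A small sketch with made-up values (`x` and `terms` are hypothetical):

```r
x <- c('a', 'b', 'ab')   # hypothetical names
terms <- c('a', 'b')     # each term matches exactly two entries of x

simplified <- sapply(terms, function(p) grep(p, x))
kept <- sapply(terms, function(p) grep(p, x), simplify = FALSE)

dim(simplified)  # a 2 x 2 matrix, one column per term
str(kept)        # a named list with one index vector per term

# The outer sapply() sees 4 single indices in the matrix case, 2 index vectors otherwise
length(sapply(simplified, function(i) sum(i)))
length(sapply(kept, function(i) sum(i)))
```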
One way is:

do.call(rbind, sapply(group$group, FUN = function(x, df) {
  out <- df[grepl(pattern = x, x = df$name), ]
  data.frame(group = x, count = sum(out$count))
}, df = df, simplify = FALSE))

  group count
1  ride    16
2  bike     9
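Base R's aggregate() can also do the summing once the data are in long form, with one row per (term, matching row) pair. A sketch, assuming the search terms are in a hypothetical character vector `groups`:

```r
df <- data.frame(name = c('bike+ride','shoe+store','ride','mountian%20bike','ride+along'),
                 count = c(2,5,8,7,6))
groups <- c('ride', 'bike')  # hypothetical: the substrings to look for

# One row per (group, matching row); a name containing several terms appears once per term
hits <- do.call(rbind, lapply(groups, function(g) {
  idx <- grepl(g, df$name, fixed = TRUE)
  data.frame(group = rep(g, sum(idx)), count = df$count[idx], stringsAsFactors = FALSE)
}))

# Sum the counts per group; aggregate() returns the groups in sorted order
res <- aggregate(count ~ group, data = hits, FUN = sum)
res
```

Because the groups come back sorted, the result is in the same bike/ride order as the table in the question.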

An approach using the tidyverse packages purrr, dplyr and tidyr:

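library(tidyverse) # for dplyr, purrr and tidyr

groups <- c('ride','bike')

# one single-column summary per group (the column is named after the group), row-bound
# into one data frame, then gathered into long format; the NA cells from row-binding are dropped
map_df(groups, ~ summarise(df, !!.x := sum(count[grepl(.x, name)], na.rm = TRUE))) %>% 
      gather(group, count, na.rm = TRUE)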

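This seems fairly basic when it comes to handling extra group$group levels.
I agree. I modified the answer to include extra levels.
Some speed improvements, if I may?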
data.frame(group = group, count = sapply(group$group, FUN = function(x) sum(df[grepl(pattern = x, x = df$name, fixed = TRUE), "count"])))
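(There is no need to call do.call or create a data.frame in every iteration; adding fixed = TRUE also helps, etc.)
Thanks for your help. Do you know how I would handle instances where the name contains characters such as + or %20?
The names in your example do contain these characters, and it works fine.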
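On the + / %20 question: these characters are harmless in name, because name is only the text being searched. They matter when they appear in the search term, since grepl() treats the pattern as a regular expression by default and + is a regex quantifier; `fixed = TRUE` matches the term literally. A sketch with a hypothetical search term 'bike+ride':

```r
name <- c('bike+ride', 'shoe+store', 'ride', 'mountian%20bike', 'ride+along')

# As a regex, 'bike+ride' means "bik", one or more "e", then "ride", so the literal '+' never matches
grepl('bike+ride', name)

# With fixed = TRUE the pattern is matched character for character
grepl('bike+ride', name, fixed = TRUE)

# '%' is not a regex metacharacter, so '%20' matches the same way with or without fixed = TRUE
grepl('%20', name)
```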
# make a data.frame which locates where each group level is located
grp <- as.data.frame(sapply(group$group, FUN = function(x) grepl(pattern = x, x = df$name)))
names(grp) <- group$group

# based on above location (TRUE/FALSE), sum accordingly
data.frame(count = apply(grp, MARGIN = 2, FUN = function(x, df) {
  sum(df[x, "count"])
}, df = df))

     count
ride    16
bike     9
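The TRUE/FALSE table above can also be turned into totals with one matrix operation: multiplying the logical matrix by count (recycled down each column) zeroes out the non-matching rows, and colSums() adds up the rest. A sketch, with `groups` again a hypothetical character vector of search terms:

```r
df <- data.frame(name = c('bike+ride','shoe+store','ride','mountian%20bike','ride+along'),
                 count = c(2,5,8,7,6))
groups <- c('ride', 'bike')  # hypothetical: the substrings to look for

# Logical matrix: one row per row of df, one column per group (named after the group)
hit <- vapply(groups, function(g) grepl(g, df$name, fixed = TRUE), logical(nrow(df)))

# Multiply each column by the counts, then total each column
colSums(hit * df$count)
```

vapply() is used instead of sapply() so the result is always a logical matrix of the declared shape, regardless of how many rows each term matches.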