Need to count pre-specified words in a column in R (r, dplyr)

I need to count how many times the following words/phrases occur in a column:

home

grand slam

scores

Here is the input, shown as rows:

[1] "Ian Desmond hits an inside-the-park home run (8) on a line drive down the right-field line. Brendan Rodgers scores. Tony Wolters scores."
[2] "Ian Desmond lines out sharply to center fielder Jason Heyward."                                                                          
[3] "Ian Desmond hits a grand slam (9) to right center field. Charlie Blackmon scores. Trevor Story scores. David Dahl scores."               
[4] "Ian Desmond homers (12) on a fly ball to center field. Daniel Murphy scores."

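Desired output: the main output I need is a count of how many matches are found. For example, the input rows above contain nine matches.

The code I tried: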

library(dplyr)
library(stringr)

text <- c(
  "Ian Desmond hits an inside-the-park home run (8) on a line drive down the right-field line. Brendan Rodgers scores. Tony Wolters scores.",
  "Ian Desmond lines out sharply to center fielder Jason Heyward.",
  "Ian Desmond hits a grand slam (9) to right center field. Charlie Blackmon scores. Trevor Story scores. David Dahl scores.",
  "Ian Desmond homers (12) on a fly ball to center field. Daniel Murphy scores."
)

df <- data.frame(text, stringsAsFactors=FALSE)
# Keeps the rows that contain at least one match, then counts rows (3 here), not matches
df %>%
  filter(str_detect(text, "scores|grand slam|home")) %>%
  count()

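I have reviewed the "solutions" offered on Stack Overflow but could not find one that fits my needs.

I want to count every occurrence of "scores", "grand slam", and "home" across all rows of the text vector.

I would prefer a dplyr solution; however, I am open to other approaches.

As for the result, I just want the count. In the input provided, the words/phrases to be counted occur nine times.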

We can use str_extract_all:

library(dplyr)
library(stringr)
library(purrr)
map_dfc(c("score", "grand slam", "home"), ~ lengths(str_extract_all(df$text, .x))) %>%
  set_names(c("score", "grand slam", "home")) %>%
  mutate(Total = sum(score + `grand slam` + home))
# A tibble: 4 x 4
#  score `grand slam`  home Total
#  <int>        <int> <int> <int>
#1     2            0     1     9
#2     0            0     0     9
#3     3            1     0     9
#4     1            0     1     9
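
The `~` and `.x` in the purrr call above are purrr's formula shorthand for an anonymous function. The sketch below compares the shorthand with an explicit `function(x)` on a shortened sample; the `text` vector and the names `counts_formula` and `counts_function` are illustrative assumptions, not part of the original answer.

```r
library(purrr)
library(stringr)

# Shortened sample data, assumed for illustration only
text <- c(
  "Brendan Rodgers scores. Tony Wolters scores.",
  "Ian Desmond hits a grand slam (9) to right center field."
)
words <- c("score", "grand slam", "home")

# Formula shorthand: `.x` stands for the current element of `words`
counts_formula <- map_int(words, ~ sum(str_count(text, .x)))

# The same call written with an explicit anonymous function
counts_function <- map_int(words, function(x) sum(str_count(text, x)))

identical(counts_formula, counts_function)
# [1] TRUE
```

Both forms loop over `words` one element at a time, so the shorthand is purely a matter of brevity.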

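Or using base R:

# gregexpr() returns -1 for an element with no match, so keep only elements with positive match positions
sum(lengths(Filter(function(x) all(x > 0), gregexpr("score|grand slam|home", df$text))))
#[1] 9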
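
As an alternative base R sketch, `regmatches()` can extract the matched substrings directly, so the `-1` that `gregexpr()` returns for a non-matching element never needs to be filtered out. The variable names `pattern`, `matches`, and `total` are assumptions made for this illustration.

```r
text <- c(
  "Ian Desmond hits an inside-the-park home run (8) on a line drive down the right-field line. Brendan Rodgers scores. Tony Wolters scores.",
  "Ian Desmond lines out sharply to center fielder Jason Heyward.",
  "Ian Desmond hits a grand slam (9) to right center field. Charlie Blackmon scores. Trevor Story scores. David Dahl scores.",
  "Ian Desmond homers (12) on a fly ball to center field. Daniel Murphy scores."
)

pattern <- "score|grand slam|home"

# One character vector of matches per element; character(0) when nothing matches
matches <- regmatches(text, gregexpr(pattern, text))

# Total number of matches across all elements
total <- sum(lengths(matches))
total
# [1] 9
```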

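Do you count "homers" as a match for "home"?

By pasting the words together into a single pattern, you can use str_count, which returns the number of matches in each row (they sum to 9 here):

library(stringr)
words <- c('home', 'grand slam', 'scores')
str_count(df$text, str_c(words, collapse = '|'))
#[1] 3 0 4 2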
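
Since the question asks for a dplyr solution that returns a single total, the per-row counts from `str_count` could be summed inside `summarise()`. This is a minimal sketch under that assumption; the column name `total` and the variable `result` are illustrative and do not come from the original answer.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  text = c(
    "Ian Desmond hits an inside-the-park home run (8) on a line drive down the right-field line. Brendan Rodgers scores. Tony Wolters scores.",
    "Ian Desmond lines out sharply to center fielder Jason Heyward.",
    "Ian Desmond hits a grand slam (9) to right center field. Charlie Blackmon scores. Trevor Story scores. David Dahl scores.",
    "Ian Desmond homers (12) on a fly ball to center field. Daniel Murphy scores."
  ),
  stringsAsFactors = FALSE
)

words <- c("home", "grand slam", "scores")
pattern <- str_c(words, collapse = "|")

# Count matches per row, then collapse to one total for the whole column
result <- df %>%
  summarise(total = sum(str_count(text, pattern)))

result$total
# [1] 9
```

Unlike `filter()` followed by `count()`, which counts matching rows, this counts every individual match.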

If you want to write the pattern so that "homers" does not match "home", you can put word boundaries (\\b) around the pattern, in which case the sum comes to 8.

Here is a one-line solution using str_extract_all from the stringr package:

length(unlist(str_extract_all(x, paste0(c('home','grand slam','scores'), collapse = '|'))))
#[1] 9

Data: x

I also need to show the total number of matches (9). There are more than 1,800 rows in the actual dataset. In the code below ("if it is the total count"), what do the tilde and .x do?

@Metsfan It loops over the vector (suppose there are more than 100 words); the ~ is shorthand for an anonymous function call, i.e. function(x).

@Metsfan Can you update your post with the expected output? It is still not clear to me what format you want.

Yes, I count "homers" as a match for "home".

Then the first part does what you want. You don't need the word boundaries.
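
For readers who do want to exclude "homers", the word-boundary variant mentioned above might look like the following. The answer only names `\\b`; the exact way the pattern is assembled here, along with the names `bounded` and `per_row`, is an assumption for illustration.

```r
library(stringr)

text <- c(
  "Ian Desmond hits an inside-the-park home run (8) on a line drive down the right-field line. Brendan Rodgers scores. Tony Wolters scores.",
  "Ian Desmond lines out sharply to center fielder Jason Heyward.",
  "Ian Desmond hits a grand slam (9) to right center field. Charlie Blackmon scores. Trevor Story scores. David Dahl scores.",
  "Ian Desmond homers (12) on a fly ball to center field. Daniel Murphy scores."
)

words <- c("home", "grand slam", "scores")

# Wrap the alternation in a group so \\b applies to every word, not just the first and last
bounded <- str_c("\\b(", str_c(words, collapse = "|"), ")\\b")

per_row <- str_count(text, bounded)
per_row
# [1] 3 0 4 1

sum(per_row)
# [1] 8
```

The last row drops from 2 to 1 because "homers" no longer counts as "home".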