R: subset/keep all rows that contain at least two specific text strings


I have a data frame containing different text excerpts.

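I would like to subset all observations that contain at least two terms from my small dictionary ("poverty|report|alarming|inflation"). The same term occurring twice in a text, for example "report" appearing two times, should also count.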

Does this work:

> library(stringr)
> library(dplyr)
> texts %>% filter(str_count(text, pattern = "poverty|report|alarming|inflation") > 1)
                                          text id group
1 report highlights that poverty is widespread  1     4
2                             alarming reports  3     6
> 
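
The transcript above does not show how `texts` was built, and `str_count()` with a plain pattern is case-sensitive. The following is a minimal sketch of the same idea, assuming the sample data from the base R answer in this thread and assuming that case-insensitive matching is wanted. The names `terms`, `hits` and `kept` are illustrative and not part of the original answer.

```r
library(stringr)

# Sample data, copied from the base R answer in this thread
texts <- data.frame(
  text = c("report highlights that poverty is widespread",
           "there is inflation",
           "alarming reports",
           "thanks for listening"),
  id = 1:4,
  group = 4:7,
  stringsAsFactors = FALSE
)

# Join the dictionary terms into one alternation pattern;
# regex(..., ignore_case = TRUE) is an assumption, the original is case-sensitive
terms <- c("poverty", "report", "alarming", "inflation")
pattern <- regex(paste(terms, collapse = "|"), ignore_case = TRUE)

# Number of dictionary hits per row (substring matches, so "reports" counts as "report")
texts$hits <- str_count(texts$text, pattern)

# Keep rows with at least two hits
kept <- texts[texts$hits >= 2, ]
```

Because the pattern matches substrings, a term repeated inside one text is counted each time it occurs, which is what the question asks for.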

Try this base R approach:

# Data
texts <- data.frame(text = c("report highlights that poverty is widespread", "there is inflation", "alarming reports", "thanks for listening"),
                    id = 1:4, group = 4:7, stringsAsFactors = FALSE)
# Index: for each row, split the text on spaces and count the words that match any dictionary term
Index <- apply(texts[, 1, drop = FALSE], 1,
               function(x) sum(grepl("poverty|report|alarming|inflation",
                                     unlist(strsplit(x, split = " ")),
                                     ignore.case = TRUE)))
# Subset: keep the rows with at least two matching words
texts[which(Index >= 2), ]
                                          text id group
1 report highlights that poverty is widespread  1     4
3                             alarming reports  3     6
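
The word-splitting approach counts matching words. As a possible alternative, here is a vectorized base R sketch that counts every occurrence of a dictionary term using `gregexpr()` and `regmatches()`. It reuses the same sample data, and the names `pattern`, `matches`, `n_hits` and `result` are illustrative only.

```r
# Sample data, same as in the answer above
texts <- data.frame(
  text = c("report highlights that poverty is widespread",
           "there is inflation",
           "alarming reports",
           "thanks for listening"),
  id = 1:4,
  group = 4:7,
  stringsAsFactors = FALSE
)

pattern <- "poverty|report|alarming|inflation"

# gregexpr() finds every match position per string;
# regmatches() extracts the matched substrings as a list of character vectors
matches <- regmatches(texts$text,
                      gregexpr(pattern, texts$text, ignore.case = TRUE))

# Number of matches per row (0 when nothing matched)
n_hits <- lengths(matches)

# Keep rows with at least two matches
result <- texts[n_hits >= 2, ]
```

The two approaches can differ at the edges: a term repeated inside a single space-delimited token counts once when splitting on spaces, but once per occurrence here.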