R: trying to create a string value using the | (or) operator


I am trying to scrape a website link. So far I have downloaded the text and set it up as a data frame. I have the following:

keywords <- c(credit | model)

text_df <- as.data.frame.table(text_df)
text_df %>%
  filter(str_detect(text, keywords))

OK, I have checked it, and I don't think it will work the way you wrote it, because you have to use the or operator inside filter().

So it will work this way:

library(dplyr)
library(stringr)

keywords <- c("virg", "tos")

iris %>%
  filter(str_detect(Species, keywords[1]) | str_detect(Species, keywords[2]))

With keywords[1] and so on, you have to write out every "keyword" stored in this variable explicitly.
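Writing out keywords[1], keywords[2], and so on gets tedious as the list grows. As a rough sketch (not part of the original answer), the per-keyword tests could be combined with Reduce() so that one filter() call covers any number of keywords. The name any_match_rows is made up for illustration.

```r
library(dplyr)
library(stringr)

keywords <- c("virg", "tos")

# Build one logical vector per keyword, then OR them together element-wise.
any_match_rows <- iris %>%
  filter(Reduce(`|`, lapply(keywords, function(k) {
    str_detect(as.character(Species), k)
  })))

# Species that survive the filter
unique(as.character(any_match_rows$Species))
```

Adding a third or fourth keyword then only means extending the keywords vector.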

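I think the problem is that you need to pass a string as the pattern argument to str_detect. To check for "credit" or "model", you can paste them into a single string, separated by |.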

library(tidyverse)
library(stringr)
text_df Var`1` text
#>
#> here is some credit information
#> 2 3 this line may contain the keyword model
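To make the idea concrete, here is a minimal sketch of collapsing the keywords into one regular expression. The small text_df below is invented sample data, not the asker's scraped text, and the column names id and text are assumptions.

```r
library(dplyr)
library(stringr)

# Hypothetical sample data standing in for the scraped text
text_df <- data.frame(
  id = 1:3,
  text = c("here is some credit information",
           "nothing relevant on this line",
           "this line may contain the keyword model"),
  stringsAsFactors = FALSE
)

keywords <- c("credit", "model")

# Join the keywords with the regex alternation operator: "credit|model"
pattern <- paste(keywords, collapse = "|")

# Keep only the rows whose text matches at least one keyword
matched <- text_df %>%
  filter(str_detect(text, pattern))

matched
```

Because the pattern is an ordinary regular expression, it also matches inside longer words (for example "credit" in "accredited"). Wrapping each keyword in \\b word boundaries would be one way to tighten it.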

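I would suggest not using regular expressions when you are working with words. There are packages tailored to this specific task that you can use. For example, try the following: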

library(corpus)
text <- readLines("http://norvig.com/big.txt") # sherlock holmes
terms <- c("watson", "sherlock holmes", "elementary")
text_locate(text, terms)
##    text           before               instance                after             
## 1  1    …Book of The Adventures of  Sherlock Holmes                             
## 2  27     Title: The Adventures of  Sherlock Holmes                             
## 3  40   … EBOOK, THE ADVENTURES OF  SHERLOCK HOLMES  ***                        
## 4  50                               SHERLOCK HOLMES                               
## 5  77                           To  Sherlock Holmes  she is always the woman. I…
## 6  85   …," he remarked. "I think,      Watson      , that you have put on seve…
## 7  89   …t a trifle more, I fancy,      Watson      . And in practice again, I …
## 8  145  …ere's money in this case,      Watson      , if there is nothing else.…
## 9  163  …friend and colleague, Dr.      Watson      , who is occasionally good …
## 10 315  … for you. And good-night,      Watson      ," he added, as the wheels …
## 11 352  …s quite too good to lose,      Watson      . I was just balancing whet…
## 12 422  …as I had pictured it from  Sherlock Holmes ' succinct description, but…
## 13 504         "Good-night, Mister  Sherlock Holmes ."                          
## 14 515  …t it!" he cried, grasping  Sherlock Holmes  by either shoulder and loo…
## 15 553                        "Mr.  Sherlock Holmes , I believe?" said she.     
## 16 559                     "What!"  Sherlock Holmes  staggered back, white with…
## 17 565  …tter was superscribed to " Sherlock Holmes , Esq. To be left till call…
## 18 567                "MY DEAR MR.  SHERLOCK HOLMES ,--You really did it very w…
## 19 569  …est to the celebrated Mr.  Sherlock Holmes . Then I, rather imprudentl…
## 20 571  …s; and I remain, dear Mr.  Sherlock Holmes ,                           
## ⋮  (189 rows total)

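I can't check it right now, but filter() should work.

When asking for help, you should provide a sample input and the desired output. Usually you want to search for values in a specific column of a data.frame rather than in whole rows, so it is better to be more specific here. In case it helps, I have created a simplified example.

I think iris %>% filter(str_detect(Species, paste(keywords, collapse = "|"))) will get the same result.

Thanks for your reply. I ran this version, replacing the names with the ones from my dataset, and it worked very well. It takes a bit more work to specify keywords[3], keywords[4], keywords[x] and so on, but it works. Thanks again!

Thanks for your reply! I ran your code on my text file and it worked, but my text file is much messier than the one I posted here, so I got the correct results plus some extra noise in the output. (Sorry, that's my fault!) But it still works.

@user113156, I'm not quite sure what you mean by extra noise in the output. You could be stricter with the search. For example,

It gives me the correct lines, with the keyword appearing inside any word, but what I mean by extra noise is that it also gives me extra HTML output (on separate lines) that does not contain the requested keywords, and I don't understand why...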
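# logical vector: TRUE for each line of text that contains any of the terms
ix <- text_detect(text, terms)
# only the lines of text that contain any of the terms
matches <- text_subset(text, terms)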