R: How to search for words in a corpus? (R, tm, Corpus)

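Suppose I have a data frame with two columns, "question_no" and "question_text". "question_no" simply runs from 1 to length(data$question_no), and "question_text" holds the questions. I want to classify the questions that contain either "in order" or "summarize". So far I have come up with the following lines of code: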

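library(tm)

questions <- Corpus(VectorSource(data$question_text))
questions <- tm_map(questions, content_transformer(tolower))
questions <- tm_map(questions, stripWhitespace)
# Incomplete attempt (cut off in the original post). Note that == compares
# whole strings, so it cannot test whether a question contains a phrase.
# spesificQuestion<- ifelse(Corpus=="in order"|Corpus=="summarize",pquestions, others=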
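A minimal sketch that stays within tm, assuming the tm package is installed and using a small made-up data frame in place of the real one. It builds a VCorpus, pulls each cleaned document's text back out with content(), and tests it with base R's grepl(), because == only checks whether two strings are identical, not whether one contains a phrase.

```r
library(tm)

# Hypothetical example data standing in for the real data frame
data <- data.frame(
  question_no   = 1:3,
  question_text = c("Put these words in order", "Summarize  the paper", "nonsense"),
  stringsAsFactors = FALSE
)

# Build and clean the corpus; content_transformer() wraps a plain function
# such as tolower so the documents stay valid tm documents
questions <- VCorpus(VectorSource(data$question_text))
questions <- tm_map(questions, content_transformer(tolower))
questions <- tm_map(questions, stripWhitespace)

# Extract the cleaned text of every document as a character vector
texts <- vapply(questions, function(doc) paste(content(doc), collapse = " "),
                character(1), USE.NAMES = FALSE)

# \\b word boundaries match "summarize" and "in order" only as whole words
pattern <- "\\bsummarize\\b|\\bin order\\b"
data$condition_met <- ifelse(grepl(pattern, texts, perl = TRUE), "Yes", "No")
print(data)
```

Lowercasing first means the pattern itself can stay lowercase; stripWhitespace is not needed for matching here but collapses the double space in the second example.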

Using this data frame of questions:
   df <- data.frame(
   question_no = 1:6,
   question_text = c("put these words in order","summarize the paper","nonsense",
   "summarize the story", "put something in order", "nonsense")
   )

    question_no            question_text
       1             put these words in order
       2             summarize the paper
       3             nonsense
       4             summarize the story
       5             put something in order
       6             nonsense

Adding a condition_met column produces:
  question_no            question_text         condition_met
       1         put these words in order           Yes
       2         summarize the paper                Yes
       3         nonsense                           No
       4         summarize the story                Yes
       5         put something in order             Yes
       6         nonsense                           No
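The post shows the input and the output but not the call that produced this table. Based on the explanation of str_detect(), if_else(), and mutate() that accompanies it, the call was presumably something like the sketch below; this is a reconstruction, not the answerer's verbatim code, and it assumes the dplyr and stringr packages are installed.

```r
library(dplyr)
library(stringr)

df <- data.frame(
  question_no = 1:6,
  question_text = c("put these words in order", "summarize the paper", "nonsense",
                    "summarize the story", "put something in order", "nonsense"),
  stringsAsFactors = FALSE
)

# str_detect() gives one TRUE/FALSE per row, if_else() maps it to "Yes"/"No",
# and mutate() stores the result in a new condition_met column
result <- mutate(
  df,
  condition_met = if_else(
    str_detect(question_text, "\\bsummarize\\b|\\bin order\\b"),
    "Yes", "No"
  )
)
print(result)
```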

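stringr::str_detect creates a logical vector the same length as its first argument: it checks each element of the original vector to see whether it contains the pattern. Note that I am matching the whole word "summarize" and the phrase "in order" (the \\b word boundaries) to avoid matching things like "unsummarize". If that doesn't matter to you, you can change the pattern to ".*summarize.*|.*in order.*".

dplyr::if_else lets you convert TRUE and FALSE into whatever values you want. In this case I used "Yes" and "No".

dplyr::mutate creates a new column, here called condition_met. Keeping the TRUE and FALSE values instead would let you see how many, or what proportion, of the entries contain the strings you are interested in. If that is what you want, drop the if_else call, i.e.: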
     mutate(df, condition_met = str_detect(question_text, "\\bsummarize\\b|\\bin order\\b"))
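To see what the \\b word boundaries change, and how a logical column can be counted, here is a small check in base R. It uses grepl() as a stand-in for str_detect(), since both return one TRUE/FALSE per element; the test strings are made up.

```r
# Made-up test strings
x <- c("summarize the paper", "unsummarize this",
       "put it in order", "sorted within order")

strict <- "\\bsummarize\\b|\\bin order\\b"   # whole words only
loose  <- ".*summarize.*|.*in order.*"       # any substring

strict_hits <- grepl(strict, x, perl = TRUE)
loose_hits  <- grepl(loose, x, perl = TRUE)

print(strict_hits)  # TRUE FALSE TRUE FALSE
print(loose_hits)   # TRUE TRUE TRUE TRUE

# Logical vectors can be counted and averaged directly
print(sum(strict_hits))   # number of matching entries: 2
print(mean(strict_hits))  # proportion of matching entries: 0.5
```

The loose pattern also accepts "unsummarize" and "within order", which is exactly what the word boundaries are there to prevent.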

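Maybe look into grep?

Can the question_text entries contain words other than "summarize" and "order", i.e. are you looking only for whole matches or also for partial ones? Do you want to create a new column that indicates whether your condition is met?

For example: "'Summarize' the second paragraph of the first passage." Suppose I have questions (or instructions) of this kind, and I want to determine whether each one contains "summarize" or "order".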
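Following the grep suggestion in the comments, a base R sketch with no extra packages. grepl() returns a logical vector (like str_detect()), while grep() returns the indices of the matching elements. ignore.case = TRUE is added because an instruction such as "'Summarize' the second paragraph of the first passage." starts with a capital S, which a case-sensitive lowercase pattern misses; the sample strings are made up.

```r
questions <- c("'Summarize' the second paragraph of the first passage.",
               "Put these words in order.",
               "nonsense")
pattern <- "\\bsummarize\\b|\\bin order\\b"

# One TRUE/FALSE per question, then mapped to "Yes"/"No"
hits <- grepl(pattern, questions, ignore.case = TRUE, perl = TRUE)
condition_met <- ifelse(hits, "Yes", "No")
print(condition_met)   # "Yes" "Yes" "No"

# Indices of the matching questions, and the questions themselves
idx <- grep(pattern, questions, ignore.case = TRUE, perl = TRUE)
print(idx)             # 1 2
print(questions[idx])

# Without ignore.case, the capitalised "Summarize" is not matched
print(grepl(pattern, questions[1], perl = TRUE))  # FALSE
```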