Keep only the words in a data frame that are found in a vector in R

I need to remove all non-English words from a data frame like this:

ID     text
1      they all went to the store bonkobuns and bought chicken
2      if we believe no exomunch standards are in order then we're ok
3      living among the calipodians seems reasonable  
4      given the state of all relimited editions we should be fine

I would like to end up with a data frame like this:

 ID     text
 1      they all went to the store and bought chicken
 2      if we believe no standards are in order then we're ok
 3      living among the seems reasonable  
 4      given the state of all editions we should be fine
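To try the answers below, here is a sketch that rebuilds the sample input as an R data frame. The name df1 matches what the answers use. The real English word list is not shown in the question, so word_vec here is only a stand-in: the English words that occur in the sample rows, as listed in one of the answers.

```r
# Sample input from the question, rebuilt as a data frame.
# stringsAsFactors = FALSE keeps text as character on R versions before 4.0.0,
# where factors were the default.
df1 <- data.frame(
  ID = 1:4,
  text = c(
    "they all went to the store bonkobuns and bought chicken",
    "if we believe no exomunch standards are in order then we're ok",
    "living among the calipodians seems reasonable",
    "given the state of all relimited editions we should be fine"
  ),
  stringsAsFactors = FALSE
)

# Stand-in for the full English word list: only the words used in the sample.
word_vec <- c("among", "editions", "bought", "seems", "fine", "state", "in",
              "then", "reasonable", "ok", "standards", "store", "order",
              "should", "and", "be", "to", "they", "are", "no", "living",
              "all", "if", "we're", "went", "of", "given", "the", "chicken",
              "believe", "we")
```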

I have a vector containing all the English words: word_vec

I can use the tm package to do the reverse, removing every word in the vector from the data frame:

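library(tm)  # provides removeWords()

# Removes each word in word_vec from every row's text column
for (k in seq_len(nrow(frame))) {
    for (i in seq_along(word_vec)) {
        frame$text[k] <- removeWords(frame$text[k], word_vec[i])
    }
}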

All I can think of is the following process:

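  • For each row, split the text on spaces into a vector with strsplit()
  • For each element of the new vector, use regexpr() to test it against word_vec
  • If the value for a given position comes back as -1 (no match), remove that position
  • Paste the remaining words back into a string and store it in a new vector
  • If you go this route, it may be worth looking at what the function which() does: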

        which(c('a','b','c','d','e') == 'd')
    [1] 4
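The steps above can be sketched in base R as follows. One substitution: instead of regexpr() and its -1 result, each token is checked with an exact membership test (%in%), which avoids accidental partial matches. The function name keep_words is hypothetical, and splitting on runs of whitespace is an assumption about how the text is tokenised.

```r
# Keep, in each string, only the tokens that appear in vocab.
# Every input string yields exactly one output string, so no rows are lost.
keep_words <- function(text, vocab) {
  tokens <- strsplit(text, "\\s+")               # split each row on whitespace
  vapply(tokens,
         function(tok) paste(tok[tok %in% vocab], collapse = " "),  # drop, re-join
         character(1))
}

df1 <- data.frame(
  ID = 1:2,
  text = c("they all went to the store bonkobuns and bought chicken",
           "living among the calipodians seems reasonable"),
  stringsAsFactors = FALSE
)
word_vec <- c("they", "all", "went", "to", "the", "store", "and", "bought",
              "chicken", "living", "among", "seems", "reasonable")

df1$text <- keep_words(df1$text, word_vec)
print(df1)
```

Because keep_words returns one string per input row, the data frame keeps all of its rows; a row whose words are all unknown simply becomes an empty string.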
    

    Here is a simple approach:

    txt <- "Hi this is an example"
    words <- c("this", "is", "an", "example")
    paste(intersect(strsplit(txt, "\\s")[[1]], words), collapse=" ")
    [1] "this is an example"
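One caveat with intersect(): it treats both arguments as sets, so a word that occurs more than once in the sentence is kept only once. Subsetting the tokens with %in% keeps every match, repeats included, in the original order. The sentence and word list below are made up for illustration.

```r
txt    <- "the cat saw the dog and the bonkobuns"
words  <- c("the", "cat", "saw", "dog", "and")
tokens <- strsplit(txt, "\\s")[[1]]

# intersect() de-duplicates: the second and third "the" are lost
via_intersect <- paste(intersect(tokens, words), collapse = " ")

# logical subsetting with %in% keeps every token found in words
via_in <- paste(tokens[tokens %in% words], collapse = " ")

via_intersect
via_in
```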
    

    You could try gsub (note that this pattern lists the non-English words to remove, not the English ones):

     word_vec <- paste(c('bonkobuns ', 'exomunch ', 'calipodians ', 
              'relimited '), collapse="|")
     gsub(word_vec, '', df1$text)
     #[1] "they all went to the store and bought chicken"        
     #[2] "if we believe no standards are in order then we're ok"
     #[3] "living among the seems reasonable"                    
     #[4] "given the state of all editions we should be fine" 
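A possible refinement of the gsub() approach, sketched under the assumption that the list of non-English words is known in advance (as it is in the answer above). Wrapping the alternation in \\b word boundaries means a word is removed even when it is the last word of a string (where a pattern like 'bonkobuns ' with its trailing space would not match), and a short word is never removed from inside a longer one. The fifth string is hypothetical, added only to show the end-of-string case. Words containing regex metacharacters would still need escaping.

```r
text <- c("they all went to the store bonkobuns and bought chicken",
          "if we believe no exomunch standards are in order then we're ok",
          "living among the calipodians seems reasonable",
          "given the state of all relimited editions we should be fine",
          "we met the exomunch")   # hypothetical: unknown word at the very end

non_english <- c("bonkobuns", "exomunch", "calipodians", "relimited")

# \\b(...)\\b matches the listed words only as whole words
pattern <- paste0("\\b(", paste(non_english, collapse = "|"), ")\\b")
cleaned <- gsub(pattern, "", text, perl = TRUE)

# Removing a word leaves a doubled (or trailing) space; squeeze and trim it
cleaned <- trimws(gsub("\\s+", " ", cleaned))
cleaned
```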
    

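    This leaves me with an empty data frame. Note that the number of rows in the original data frame should not decrease; the text in each field should simply lose anything that is not English.
    Have you tried inverting your condition with the not operator? With the removeWords function?
    Yes; it doesn't allow that. A function that does the opposite of gsub would also work.
    @controlnetic How about using grep() with word_vec as part of the pattern?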
      word_vec <- c("among", "editions", "bought", "seems", "fine", 
      "state", "in", 
      "then", "reasonable", "ok", "standards", "store", "order", "should", 
      "and", "be", "to", "they", "are", "no", "living", "all", "if", 
      "we're", "went", "of", "given", "the", "chicken", "believe", 
      "we")
    
    
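      # Delete every English word (matched as a substring, no word boundaries)
      # from each row; the trimmed leftovers are pasted together with ' |'
      # into a second pattern, which is then removed from the original text.
      word_vec2 <- paste(gsub('^ +| +$', '', gsub(paste(word_vec, 
            collapse="|"), '', df1$text)), collapse= ' |')
      gsub(word_vec2, '', df1$text)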
      #[1] "they all went to the store and bought chicken"        
      #[2] "if we believe no standards are in order then we're ok"
      #[3] "living among the seems reasonable"                    
      #[4] "given the state of all  editions we should be fine"
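The approach above finds the unknown words by deleting English words as substrings, so a short word such as "in" can also be cut out of the middle of an unknown word, and each row's leftovers become a single alternative in the pattern, which is unlikely to match when a row contains more than one unknown word. A sketch of a more literal variant, assuming words are separated by whitespace: collect the tokens that are not exact entries of word_vec, then delete only those whole tokens. \\Q...\\E (PCRE, hence perl = TRUE) makes each token match literally, and the look-arounds (?<!\\S) and (?!\\S) require whitespace or a string edge on both sides. The word list is the one from the answer above.

```r
text <- c("they all went to the store bonkobuns and bought chicken",
          "if we believe no exomunch standards are in order then we're ok",
          "living among the calipodians seems reasonable",
          "given the state of all relimited editions we should be fine")

word_vec <- c("among", "editions", "bought", "seems", "fine", "state", "in",
              "then", "reasonable", "ok", "standards", "store", "order",
              "should", "and", "be", "to", "they", "are", "no", "living",
              "all", "if", "we're", "went", "of", "given", "the", "chicken",
              "believe", "we")

# 1. Unknown words: whitespace-separated tokens with no exact match in
#    word_vec (setdiff() also removes duplicates)
unknown <- setdiff(unlist(strsplit(text, "\\s+")), word_vec)

cleaned <- text
if (length(unknown) > 0) {
  # 2. One alternation of literal tokens, each bounded by whitespace or an edge
  pattern <- paste0("(?<!\\S)(?:",
                    paste0("\\Q", unknown, "\\E", collapse = "|"),
                    ")(?!\\S)")
  cleaned <- gsub(pattern, "", text, perl = TRUE)
  # 3. Squeeze the spaces left behind by the removed tokens
  cleaned <- trimws(gsub("\\s+", " ", cleaned))
}
cleaned
```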