R: Looping over a tm corpus without losing the corpus structure


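I have a tm corpus and a list of words. I want to run a for loop over the corpus so that the loop removes each word in the list from the corpus, one after another.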

Some reproducible data:

library(tm)
m <- cbind(c("Apple blue two","Pear yellow five","Banana yellow two"),
             c(1, 2, 3))
tm_corpus <- Corpus(VectorSource(m[,1]))
words <- as.list(c("Apple", "yellow", "two"))

words is a list of 3 words:

[[1]]
[1] "Apple"

[[2]]
[1] "yellow"

[[3]]
[1] "two"

I have tried three different loops. The first is:

tm_corpusClean <- tm_corpus
for (i in seq_along(tm_corpusClean)) {
  for (u in seq_along(words)) {
    tm_corpusClean[i] <- tm_map(tm_corpusClean[i], removeWords, words[[u]])
  }
}

The last loop is:

tm_corpusClean <- tm_corpus
for (i in seq_along(words)) {
  tm_corpusClean <- tm_map(tm_corpusClean, removeWords, words[[i]])
}

Where am I going wrong?

Before we begin the sequential removal, test whether tm_map works on your example:

obj1 <- tm_map(tm_corpus, removeWords, unlist(words))
sapply(obj1, `[`, "content")

$`1.content`
[1] " blue "

$`2.content`
[1] "Pear  five"

$`3.content`
[1] "Banana  "

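Note that when the words are removed one at a time with lapply, the resulting corpora are in a nested list (which is why two sapply calls are needed to view the contents).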

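Hi Adam, thanks for your answer. Your code works, but it gives me NA instead of the output you show here for obj1. obj1[[1]]$content returns [1] "blue", so the NA only appears after running sapply(obj1, `[`, "content"), which gives [1] NA NA NA. But it does seem to have worked on the corpus itself. :)

That's odd. `[` should be equivalent to $.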
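
A possible way around the NA reported in the comment above, offered as a sketch rather than a diagnosis: extract each document's text with as.character() instead of indexing by name. The example assumes a VCorpus (chosen explicitly here; the question uses Corpus()), and the variable names are illustrative. The last lines show one plausible source of NA, under the assumption that the elements being iterated over are bare character vectors: name-indexing a character vector with `[` returns NA.

```r
library(tm)

docs <- c("Apple blue two", "Pear yellow five", "Banana yellow two")
corpus <- VCorpus(VectorSource(docs))  # VCorpus is an assumption of this sketch

# Remove all three words in one call, as in obj1
cleaned <- tm_map(corpus, removeWords, c("Apple", "yellow", "two"))

# as.character() returns the text of each document
texts <- unname(sapply(cleaned, as.character))
print(texts)

# For comparison: indexing a plain character vector by name yields NA
plain <- " blue "
na_result <- plain["content"]
print(is.na(na_result))
```

Whether this matches the commenter's setup depends on the tm version in use, which the thread does not state.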

The second loop from the question, with the error it raises:

tm_corpusClean <- tm_corpus
for (i in seq_along(words)) {
  for (u in seq_along(tm_corpusClean)) {
    tm_corpusClean[u] <- tm_map(tm_corpusClean[u], removeWords, words[[i]])
  }
}
Error in x$dmeta[i, , drop = FALSE] : incorrect number of dimensions
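
If the documents really do need to be visited one at a time, one option is to work with double-bracket indexing, which extracts and replaces a single document rather than a sub-corpus. This is a minimal sketch, assuming a VCorpus and assuming that replacing a document via [[ ]] behaves as expected in the installed tm version; it is not taken from the original answer.

```r
library(tm)

docs <- c("Apple blue two", "Pear yellow five", "Banana yellow two")
corpus <- VCorpus(VectorSource(docs))  # VCorpus is an assumption of this sketch
words <- c("Apple", "yellow", "two")

cleaned <- corpus
for (w in words) {
  for (j in seq_along(cleaned)) {
    # [[ ]] pulls out one document; removeWords() is applied to it directly
    cleaned[[j]] <- removeWords(cleaned[[j]], w)
  }
}

per_doc_texts <- unname(sapply(cleaned, as.character))
print(per_doc_texts)
```

The result is still a corpus object, so later tm_map calls can be applied to it.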

The last loop from the question runs; inspecting the first document afterwards:

tm_corpusClean <- tm_corpus
for (i in seq_along(words)) {
  tm_corpusClean <- tm_map(tm_corpusClean, removeWords, words[[i]])
}
inspect(tm_corpusClean[[1]])

<<PlainTextDocument>>
Metadata:  7
Content:  chars: 6

 blue 

Removing the words one at a time with lapply, which gives one corpus per word:

obj2 <- lapply(words, function(word) tm_map(tm_corpus, removeWords, word))
sapply(obj2, function(x) sapply(x, `[`, "content"))

          [,1]                [,2]             [,3]              
1.content " blue two"         "Apple blue two" "Apple blue "     
2.content "Pear yellow five"  "Pear  five"     "Pear yellow five"
3.content "Banana yellow two" "Banana  two"    "Banana yellow "
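
The lapply version above produces a separate corpus for each word. If the aim is a single corpus with the words removed cumulatively, one option is to fold over the word list with Reduce(), feeding the output of each tm_map call into the next. This is a sketch under the assumption of a VCorpus; the names below are illustrative and not from the original answer.

```r
library(tm)

docs <- c("Apple blue two", "Pear yellow five", "Banana yellow two")
corpus <- VCorpus(VectorSource(docs))  # VCorpus is an assumption of this sketch
words <- c("Apple", "yellow", "two")

# Start from the full corpus and remove one word per step
cumulative <- Reduce(
  function(corp, w) tm_map(corp, removeWords, w),
  words,
  corpus
)

cumulative_texts <- unname(sapply(cumulative, as.character))
print(cumulative_texts)
```

This is equivalent in effect to the last for loop in the question, written without an explicit loop variable.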