R: using tolower only inside a function


I want to remove words from a character vector. This is how I do it:

library(tm)
words = c("the", "The", "Intelligent", "this", "This")
words_to_remove = c("the", "This")
removeWords(tolower(words), tolower(words_to_remove))

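This works well, but I want the word "Intelligent" returned as it is, i.e. "Intelligent" rather than "intelligent".

Is it possible to apply tolower only inside the removeWords call?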

You can use a base R approach with grepl here:

words_to_remove = c("the", "This")
pattern <- paste0("\\b", words_to_remove, "\\b", collapse="|")
words = c("the", "The", "Intelligent", "this", "This")

res <- grepl(pattern, words, ignore.case=TRUE)
words[!res]

[1] "Intelligent"

In a single regex evaluation, this pattern determines whether each string in words matches any of the strings to be removed.
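
One caveat is worth illustrating: grepl returns one TRUE/FALSE per element, so this approach drops whole elements rather than deleting the matched word inside them. A minimal sketch, assuming a made-up input vector `phrases` that is not part of the original question:

```r
words_to_remove <- c("the", "This")
# Collapses to the single pattern "\\bthe\\b|\\bThis\\b"
pattern <- paste0("\\b", words_to_remove, "\\b", collapse = "|")

# Hypothetical input with a multi-word element (not from the question)
phrases <- c("the cat", "Intelligent", "Theory", "this")

# TRUE wherever any listed word occurs as a whole word, ignoring case
hit <- grepl(pattern, phrases, ignore.case = TRUE)
kept <- phrases[!hit]
print(kept)
```

Here "the cat" is dropped entirely because it contains "the", while "Theory" survives because \\b only allows whole-word matches.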

Here is another option, using base R's %in% operator:

words = c("the", "The", "Intelligent", "this", "This")
words_to_remove = c("the", "This")

words[!(tolower(words) %in% tolower(words_to_remove))]

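%in% returns TRUE for every element of words that appears in words_to_remove (both lower-cased for the comparison). Negate it to get the words you want to keep.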
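
The same idea can be wrapped in a small helper so that tolower is applied only for the comparison and never to the returned values. This is a sketch; the name `drop_words_ci` is hypothetical and does not come from tm or base R:

```r
# Hypothetical helper: drop elements of `x` that equal any entry of
# `remove`, comparing case-insensitively but returning `x` unchanged.
drop_words_ci <- function(x, remove) {
  x[!(tolower(x) %in% tolower(remove))]
}

words <- c("the", "The", "Intelligent", "this", "This")
words_to_remove <- c("the", "This")

result <- drop_words_ci(words, words_to_remove)
print(result)  # the surviving word keeps its original capitalisation
```

Because %in% compares whole strings, entries containing regex metacharacters need no escaping, but a word embedded in a longer string is not matched.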

For those of us unfamiliar with it, which package does the removeWords function come from?

@TimBiegeleisen It is in tm.

This does not quite match the OP's output. I would suggest gsub(paste0("\\b", words_to_remove, "\\b", collapse = "|"), "", words, ignore.case = TRUE)

Thanks, nice solution, but it gives me an error because my words_to_remove list contains apostrophes, as in "I've". R says "invalid regular expression". Is it possible to have such words in the list?

A single apostrophe is not a regex metacharacter and should not cause an error. I think there is some other problem with your data.

Thanks for your comment. I will keep my answer, because when I see a function called remove words, I expect it to remove them from my input.

Sorry, my fault. I had overlooked a global variable that I defined earlier. That was the problem.
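
The gsub suggestion from the comments can be sketched as follows. It keeps the vector at its original length and blanks the matched words while leaving the case of the remaining words untouched. The empty-string replacement argument and the apostrophe input are assumptions added here for illustration:

```r
words <- c("the", "The", "Intelligent", "this", "This")
words_to_remove <- c("the", "This")

pattern <- paste0("\\b", words_to_remove, "\\b", collapse = "|")

# Replace each whole-word match with "", ignoring case; elements with
# no match pass through unchanged.
res <- gsub(pattern, "", words, ignore.case = TRUE)
print(res)

# An apostrophe is an ordinary character in a regex, so a word like
# "I've" (illustrative input) compiles and matches without error.
apos_pattern <- paste0("\\b", c("I've"), "\\b", collapse = "|")
res_apos <- gsub(apos_pattern, "", c("i've", "Intelligent"), ignore.case = TRUE)
print(res_apos)
```

Words that do contain regex metacharacters (for example "." or "(") would still need escaping before being pasted into the pattern.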