Free-text classification using grep

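I'm a beginner. I need to classify free text (customer feedback) into a fixed number of given categories. I'm trying to run a small piece of code to test the logic: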

a<-c("a","b","c","d","e") # Category a - if the free text contains any of "a","b","c","d" or "e"
b<-c("f","g","h","i","j") # Category b - if the free text contains any of "f","g","h","i" or "j"
check<-c("a","g","d","j") # Free text to be categorized. "a" should be categorized as a; "g" as b; "d" as a and
                          # "j" as b
count<-length(check)
output<-vector(mode="list",length = count) # Empty categorized list - targeted output is (a,b,a,b)
for (i in 1:count) {
 output[i]<-ifelse(grepl(a,check[i]),"a",ifelse(grepl(b,check[i]),"b","other"))
}
The output is (a, other, other, other).

Either grepl is not the right function, or there is some way to use a vector of patterns.
Any help and guidance would be appreciated.

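grepl complains because "pattern" (its first argument) contains several patterns instead of one; only the first element is then used, which is why only the first item is classified as expected. One way to solve this is to collapse the keywords into a single regular expression (| means "or"), for example: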

a<-c("a","b","c","d","e") # Category a - if the free text contains any of "a","b","c","d" or "e"
b<-c("f","g","h","i","j") # Category b - if the free text contains any of "f","g","h","i" or "j"
check<-c("a","g","d","j") # Free text to be categorized. "a" should be categorized as a; "g" as b; "d" as a and
# "j" as b

# collapse regular expression
a <- paste(a, collapse = "|")
b <- paste(b, collapse = "|")

count<-length(check)
output<-vector(mode="list",length = count) # Empty categorized list - targeted output is (a,b,a,b)
for (i in 1:count) {
  output[i]<-ifelse(grepl(a,check[i]),"a",ifelse(grepl(b,check[i]),"b","other"))
}

output
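
Since grepl is vectorized over the text argument, the loop is not strictly necessary. A minimal sketch of the same idea without the loop, assuming a plain character vector is an acceptable result instead of a list (the names pattern_a and pattern_b are only illustrative):

```r
a <- c("a", "b", "c", "d", "e")   # keywords for category a
b <- c("f", "g", "h", "i", "j")   # keywords for category b
check <- c("a", "g", "d", "j")    # free text to be categorized

# collapse each keyword vector into one "or" regular expression
pattern_a <- paste(a, collapse = "|")
pattern_b <- paste(b, collapse = "|")

# grepl() tests every element of check at once, so ifelse() can work on the whole vector
output <- ifelse(grepl(pattern_a, check), "a",
                 ifelse(grepl(pattern_b, check), "b", "other"))

output
# [1] "a" "b" "a" "b"
```

Category a is tested first, so a text that contains keywords from both categories ends up in a.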

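This doesn't answer your question, but... it looks like you may be reinventing the wheel. I suggest you check out

  • the tm package (for text mining)
  • Text Mining with R, by Silge and Robinson, published by O'Reilly, especially Chapter 2, "Sentiment Analysis with Tidy Data"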

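Thanks a lot, Bas. It always looks so easy in hindsight.

Thanks a lot, David. I had come across it before, but as a beginner I couldn't get much out of it. It will certainly be useful as I go on. Thank you very much.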

Printing output after running the corrected code gives:
[[1]]
[1] "a"

[[2]]
[1] "b"

[[3]]
[1] "a"

[[4]]
[1] "b"
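
With more than two categories, the nested ifelse calls get hard to read. A possible generalization, sketched under the assumption that the keywords are kept in a named list and that earlier categories should win when a text matches several; classify_text and its default argument are hypothetical names, not part of any package:

```r
# keyword lists, one entry per category (same toy data as above)
categories <- list(
  a = c("a", "b", "c", "d", "e"),
  b = c("f", "g", "h", "i", "j")
)

# hypothetical helper: returns one category name per element of texts
classify_text <- function(texts, categories, default = "other") {
  result <- rep(default, length(texts))
  # loop in reverse so that categories listed first overwrite later ones
  for (name in rev(names(categories))) {
    pattern <- paste(categories[[name]], collapse = "|")
    result[grepl(pattern, texts)] <- name
  }
  result
}

classify_text(c("a", "g", "d", "j", "xyz"), categories)
# [1] "a"     "b"     "a"     "b"     "other"
```

The keywords are treated as regular expressions here, so keywords containing characters such as . or ( would need to be escaped first.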