Free-text classification based on grep

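I'm a newbie. I need to classify free text (customer feedback) into a given, fixed number of categories. I'm trying to run a small piece of code to test the logic: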

a<-c("a","b","c","d","e") # Category a - if the free text contains any of "a","b","c","d" or "e"
b<-c("f","g","h","i","j") # Category b - if the free text contains any of "f","g","h","i" or "j"
check<-c("a","g","d","j") # Free text to be categorized. "a" should be categorized as a; "g" as b; "d" as a and
                          # "j" as b
count<-length(check)
output<-vector(mode="list",length = count) # Empty categorized list - targeted output is (a,b,a,b)
for (i in 1:count) {
 output[i]<-ifelse(grepl(a,check[i]),"a",ifelse(grepl(b,check[i]),"b","other"))
}

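The output is (a, other, other, other).

Either `grepl` is not the right function, or there is a way to use a vector of patterns.

I would appreciate your help and guidance.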

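`grepl` complains because `pattern` (its first argument) contains several patterns instead of one, so only the first element is used. One way to solve this is to collapse the conditions into a single regular expression (`|` means "or"), for example: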

a<-c("a","b","c","d","e") # Category a - if the free text contains any of "a","b","c","d" or "e"
b<-c("f","g","h","i","j") # Category b - if the free text contains any of "f","g","h","i" or "j"
check<-c("a","g","d","j") # Free text to be categorized. "a" should be categorized as a; "g" as b; "d" as a and
# "j" as b

# collapse regular expression
a <- paste(a, collapse = "|")
b <- paste(b, collapse = "|")

count<-length(check)
output<-vector(mode="list",length = count) # Empty categorized list - targeted output is (a,b,a,b)
for (i in 1:count) {
  output[i]<-ifelse(grepl(a,check[i]),"a",ifelse(grepl(b,check[i]),"b","other"))
}

output
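Because `grepl` is vectorised over its second argument, the loop is not strictly needed. Below is a minimal sketch of the same logic without the loop, assuming the same keyword vectors as in the question. The names `pat_a`, `pat_b` and `category` are made up for illustration, and the result is a character vector rather than a list.

```r
a <- c("a", "b", "c", "d", "e")   # keywords for category a
b <- c("f", "g", "h", "i", "j")   # keywords for category b
check <- c("a", "g", "d", "j")    # free text to be categorized

# Collapse each keyword vector into one "or" regular expression
pat_a <- paste(a, collapse = "|")
pat_b <- paste(b, collapse = "|")

# grepl() tests every element of check at once; ifelse() picks the label
category <- ifelse(grepl(pat_a, check), "a",
                   ifelse(grepl(pat_b, check), "b", "other"))
category
```

Note that the keywords are still treated as regular expressions here, so keywords containing characters such as `.` or `(` would need escaping.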

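This doesn't answer your question, but... it looks like you may be reinventing the wheel. I suggest you check out:

the `tm` package (for text mining);
Text Mining with R, by Silge and Robinson, published by O'Reilly, esp. chapter 2, "Sentiment analysis with tidy data".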

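Thanks a lot, BAS. In hindsight, it always looks so easy.

Thanks a lot, David. I had come across it before, but as a newbie I couldn't get much out of it. It will surely be useful as I go on. Thanks a lot.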

Output of the corrected code above:

[[1]]
[1] "a"

[[2]]
[1] "b"

[[3]]
[1] "a"

[[4]]
[1] "b"
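The same idea can be extended to more than two categories. The following is only a sketch under stated assumptions: `classify_text` is a hypothetical helper (not part of base R or any package), the categories are supplied as a named list of keyword vectors, and when a text matches several categories the one listed first wins.

```r
# Hypothetical helper: label each text with the first category whose
# keywords match, or with `default` when no category matches.
classify_text <- function(texts, categories, default = "other") {
  result <- rep(default, length(texts))
  # Go through the categories in reverse order so that categories
  # listed earlier overwrite later ones and therefore take priority.
  for (name in rev(names(categories))) {
    pattern <- paste(categories[[name]], collapse = "|")
    result[grepl(pattern, texts)] <- name
  }
  result
}

categories <- list(
  a = c("a", "b", "c", "d", "e"),
  b = c("f", "g", "h", "i", "j")
)

labels <- classify_text(c("a", "g", "d", "j", "z"), categories)
labels
```

With the sample input, the first four texts fall into categories a, b, a, b, and "z" falls through to "other". As in the answer above, the keywords are interpreted as regular expressions.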