R: Is it possible to extract groups of words from each sentence (row) and create a data frame (or matrix)?


I created a list for each word in order to extract that word from the sentences, for example:

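hello <- character(length(text))
for (i in seq_along(text)) {
  # [Hh] matches either case; the original "[H|h]ello?" also matched "|" and "hell"
  hits <- regmatches(text[i], gregexpr("[Hh]ello", text[i]))[[1]]
  # join multiple hits so each sentence yields exactly one string
  hello[i] <- paste(hits, collapse = " ")
}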

In base R, you can do the following:

regmatches(text,gregexpr(sprintf("\\b(%s)\\b",paste0(words,collapse = "|")),text))
[[1]]
[1] "Hello" "you"  

[[2]]
[1] "hello" "you"  

[[3]]
[1] "so"

[[4]]
[1] "you"

[[5]]
[1] "you" "so" 

[[6]]
[1] "you" "you" "egg"

[[7]]
[1] "you" "tea" "egg"

Or, depending on the form you want the result in:

trimws(gsub(sprintf(".*?\\b(%s).*?|.*$",paste0(words,collapse = "|")),"\\1 ",text))
[1] "Hello you"   "hello you"   "so"          "you"         "you so"      "you you egg"
[7] "you tea egg"
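
Since the question asks for a data frame or matrix, the list returned by regmatches can be tabulated per sentence. The following is a minimal sketch, not part of the original answer: the sample `text` and `words` are hypothetical stand-ins (the original inputs are not shown), and `ignore.case = TRUE` is an added assumption so that "Hello" and "hello" count as the same word.

```r
# Hypothetical sample data; the original `text` and `words` are not shown.
text  <- c("Hello, how are you?", "I like tea and egg", "nothing here")
words <- c("hello", "you", "tea", "egg")  # lowercase, used as column names

# Same pattern construction as above: \b(word1|word2|...)\b
pattern <- sprintf("\\b(%s)\\b", paste0(words, collapse = "|"))
hits <- regmatches(text, gregexpr(pattern, text, ignore.case = TRUE))

# Count matrix: one row per sentence, one column per word
counts <- t(vapply(
  hits,
  function(h) tabulate(match(tolower(h), words), nbins = length(words)),
  integer(length(words))
))
colnames(counts) <- words

# Data frame with the matched words pasted together plus the counts
df <- data.frame(
  sentence = text,
  matched  = vapply(hits, paste, character(1), collapse = " "),
  counts,
  stringsAsFactors = FALSE
)
print(df)
```

Sentences with no match get an empty string in `matched` and zeros in every count column, so the result always has one row per input sentence.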

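You said you have a long list of word sets. Here is one way to turn each word set into a regular expression, apply it to a corpus (a list of sentences), and return the hits as character vectors. It is case-insensitive, and it checks word boundaries, so you won't pull "age" out of "agent" or "rage".

This gives us: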

aWset(harvSent , wordsets)
[[1]]
 [1] "Oak"        "dogs"       ""           ""           ""           ""           "cheese age" ""          
 [9] ""           ""          

[[2]]
 [1] ""     ""     ""     "Open" ""     "jail" ""     ""     ""     "fire"

[[3]]
 [1] ""              ""              ""              ""              "product three" ""              ""
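
The definition of aWset is not included here. As a rough illustration of the behavior described (one regex per word set, case-insensitive, anchored on word boundaries, one character vector of hits per set), a function along these lines might work. This is a hypothetical sketch: the name `match_wordsets` and the sample `sentences` and `wordsets` are assumptions, not the original code or data.

```r
# Hypothetical helper; not the original aWset implementation.
match_wordsets <- function(sentences, wordsets) {
  lapply(wordsets, function(ws) {
    # Turn one word set into a single alternation bounded by \b
    pattern <- sprintf("\\b(%s)\\b", paste(ws, collapse = "|"))
    hits <- regmatches(sentences,
                       gregexpr(pattern, sentences, ignore.case = TRUE))
    # One string per sentence; "" when nothing matched
    vapply(hits, paste, character(1), collapse = " ")
  })
}

# Hypothetical sample data
sentences <- c("The agent ate cheese of great age",
               "Rage is a fire",
               "Open the jail")
wordsets <- list(c("cheese", "age"), c("open", "jail", "fire"))

res <- match_wordsets(sentences, wordsets)
print(res)
```

The result is a list with one element per word set, each a character vector as long as `sentences`. Words containing regex metacharacters would need escaping before being pasted into the pattern.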

Using stringr: str_extract_all(text, str_c('\\b', words, '\\b', collapse = '|'))
