R: Is it possible to extract groups of words from each sentence (line) and create a data frame (or matrix)?

Tags: r, extract, text-mining

I created a list for each word in order to extract the words from the sentences, for example:

hello <- NULL
for (i in 1:length(text)) {
  hello[i] <- as.character(regmatches(text[i], gregexpr("[H|h]ello?", text[i])))
}

In base R, you can do the following:

regmatches(text,gregexpr(sprintf("\\b(%s)\\b",paste0(words,collapse = "|")),text))
[[1]]
[1] "Hello" "you"  

[[2]]
[1] "hello" "you"  

[[3]]
[1] "so"

[[4]]
[1] "you"

[[5]]
[1] "you" "so" 

[[6]]
[1] "you" "you" "egg"

[[7]]
[1] "you" "tea" "egg"

Or, depending on how you want the result:

trimws(gsub(sprintf(".*?\\b(%s).*?|.*$",paste0(words,collapse = "|")),"\\1 ",text))
[1] "Hello you"   "hello you"   "so"          "you"         "you so"      "you you egg"
[7] "you tea egg"
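
The question also asks for a data frame or matrix. A minimal sketch of one way to reshape the `regmatches()` list, assuming hypothetical sample values for `text` and `words` (the original inputs are not shown in this thread):

```r
# Hypothetical sample data; the original `text` and `words` are not shown.
text  <- c("Hello, how are you?", "I like tea and egg", "nothing here")
words <- c("Hello", "you", "tea", "egg")

# One alternation pattern with word boundaries, as in the answer above.
pattern <- sprintf("\\b(%s)\\b", paste0(words, collapse = "|"))
hits <- regmatches(text, gregexpr(pattern, text))

# Long-format data frame: one row per (sentence, matched word) pair.
long <- data.frame(
  sentence = rep(seq_along(hits), lengths(hits)),
  word     = unlist(hits),
  stringsAsFactors = FALSE
)

# Sentence-by-word count matrix: rows are sentences, columns are words.
counts <- t(vapply(
  hits,
  function(h) as.integer(table(factor(h, levels = words))),
  integer(length(words))
))
dimnames(counts) <- list(NULL, words)
```

The long format keeps the order of the matches within each sentence, while the count matrix is easier to feed into later modelling steps. Sentences without any match simply contribute no rows to `long` and an all-zero row to `counts`.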

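You said you have a long list of word sets. Here is one way to turn each word set into a regular expression, apply it to a corpus given as a list of sentences, and return the hits as character vectors. It is case-insensitive and it checks word boundaries, so you will not pull "age" out of "agent" or "rage".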

This gives us:

aWset(harvSent , wordsets)
[[1]]
 [1] "Oak"        "dogs"       ""           ""           ""           ""           "cheese age" ""          
 [9] ""           ""          

[[2]]
 [1] ""     ""     ""     "Open" ""     "jail" ""     ""     ""     "fire"

[[3]]
 [1] ""              ""              ""              ""              "product three" ""              ""             
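
The definition of `aWset` is not preserved in this thread. The following is only a sketch of what such a function could look like, based on the description above (one regular expression per word set, case-insensitive, word boundaries, hits pasted together per sentence). The body and the sample data are assumptions, not the answerer's original code:

```r
# Hypothetical reconstruction: the original aWset definition is not shown.
aWset <- function(sentences, wordsets) {
  lapply(wordsets, function(ws) {
    # Build one alternation per word set, anchored on word boundaries.
    pattern <- sprintf("\\b(%s)\\b", paste(ws, collapse = "|"))
    hits <- regmatches(
      sentences,
      gregexpr(pattern, sentences, ignore.case = TRUE, perl = TRUE)
    )
    # Collapse the hits of each sentence into one string ("" if none).
    vapply(hits, paste, character(1), collapse = " ")
  })
}

# Hypothetical sample data to illustrate the word-boundary behaviour.
sentences <- c("The agent was in a rage.", "Old cheese needs age.", "Open the door.")
wordsets  <- list(c("cheese", "age"), c("open", "fire"))
res <- aWset(sentences, wordsets)
```

With this sample data, the first word set matches only the second sentence, because "age" inside "agent" and "rage" is rejected by the word boundaries. The second word set matches "Open" in the third sentence despite the lowercase entry in the word set.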

Comment: using stringr: str_extract_all(text, str_c('\\b', words, '\\b', collapse = '|'))