R: Can I extract a set of words from each sentence (row) and create a data frame (or matrix)?
I created a list for each word so that I could extract the words from the sentences, for example:
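# Collect the "Hello"/"hello" matches found in each sentence of `text`
hello <- character(length(text))
for (i in seq_along(text)) {
  # "[Hh]ello": inside a character class, "|" is a literal bar, not "or"
  hits <- regmatches(text[i], gregexpr("[Hh]ello", text[i]))[[1]]
  hello[i] <- paste(hits, collapse = " ")
}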
In base R, you can do the following:
regmatches(text,gregexpr(sprintf("\\b(%s)\\b",paste0(words,collapse = "|")),text))
[[1]]
[1] "Hello" "you"
[[2]]
[1] "hello" "you"
[[3]]
[1] "so"
[[4]]
[1] "you"
[[5]]
[1] "you" "so"
[[6]]
[1] "you" "you" "egg"
[[7]]
[1] "you" "tea" "egg"
Or, depending on how you want the result:
trimws(gsub(sprintf(".*?\\b(%s).*?|.*$",paste0(words,collapse = "|")),"\\1 ",text))
[1] "Hello you" "hello you" "so" "you" "you so" "you you egg"
[7] "you tea egg"
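To get from the list of matches to the data frame or matrix the question asks about, one option is to count how often each word occurs in each sentence. The following is a minimal sketch and not part of the original answer; the contents of `text` and `words` are made up for illustration, since the question does not show them.

```r
# Hypothetical inputs: the real `text` and `words` are not shown in the question.
text  <- c("Hello, how are you?", "so it goes", "you like tea and egg, don't you")
words <- c("Hello", "hello", "you", "so", "tea", "egg")

# One alternation pattern with word boundaries, as in the regmatches() call above
pattern <- sprintf("\\b(%s)\\b", paste0(words, collapse = "|"))
hits <- regmatches(text, gregexpr(pattern, text))

# Count each word per sentence: rows are sentences, columns are words
counts <- t(vapply(hits,
                   function(h) as.integer(table(factor(h, levels = words))),
                   integer(length(words))))
colnames(counts) <- words

# The same counts as a data frame, with the original sentence alongside
result <- data.frame(sentence = text, counts, check.names = FALSE,
                     stringsAsFactors = FALSE)
```

`counts` is an integer matrix with one row per sentence; `result` carries the same columns next to the sentence text. If only presence matters, `counts > 0` gives a logical matrix instead.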
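You said you have a long list of word sets. Here is a way to turn each word set into a regular expression, apply it to a corpus (a list of sentences), and pull out the hits as character vectors. It is case-insensitive and it checks word boundaries, so you won't pull "age" out of "agent" or "rage". This gives us: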
aWset(harvSent , wordsets)
[[1]]
[1] "Oak" "dogs" "" "" "" "" "cheese age" ""
[9] "" ""
[[2]]
[1] "" "" "" "Open" "" "jail" "" "" "" "fire"
[[3]]
[1] "" "" "" "" "product three" "" ""
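The definition of `aWset()` is not shown here. A function with the behaviour described above might look roughly like the following sketch. The function body, the use of `perl = TRUE`, and the example sentences and word sets are all assumptions made for illustration; this is not the author's code.

```r
# Hypothetical reconstruction: the original definition of aWset() is not shown.
aWset <- function(sentences, wordsets) {
  lapply(wordsets, function(ws) {
    # One alternation per word set, anchored on word boundaries
    pattern <- sprintf("\\b(%s)\\b", paste(ws, collapse = "|"))
    hits <- regmatches(sentences,
                       gregexpr(pattern, sentences,
                                ignore.case = TRUE, perl = TRUE))
    # Collapse each sentence's hits into one string; no hit gives ""
    vapply(hits, paste, character(1), collapse = " ")
  })
}

# Made-up example data (not the data used in the answer)
sents <- c("The agent flew into a rage.",
           "Cheese improves with age.",
           "Open the jail door.")
sets  <- list(c("age", "cheese"), c("open", "jail", "fire"))
res   <- aWset(sents, sets)
```

Each element of `res` corresponds to one word set and holds one string per sentence, so the list can be bound into a matrix with `do.call(cbind, res)` if a rectangular result is wanted.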
Using stringr: str_extract_all(text, str_c('\\b', words, '\\b', collapse = '|'))