R: How do I extract elements from an NLP tree?
I am using the NLP package to parse sentences. How can I extract elements from the tree output that is created? For example, I would like to grab the noun phrases (NP) from the example below:
library(NLP)
library(openNLP)
s <- c(
"Really, I like chocolate because it is good.",
"Robots are rather evil and most are devoid of decency"
)
s <- as.String(s)
sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))
parse_annotator <- Parse_Annotator()
p <- parse_annotator(s, a2)
ptexts <- sapply(p$features, `[[`, "parse")
ptexts
ptrees <- lapply(ptexts, Tree_parse)
ptrees
## [[1]]
## (TOP
## (S
## (S
## (S
## (ADVP (RB Really))
## (, ,)
## (NP (PRP I))
## (VP
## (VBP like)
## (NP (NN chocolate))
## (SBAR (IN because) (S (NP (PRP it)) (VP (VBZ is) (ADJP (JJ good)))))))
## (. .)
## (, ,)
## (NP (NNP Robots))
## (VP (VBP are) (ADJP (RB rather) (JJ evil))))
## (CC and)
## (S (NP (RBS most)) (VP (VBP are) (ADJP (JJ devoid) (PP (IN of) (NP (NN decency))))))))
Alternatively, the result could be returned as a list rather than a vector.
This may require installing openNLPmodels.en first. Download and install it by running:
install.packages(
"http://datacube.wu.ac.at/src/contrib/openNLPmodels.en_1.5-1.tar.gz",
repos=NULL,
type="source"
)
If it helps, you can source the tree directly from my Dropbox using the curl package:
library(curl)
ptrees <- source(curl("https://dl.dropboxusercontent.com/u/61803503/Errors/tree.R"))[[1]]
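One possible way to approach the extraction, offered as a minimal sketch rather than a definitive answer: objects returned by Tree_parse are lists with a value (the node label) and children (either further Tree objects or plain character tokens), so a recursive walk can collect every subtree labelled NP. The helper names extract_phrases and leaf_tokens are hypothetical and not part of the NLP package, and the short bracketed string is a stand-in for the real parser output so that the sketch runs without openNLP or its models.

```r
library(NLP)  # provides Tree_parse(); openNLP and its models are not needed here

# Hypothetical helper: return all leaf tokens below a node, left to right.
leaf_tokens <- function(node) {
  # Leaves are plain character strings rather than Tree objects.
  if (is.character(node)) return(node)
  unlist(lapply(node$children, leaf_tokens))
}

# Hypothetical helper: collect the text of every subtree whose label
# equals `label`, visiting the tree depth-first (parent before children).
extract_phrases <- function(tree, label = "NP") {
  # A leaf carries no label, so it cannot match.
  if (!inherits(tree, "Tree")) return(character(0))
  here <- if (identical(tree$value, label)) {
    paste(leaf_tokens(tree), collapse = " ")
  } else {
    character(0)
  }
  c(here, unlist(lapply(tree$children, extract_phrases, label = label)))
}

# Stand-in input (assumption): a small hand-written parse, not parser output.
demo <- Tree_parse("(TOP (S (NP (PRP I)) (VP (VBP like) (NP (NN chocolate)))))")
nps <- extract_phrases(demo)
print(nps)
```

Applied to the parsed trees from the question, lapply(ptrees, extract_phrases) would give one character vector per tree, that is, a list rather than a single vector, and unlist() on that result flattens it. Nested noun phrases are all returned, so an NP that contains another NP yields both the outer and the inner phrase.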