R: How to extract elements from an NLP tree?


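I am using the NLP package to parse sentences. How can I extract elements from the parse tree output it creates? For example, I would like to grab the noun phrases (NP) from the example below: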

library(NLP)
library(openNLP)

s <- c(
    "Really, I like chocolate because it is good.", 
    "Robots are rather evil and most are devoid of decency"
)
s <- as.String(s)


sent_token_annotator <- Maxent_Sent_Token_Annotator()
word_token_annotator <- Maxent_Word_Token_Annotator()
a2 <- annotate(s, list(sent_token_annotator, word_token_annotator))

parse_annotator <- Parse_Annotator()

p <- parse_annotator(s, a2)
ptexts <- sapply(p$features, `[[`, "parse")
ptexts

ptrees <- lapply(ptexts, Tree_parse)

ptrees

## [[1]]
## (TOP
##   (S
##     (S
##       (S
##         (ADVP (RB Really))
##         (, ,)
##         (NP (PRP I))
##         (VP
##           (VBP like)
##           (NP (NN chocolate))
##           (SBAR (IN because) (S (NP (PRP it)) (VP (VBZ is) (ADJP (JJ good)))))))
##       (. .)
##       (, ,)
##       (NP (NNP Robots))
##       (VP (VBP are) (ADJP (RB rather) (JJ evil))))
##     (CC and)
##     (S (NP (RBS most)) (VP (VBP are) (ADJP (JJ devoid) (PP (IN of) (NP (NN decency))))))))
The extracted noun phrases may be returned as a vector, or as a list instead.

Running this may require installing openNLPmodels.en. To do so, download and run:

install.packages(
    "http://datacube.wu.ac.at/src/contrib/openNLPmodels.en_1.5-1.tar.gz",  
    repos=NULL, 
    type="source"
)
If it helps, the tree can be sourced directly from my Dropbox using the curl package:

library(curl)
ptrees <- source(curl("https://dl.dropboxusercontent.com/u/61803503/Errors/tree.R"))[[1]]
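
One possible approach, offered as a minimal sketch rather than a definitive answer: it assumes that the objects returned by Tree_parse are lists with a `value` element (the node label) and a `children` element (subtrees, or plain character strings at the leaves), as described in the NLP package documentation for `Tree`. The helper names `get_subtrees` and `tree_words` are hypothetical and are not part of NLP or openNLP. The example parses a small bracketed string directly, so it runs without the openNLP models.

```r
library(NLP)

# Hypothetical helper: recursively collect every subtree whose label equals `label`.
# Assumes a Tree is a list with `value` (label) and `children` (subtrees or leaf strings).
get_subtrees <- function(tree, label = "NP") {
  if (!inherits(tree, "Tree")) return(list())      # leaf string: nothing to collect
  here  <- if (isTRUE(tree$value == label)) list(tree) else list()
  below <- lapply(tree$children, get_subtrees, label = label)
  c(here, do.call(c, below))                       # this node first, then matches below it
}

# Hypothetical helper: collect the leaf words under a tree, left to right.
tree_words <- function(tree) {
  if (!inherits(tree, "Tree")) return(as.character(tree))
  unlist(lapply(tree$children, tree_words))
}

# Small stand-in tree, built from a bracketed string.
demo <- Tree_parse("(S (NP (PRP I)) (VP (VBP like) (NP (NN chocolate))))")

nps     <- get_subtrees(demo, "NP")
np_text <- vapply(nps, function(np) paste(tree_words(np), collapse = " "), character(1))
np_text
```

Under the same assumptions, `lapply(ptrees, get_subtrees)` would give one list of NP subtrees per parsed tree. Nested noun phrases are returned both as part of the outer phrase and on their own, so filter afterwards if only the outermost or innermost phrases are wanted.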