How to get a parsed NLP tree object from a bracketed parse string using nltk or spacy?

parsing nlp stanford-nlp

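I have the sentence "You could say that they regularly catch a shower, which adds to their exhilaration and joie de vivre." and I cannot get an NLP parse tree like the following example: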

(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))

I want to replicate the solution to this question, but I have a plain sentence string rather than an NLP tree.

By the way, I am using Python 3.

Use the Tree.fromstring() method:

>>> from nltk import Tree
>>> parse = Tree.fromstring('(ROOT (S (NP (PRP You)) (VP (MD could) (VP (VB say) (SBAR (IN that) (S (NP (PRP they)) (ADVP (RB regularly)) (VP (VB catch) (NP (NP (DT a) (NN shower)) (, ,) (SBAR (WHNP (WDT which)) (S (VP (VBZ adds) (PP (TO to) (NP (NP (PRP$ their) (NN exhilaration)) (CC and) (NP (FW joie) (FW de) (FW vivre))))))))))))) (. .)))')

>>> parse
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('PRP', ['You'])]), Tree('VP', [Tree('MD', ['could']), Tree('VP', [Tree('VB', ['say']), Tree('SBAR', [Tree('IN', ['that']), Tree('S', [Tree('NP', [Tree('PRP', ['they'])]), Tree('ADVP', [Tree('RB', ['regularly'])]), Tree('VP', [Tree('VB', ['catch']), Tree('NP', [Tree('NP', [Tree('DT', ['a']), Tree('NN', ['shower'])]), Tree(',', [',']), Tree('SBAR', [Tree('WHNP', [Tree('WDT', ['which'])]), Tree('S', [Tree('VP', [Tree('VBZ', ['adds']), Tree('PP', [Tree('TO', ['to']), Tree('NP', [Tree('NP', [Tree('PRP$', ['their']), Tree('NN', ['exhilaration'])]), Tree('CC', ['and']), Tree('NP', [Tree('FW', ['joie']), Tree('FW', ['de']), Tree('FW', ['vivre'])])])])])])])])])])])])]), Tree('.', ['.'])])])

>>> parse.pretty_print()
                                                       ROOT                                                             
                                                        |                                                                
                                                        S                                                               
  ______________________________________________________|_____________________________________________________________   
 |         VP                                                                                                         | 
 |     ____|___                                                                                                       |  
 |    |        VP                                                                                                     | 
 |    |     ___|____                                                                                                  |  
 |    |    |       SBAR                                                                                               | 
 |    |    |    ____|_______                                                                                          |  
 |    |    |   |            S                                                                                         | 
 |    |    |   |     _______|____________                                                                             |  
 |    |    |   |    |       |            VP                                                                           | 
 |    |    |   |    |       |        ____|______________                                                              |  
 |    |    |   |    |       |       |                   NP                                                            | 
 |    |    |   |    |       |       |         __________|__________                                                   |  
 |    |    |   |    |       |       |        |          |         SBAR                                                | 
 |    |    |   |    |       |       |        |          |      ____|____                                              |  
 |    |    |   |    |       |       |        |          |     |         S                                             | 
 |    |    |   |    |       |       |        |          |     |         |                                             |  
 |    |    |   |    |       |       |        |          |     |         VP                                            | 
 |    |    |   |    |       |       |        |          |     |     ____|____                                         |  
 |    |    |   |    |       |       |        |          |     |    |         PP                                       | 
 |    |    |   |    |       |       |        |          |     |    |     ____|_____________________                   |  
 |    |    |   |    |       |       |        |          |     |    |    |                          NP                 | 
 |    |    |   |    |       |       |        |          |     |    |    |          ________________|________          |  
 NP   |    |   |    NP     ADVP     |        NP         |    WHNP  |    |         NP               |        NP        | 
 |    |    |   |    |       |       |     ___|____      |     |    |    |     ____|_______         |    ____|____     |  
PRP   MD   VB  IN  PRP      RB      VB   DT       NN    ,    WDT  VBZ   TO  PRP$          NN       CC  FW   FW   FW   . 
 |    |    |   |    |       |       |    |        |     |     |    |    |    |            |        |   |    |    |    |  
You could say that they regularly catch  a      shower  ,   which adds  to their     exhilaration and joie  de vivre  .
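
Once the string is loaded, the Tree object can be inspected programmatically. The following is a minimal sketch of a few common nltk.Tree calls (label(), leaves(), pos(), subtrees(), productions()); the shortened sentence is only an illustration and is not taken from the question.

```python
from nltk import Tree

# A shortened, illustrative bracketed parse (not the full sentence above).
bracketed = "(S (NP (PRP They)) (VP (VB catch) (NP (DT a) (NN shower))))"
tree = Tree.fromstring(bracketed)

root_label = tree.label()    # label of the top node
words = tree.leaves()        # the tokens, left to right
tagged = tree.pos()          # (word, POS tag) pairs

# Collect the text of every NP subtree, in pre-order.
noun_phrases = [
    " ".join(st.leaves())
    for st in tree.subtrees(lambda t: t.label() == "NP")
]

# Index into the tree like nested lists: second child of the VP.
object_np = tree[1][1]

# The CFG productions used by this one tree.
productions = tree.productions()

print(root_label, words, noun_phrases)
```

The productions() list is one possible starting point if you want to derive a grammar from trees you already have.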

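I am going to assume there is a good reason why you need the parse tree in this format. spaCy does a good job of parsing: it uses a CNN (convolutional neural network) model, it is production ready, and it is very fast. Note that it produces a dependency parse rather than a bracketed constituency tree. You can run the spaCy snippet further down to see for yourself (and then read the spaCy documentation):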

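Now you can write an algorithm that navigates this tree and prints it accordingly (sorry, I could not find a quick example, but you can see the indices and how to traverse the parse). Another option is to extract the CFG somehow, then use NLTK to do the parsing and display the result in the desired format. The CFG snippet further down comes from the NLTK book (modified to work with Python 3).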

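However, you can see that the CFG has to be defined (so if you try your original text in place of the example, you will find that the parser does not understand tokens that are not defined in the CFG).

The easiest way to get the desired format seems to be the Stanford NLP parser. The StanfordParser snippet at the end was taken from elsewhere (sorry, I have not tested it).

I did not test it because I did not have time to install the Stanford parser, which can be a somewhat cumbersome process (compared with installing a Python module). That is, assuming you are looking for a Python solution.

I hope this helps, and I am sorry that it is not a direct answer.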

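This does not actually solve my problem. I need to know how to obtain the value that you pass as the argument to Tree.fromstring(). I have a lot of sentence strings, about 70k, and I cannot write out the NLP tree for each one by hand.

I'm confused =) Do you want to get a parse from a raw string, or do you want to turn an already-parsed string into an nltk.Tree object?

The StanfordParser code does not work with the latest version of nltk because it has been deprecated. I suggest using nltk.parse.corenlp.CoreNLPParser.

Yes, good point. However you get StanfordParser up and running, it seems it will output the desired parse tree format.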

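import spacy

# The 'en' shortcut only worked in spaCy 2; spaCy 3 needs the full package name.
# Install the model first: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

text = 'You could say that they regularly catch a shower , which adds to their exhilaration and joie de vivre.'

# One tab-separated row per token: dependency label, character offset, text, tag,
# then the head's text, tag and offset, then the token's subtree and direct children.
for token in nlp(text):
    print(token.dep_, end='\t')
    print(token.idx, end='\t')
    print(token.text, end='\t')
    print(token.tag_, end='\t')
    print(token.head.text, end='\t')
    print(token.head.tag_, end='\t')
    print(token.head.idx, end='\t')
    print(' '.join([w.text for w in token.subtree]), end='\t')
    print(' '.join([w.text for w in token.children]))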
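
As a rough sketch of the "algorithm to navigate this tree" idea, the function below turns dependency rows into a bracketed string. To stay runnable without a spaCy model it works on plain tuples; the rows are hypothetical stand-ins for what you might collect as (token.text, token.dep_, token.head.i), with the root pointing at itself as spaCy does. The result is a dependency structure written with brackets, not a Penn Treebank constituency tree like the one in the question.

```python
# Hypothetical rows: (text, dependency label, index of head token).
# The root token's head index is its own index.
rows = [
    ("Mary", "nsubj", 1),
    ("saw", "ROOT", 1),
    ("Bob", "dobj", 1),
]


def to_bracketed(rows):
    """Render dependency rows as a nested bracketed string."""
    children = {i: [] for i in range(len(rows))}
    root = None
    for i, (_, _, head) in enumerate(rows):
        if head == i:
            root = i
        else:
            children[head].append(i)

    def render(i):
        text, dep, _ = rows[i]
        # Keep surface order: left dependents, the word itself, right dependents.
        parts = [render(c) for c in children[i] if c < i]
        parts.append(text)
        parts += [render(c) for c in children[i] if c > i]
        return "(" + dep + " " + " ".join(parts) + ")"

    return render(root)


bracketed = to_bracketed(rows)
print(bracketed)
```

The resulting string can itself be loaded with Tree.fromstring() if you want to reuse nltk's display and traversal helpers.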

import nltk
from nltk import CFG

grammar = CFG.fromstring("""
  S -> NP VP
  VP -> V NP | V NP PP
  V -> "saw" | "ate"
  NP -> "John" | "Mary" | "Bob" | Det N | Det N PP
  Det -> "a" | "an" | "the" | "my"
  N -> "dog" | "cat" | "cookie" | "park"
  PP -> P NP
  P -> "in" | "on" | "by" | "with"
  """)

text = 'Mary saw Bob'

sent = text.split()
rd_parser = nltk.RecursiveDescentParser(grammar)
for p in rd_parser.parse(sent):
    print(p)
# (S (NP Mary) (VP (V saw) (NP Bob)))
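
To illustrate the coverage limitation mentioned above, here is a small sketch with a cut-down grammar. The helper name try_parse and the word "Alice" are made up for the example; the point is that NLTK raises a ValueError when the input contains a word that the grammar does not define.

```python
from nltk import CFG, RecursiveDescentParser

# A deliberately tiny grammar: only three words are known.
grammar = CFG.fromstring("""
  S -> NP VP
  VP -> V NP
  V -> "saw"
  NP -> "Mary" | "Bob"
""")
parser = RecursiveDescentParser(grammar)


def try_parse(text):
    """Return the list of parses, or None if the grammar lacks a word."""
    tokens = text.split()
    try:
        return list(parser.parse(tokens))
    except ValueError as err:
        # Raised by the grammar's coverage check for unknown words.
        print("not covered:", err)
        return None


covered = try_parse("Mary saw Bob")      # every word is in the grammar
uncovered = try_parse("Mary saw Alice")  # "Alice" is not defined
```

This is why a hand-written CFG does not scale to arbitrary text such as 70k free-form sentences.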

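# Untested, as noted above. StanfordParser is deprecated in recent NLTK releases
# and needs a local Stanford parser installation.
from nltk.parse.stanford import StanfordParser

parser = StanfordParser(model_path='edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz')
parsed = parser.raw_parse('Jack payed up to 5% more for each unit')
for line in parsed:
    print(line, end=' ') # This will print all in one line, as desired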
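
Following the comment that recommends nltk.parse.corenlp.CoreNLPParser, here is an untested sketch of how a large batch of sentences might be turned into bracketed strings. It assumes a Stanford CoreNLP server is already running and reachable at http://localhost:9000; the function names make_parser, bracketed_parses and to_tree are invented for this example.

```python
from nltk import Tree
from nltk.parse.corenlp import CoreNLPParser


def make_parser(url="http://localhost:9000"):
    # Assumption: a CoreNLP server is already listening at this URL.
    return CoreNLPParser(url=url)


def bracketed_parses(sentences, parser):
    """Yield (sentence, one-line bracketed parse) for each sentence."""
    for sentence in sentences:
        tree = next(parser.raw_parse(sentence))  # take the first parse
        # Collapse the pretty-printed tree onto a single line.
        yield sentence, " ".join(str(tree).split())


def to_tree(bracketed):
    """Turn a stored bracketed string back into an nltk.Tree."""
    return Tree.fromstring(bracketed)
```

With a server running, something like `for s, b in bracketed_parses(my_sentences, make_parser()): ...` would produce the strings that Tree.fromstring() expects, where my_sentences is your own list of about 70k sentences.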