Python: extracting information from the context-free phrase structure output of the Stanford Parser


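The Stanford Parser (http://nlp.stanford.edu/software/lex-parser.shtml) produces a context-free phrase structure tree like the one below. What is the best way to extract all the noun phrases (NP) and verb phrases (VP) from the tree? Is there a Python (or Java) library that can read structures like this? Thanks.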

(ROOT
  (S
    (S
      (NP
        (NP (DT The) (JJS strongest) (NN rain))
        (VP
          (ADVP (RB ever))
          (VBN recorded)
          (PP (IN in)
            (NP (NNP India)))))
      (VP
        (VP (VBD shut)
          (PRT (RP down))
          (NP
            (NP (DT the) (JJ financial) (NN hub))
            (PP (IN of)
              (NP (NNP Mumbai)))))
        (, ,)
        (VP (VBD snapped)
          (NP (NN communication) (NNS lines)))
        (, ,)
        (VP (VBD closed)
          (NP (NNS airports)))
        (CC and)
        (VP (VBD forced)
          (NP
            (NP (NNS thousands))
            (PP (IN of)
              (NP (NNS people))))
          (S
            (VP (TO to)
              (VP
                (VP (VB sleep)
                  (PP (IN in)
                    (NP (PRP$ their) (NNS offices))))
                (CC or)
                (VP (VB walk)
                  (NP (NN home))
                  (PP (IN during)
                    (NP (DT the) (NN night))))))))))
    (, ,)
    (NP (NNS officials))
    (VP (VBD said)
      (NP-TMP (NN today)))
    (. .)))

Check out the Natural Language Toolkit (NLTK).

The toolkit is written in Python and provides code for reading exactly this kind of tree (along with much else).
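As a rough illustration of the NLTK route, the sketch below reads a bracketed tree with `Tree.fromstring` and filters its subtrees by label. It assumes NLTK 3 is installed; the string `s` is a shortened, hypothetical fragment of the parser output rather than the full tree from the question.

```python
from nltk.tree import Tree

# Shortened, hypothetical fragment of the parser output, for illustration only.
s = "(ROOT (S (NP (NNS officials)) (VP (VBD said) (NP-TMP (NN today)))))"

tree = Tree.fromstring(s)

# Walk all subtrees in pre-order and keep those labelled exactly NP or VP.
# Labels such as NP-TMP are not matched by this exact comparison.
phrases = [(sub.label(), " ".join(sub.leaves()))
           for sub in tree.subtrees(lambda t: t.label() in ("NP", "VP"))]

for label, text in phrases:
    print(label, text)
```

Each match is itself a `Tree`, so you can keep the whole subtree instead of joining its leaves if you need the internal structure.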

Alternatively, you can write your own recursive function to do this. It is fairly straightforward.


Just for fun, here is a super simple implementation:

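import sys

# Assumption: the bracketed tree (the parser output shown above) arrives on standard input.
s = sys.stdin.read()

def parse(s):
  # Pad the brackets with spaces so that splitting on whitespace yields clean tokens.
  itr = iter(s.replace('(', ' ( ').replace(')', ' ) ').split())

  def _parse():
    # Build one node as a list [label, child, child, ...] until its closing bracket.
    stuff = []
    for x in itr:
      if x == ')':
        return stuff
      elif x == '(':
        stuff.append(_parse())
      else:
        stuff.append(x)
    return stuff

  return _parse()[0]

def find(parsed, tag):
  # Words are plain strings; only lists are subtrees that can carry a label.
  if not isinstance(parsed, list) or not parsed:
    return
  if parsed[0] == tag:
    yield parsed
  for x in parsed[1:]:
    for y in find(x, tag):
      yield y

p = parse(s)
np = find(p, 'NP')
for x in np:
  print(x)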
This yields:

['NP', ['NP', ['DT', 'The'], ['JJS', 'strongest'], ['NN', 'rain']], ['VP', ['ADVP', ['RB', 'ever']], ['VBN', 'recorded'], ['PP', ['IN', 'in'], ['NP', ['NNP', 'India']]]]]
['NP', ['DT', 'The'], ['JJS', 'strongest'], ['NN', 'rain']]
['NP', ['NNP', 'India']]
['NP', ['NP', ['DT', 'the'], ['JJ', 'financial'], ['NN', 'hub']], ['PP', ['IN', 'of'], ['NP', ['NNP', 'Mumbai']]]]
['NP', ['DT', 'the'], ['JJ', 'financial'], ['NN', 'hub']]
['NP', ['NNP', 'Mumbai']]
['NP', ['NN', 'communication'], ['NNS', 'lines']]
['NP', ['NNS', 'airports']]
['NP', ['NP', ['NNS', 'thousands']], ['PP', ['IN', 'of'], ['NP', ['NNS', 'people']]]]
['NP', ['NNS', 'thousands']]
['NP', ['NNS', 'people']]
['NP', ['PRP$', 'their'], ['NNS', 'offices']]
['NP', ['NN', 'home']]
['NP', ['DT', 'the'], ['NN', 'night']]
['NP', ['NNS', 'officials']]
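
The question also asks about verb phrases, and the nested lists above are not very readable as phrases. The following self-contained sketch applies the same recursive idea to match several tags at once and join the words under each match. The helper name `words` and the short `sample` string are illustrative assumptions, not part of the original answer.

```python
def parse(s):
    """Turn a bracketed tree string into nested lists: [label, child, ...]."""
    tokens = iter(s.replace("(", " ( ").replace(")", " ) ").split())

    def _parse():
        node = []
        for tok in tokens:
            if tok == ")":
                return node
            node.append(_parse() if tok == "(" else tok)
        return node

    return _parse()[0]


def find(node, tags):
    """Yield every subtree whose label is in `tags`, in pre-order."""
    if not isinstance(node, list) or not node:
        return  # a word (string) or an empty node has no label to match
    if node[0] in tags:
        yield node
    for child in node[1:]:
        yield from find(child, tags)


def words(node):
    """Return the words (leaves) under a subtree, left to right."""
    if not isinstance(node, list):
        return [node]
    return [w for child in node[1:] for w in words(child)]


# Hypothetical, shortened input used only for illustration.
sample = "(S (NP (NNS officials)) (VP (VBD said) (NP-TMP (NN today))))"
phrases = [(t[0], " ".join(words(t))) for t in find(parse(sample), {"NP", "VP"})]
print(phrases)
```

Because the comparison is an exact match on the label, NP-TMP is not reported as an NP. Add it to the tag set, or compare only the part of the label before the hyphen, if function-tagged phrases should be included.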