Extracting parent and child nodes from a Python tree

python tree stanford-nlp

I am using nltk's tree data structure. Below is a sample nltk.Tree:

(S
  (S
    (ADVP (RB recently))
    (NP (NN someone))
    (VP
      (VBD mentioned)
      (NP (DT the) (NN word) (NN malaria))
      (PP (TO to) (NP (PRP me)))))
  (, ,)
  (CC and)
  (IN so)
  (S
    (NP
      (NP (CD one) (JJ whole) (NN flood))
      (PP (IN of) (NP (NNS memories))))
    (VP (VBD came) (S (VP (VBG pouring) (ADVP (RB back))))))
  (. .))

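I am not familiar with the nltk.Tree data structure. For every leaf node, I want to extract its parent and its "super-parent" (grandparent) node. For example, for the leaf "recently" I want (ADVP, RB), and for "someone" I want (NP, NN). The list below is the final result I want. An earlier answer used the eval() function, which I would like to avoid.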

[('ADVP', 'RB'), ('NP', 'NN'), ('VP', 'VBD'), ('NP', 'DT'), ('NP', 'NN'), ('NP', 'NN'), ('PP', 'TO'), ('NP', 'PRP'), ('S', 'CC'), ('S', 'IN'), ('NP', 'CD'), ('NP', 'JJ'), ('NP', 'NN'), ('PP', 'IN'), ('NP', 'NNS'), ('VP', 'VBD'), ('VP', 'VBG'), ('ADVP', 'RB')]
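To make the terms concrete, here is a minimal sketch of how the two labels above a leaf can be read off with tree positions. It assumes nltk is installed and uses a small hypothetical fragment of the sample tree rather than the full sentence; it is one possible reading of the request, not the approach taken in the answer.

```python
import nltk

# Hypothetical fragment of the sample tree, kept small for illustration.
tree = nltk.Tree.fromstring("(S (ADVP (RB recently)) (NP (NN someone)))")

pairs = []
for pos in tree.treepositions("leaves"):
    parent = tree[pos[:-1]]       # node directly above the word, e.g. (RB recently)
    grandparent = tree[pos[:-2]]  # node above that, e.g. (ADVP ...)
    pairs.append((grandparent.label(), parent.label()))

print(pairs)  # [('ADVP', 'RB'), ('NP', 'NN')]
```

Each leaf position is a tuple of child indices, so dropping the last index gives the parent and dropping the last two gives the grandparent.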

Python code that avoids the eval function and uses the nltk tree data structure:

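import nltk

sentences = """(S
  (S
    (ADVP (RB recently))
    (NP (NN someone))
    (VP
      (VBD mentioned)
      (NP (DT the) (NN word) (NN malaria))
      (PP (TO to) (NP (PRP me)))))
  (, ,)
  (CC and)
  (IN so)
  (S
    (NP
      (NP (CD one) (JJ whole) (NN flood))
      (PP (IN of) (NP (NNS memories))))
    (VP (VBD came) (S (VP (VBG pouring) (ADVP (RB back))))))
  (. .))"""


def tails(items, path=()):
    """Yield the last two labels on the path to each leaf, skipping punctuation."""
    for child in items:
        if isinstance(child, nltk.Tree):
            if child.label() in {".", ","}:  # ignore punctuation
                continue
            for result in tails(child, path + (child.label(),)):
                yield result
        else:
            yield path[-2:]


# Parse the bracketed string into an nltk.Tree before walking it.
tree = nltk.Tree.fromstring(sentences)

# Start the path at the root label so that leaves directly under the
# root's children (such as "and") come out as ('S', 'CC').
print(list(tails(tree, (tree.label(),))))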
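As an alternative, the same pairs can be collected with nltk's ParentedTree, which keeps a reference from each subtree to its parent. This is a sketch under the assumption that nltk is installed; the shortened sentence is a hypothetical example, not the full tree from the question.

```python
from nltk.tree import ParentedTree

# Hypothetical shortened sentence in the same bracketed format.
ptree = ParentedTree.fromstring(
    "(S (NP (NN someone)) (VP (VBD mentioned) (NP (DT the) (NN word))) (. .))"
)

pairs = [
    (sub.parent().label(), sub.label())
    # height() == 2 selects subtrees that sit directly above a word (POS tags)
    for sub in ptree.subtrees(lambda t: t.height() == 2)
    if sub.label() not in {".", ","}  # ignore punctuation
]

print(pairs)  # [('NP', 'NN'), ('VP', 'VBD'), ('NP', 'DT'), ('NP', 'NN')]
```

This avoids threading a path through the recursion, at the cost of building a tree type that stores parent links.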

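Possible duplicate of @leekaiInetsky's code. That code uses the eval() function, which leaves the stack completely wrong. However, I solved this using the nltk tree data structure. I will post my answer below.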
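For comparison, the bracketed string can also be parsed without eval() and without nltk. The following is a hypothetical standard-library sketch, not the code referred to in the comment above; the names parse and tails_plain and the short example sentence are assumptions made for illustration.

```python
import re


def parse(text):
    """Parse a bracketed string into nested (label, children) tuples; words stay plain strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                  # consume "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1                  # consume ")"
        return (label, children)

    return node()


def tails_plain(tree, path=()):
    """Yield the last two labels above each word, skipping punctuation nodes."""
    label, children = tree
    path = path + (label,)
    for child in children:
        if isinstance(child, tuple):
            if child[0] in {".", ","}:  # ignore punctuation
                continue
            yield from tails_plain(child, path)
        else:
            yield path[-2:]


result = list(tails_plain(parse("(S (NP (NN someone)) (, ,) (CC and) (VP (VBD came)))")))
print(result)  # [('NP', 'NN'), ('S', 'CC'), ('VP', 'VBD')]
```

The tokenizer only splits on parentheses and whitespace, so it assumes well-formed input with balanced brackets.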