How do I convert an nltk tree (Stanford) to Newick format in Python?

I have a Stanford parse tree and I want to convert it to Newick format:

    (ROOT
     (S
        (NP (DT A) (NN friend))
        (VP
         (VBZ comes)
         (NP
           (NP (JJ early))
           (, ,)
           (NP
             (NP (NNS others))
             (SBAR
                (WHADVP (WRB when))
                (S (NP (PRP they)) (VP (VBP have) (NP (NN time))))))))))

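There may be a way to do this with string manipulation, but I would parse the trees and print them recursively in Newick format. A fairly small implementation: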

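import re


class Tree(object):
    def __init__(self, label):
        self.label = label
        self.children = []

    @staticmethod
    def _tokenize(string):
        # Split into '(', ')' and bare labels; reversed so that pop() returns
        # the tokens in their original order.
        return list(reversed(re.findall(r'\(|\)|[^ \n\t()]+', string)))

    @classmethod
    def from_string(cls, string):
        tokens = cls._tokenize(string)
        return cls._tree(tokens)

    @classmethod
    def _tree(cls, tokens):
        t = tokens.pop()
        if t == '(':
            # "(LABEL child child ...)": the token after '(' is the node label.
            tree = cls(tokens.pop())
            tree.children.extend(cls._trees(tokens))
            return tree
        # Any other token is a leaf (a word).
        return cls(t)

    @classmethod
    def _trees(cls, tokens):
        # Yield subtrees until the ')' that closes the current node.
        # Use `return`, not `raise StopIteration`: since Python 3.7 (PEP 479)
        # a StopIteration raised inside a generator becomes a RuntimeError.
        while tokens:
            if tokens[-1] == ')':
                tokens.pop()
                return
            yield cls._tree(tokens)

    def to_newick(self):
        if len(self.children) == 1:
            # Unary node (e.g. a POS tag over its word): collapse into the child.
            return self.children[0].to_newick()
        elif self.children:
            return '(' + ','.join(child.to_newick() for child in self.children) + ')'
        else:
            return self.label

Note that information is lost in the conversion: only the terminal nodes (the words) are kept, and all phrase and POS labels are dropped. Usage: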

>>> s = """(ROOT (..."""
>>> Tree.from_string(s).to_newick()
...
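If losing the phrase labels matters, one option is to write them as Newick internal node names, which the Newick grammar allows right after a closing parenthesis (as in `(A,B)C;`). The sketch below is a hypothetical variant, not part of the answer: `parse`, `labeled_newick` and `to_labeled_newick` are illustrative names, it uses its own small parser that builds `(label, children)` tuples, it still collapses each POS tag into its word, and it appends the terminating `;`. Punctuation leaves such as `,` would still cause the problem discussed in the comments below.

```python
import re


def parse(s):
    """Parse a bracketed tree into nested (label, children) tuples; leaves are str."""
    tokens = re.findall(r'\(|\)|[^\s()]+', s)
    pos = 0

    def node():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != '(':
            return tok  # a word
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ')':
            children.append(node())
        pos += 1  # consume ')'
        return (label, children)

    return node()


def labeled_newick(tree):
    if isinstance(tree, str):
        return tree
    label, children = tree
    # Preterminal such as (DT A): keep only the word, as the answer does.
    if len(children) == 1 and isinstance(children[0], str):
        return children[0]
    # Phrase node: its label becomes the Newick internal node name.
    return '(' + ','.join(labeled_newick(c) for c in children) + ')' + label


def to_labeled_newick(s):
    return labeled_newick(parse(s)) + ';'  # Newick strings end with ';'


print(to_labeled_newick("(S (NP (DT A) (NN friend)) (VP (VBZ comes)))"))
# ((A,friend)NP,(comes)VP)S;
```

Unary phrase nodes come out as single-child groups such as `(comes)VP`; the Newick grammar permits this, but some tools may reject or collapse them.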

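I copied this almost verbatim from my … (which may be useful if you are working with parse trees) and only added `to_newick`.

Sorry, I don't understand what you mean! I copied your code but it doesn't work :( When I try to traverse the tree I get this error: `ete2.parser.newick.NewickError: Empty leaf node found`. I guess it's because of the comma?

If you remove `(, ,)` from the initial string, does it work?

Exactly, the problem is the punctuation. I'll do that. Thanks =)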
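Following that diagnosis, here is a minimal sketch of a punctuation-aware conversion, assuming Penn Treebank punctuation tags (the `PUNCT_TAGS` set is an assumption; adjust it to your tagset). Instead of emitting an empty leaf, it drops punctuation subtrees such as `(, ,)`, collapses unary nodes the way the answer does, single-quotes any remaining label that contains Newick special characters, and appends the terminating `;`. The helper names are hypothetical, and the thread does not confirm whether ete2 accepts quoted labels.

```python
import re

# Assumption: Penn Treebank punctuation POS tags; adjust for your tagset.
PUNCT_TAGS = {',', '.', ':', '``', "''", '-LRB-', '-RRB-'}
# Characters that cannot appear in an unquoted Newick label.
NEWICK_SPECIAL = set(" \t\n()[]':;,")


def parse(s):
    """Parse a bracketed tree into nested (label, children) tuples; leaves are str."""
    tokens = re.findall(r'\(|\)|[^\s()]+', s)
    pos = 0

    def node():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != '(':
            return tok  # a word
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ')':
            children.append(node())
        pos += 1  # consume ')'
        return (label, children)

    return node()


def quote(label):
    # Quoted Newick label: wrap in single quotes and double any inner quote.
    if any(ch in NEWICK_SPECIAL for ch in label):
        return "'" + label.replace("'", "''") + "'"
    return label


def newick_body(tree):
    """Newick text for `tree` without ';', or None if nothing is left."""
    if isinstance(tree, str):
        return quote(tree)
    label, children = tree
    if label in PUNCT_TAGS:
        return None  # skip e.g. (, ,) instead of emitting an empty leaf
    parts = [p for p in map(newick_body, children) if p is not None]
    if not parts:
        return None
    if len(parts) == 1:
        return parts[0]  # collapse unary nodes, as in the answer
    return '(' + ','.join(parts) + ')'


def to_newick(s):
    return (newick_body(parse(s)) or '') + ';'


tree = """(ROOT
  (S
    (NP (DT A) (NN friend))
    (VP
      (VBZ comes)
      (NP
        (NP (JJ early))
        (, ,)
        (NP
          (NP (NNS others))
          (SBAR
            (WHADVP (WRB when))
            (S (NP (PRP they)) (VP (VBP have) (NP (NN time))))))))))"""

print(to_newick(tree))
# ((A,friend),(comes,(early,(others,(when,(they,(have,time)))))));
```

Dropping punctuation matches what the thread settled on; quoting is the alternative if you want to keep those tokens and your Newick reader supports quoted labels.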