Python: Encoding the structure of a parse tree


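I'm working with the dataset and trying to understand the two files STree.txt and SOStr.txt, which encode the parse tree of each sentence.

For example, how do I decode this parse tree?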

 Effective|but|too-tepid|biopic

 6|6|5|5|7|7|0

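The README says:

SOStr.txt and STree.txt encode the structure of the parse trees. STree encodes the trees in a parent pointer format. Each line corresponds to each sentence in the DataSetSequences.txt file.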

Is there a parser that converts a sentence into this format? And how can I decode the parse tree shown above (the token line plus the parent-pointer line)?
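One way to read the two example lines: the i-th number in the STree line is the 1-based id of the parent of node i, and 0 marks the root. In the example the first four ids line up with the four tokens, so the sketch below assumes that ids 1..len(tokens) are the leaves in sentence order. The function name `decode_parent_pointers` is made up for illustration, and this is only a minimal sketch under those assumptions, not an official decoder.

```python
def decode_parent_pointers(tokens_line, parents_line):
    """Rebuild a nested-list tree from one token line and one parent-pointer line."""
    tokens = tokens_line.strip().split("|")
    parents = [int(p) for p in parents_line.strip().split("|")]

    children = {}  # node id -> list of child ids
    root = None
    for node, parent in enumerate(parents, start=1):
        if parent == 0:
            root = node
        else:
            children.setdefault(parent, []).append(node)

    def build(node):
        # Returns (id of the leftmost leaf under node, subtree).
        # Assumption: ids 1..len(tokens) are the leaves, in sentence order.
        if node <= len(tokens):
            return node, tokens[node - 1]
        # Sort siblings by their leftmost leaf so that word order is preserved.
        built = sorted(build(child) for child in children[node])
        return built[0][0], [subtree for _, subtree in built]

    return build(root)[1]


tree = decode_parent_pointers("Effective|but|too-tepid|biopic", "6|6|5|5|7|7|0")
print(tree)  # [['Effective', 'but'], ['too-tepid', 'biopic']]
```

Sorting the children by leftmost leaf matters here: node 5 has a smaller id than node 6, yet it covers the last two words.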

I used this Python script to print the constituency tree of the sentence above:

    # load_constituency_tree() is not shown in this question;
    # it is assumed to be defined elsewhere.

    def show(node):
        # Print every attribute of a tree node as "name: value" pairs.
        print(', '.join("%s: %s" % item for item in vars(node).items()))

    with open('parents.txt') as parentsfile, open('sents.txt') as toksfile:
        parents = [list(map(int, line.split())) for line in parentsfile]
        toks = [line.strip().split() for line in toksfile]

    const_trees = []
    for i in range(len(toks)):
        const_trees.append(load_constituency_tree(parents[i], toks[i]))
        tree = const_trees[i]
        # Root first, then its children and grandchildren.
        for node in (tree, tree.right, tree.left,
                     tree.right.right, tree.right.left,
                     tree.left.left, tree.left.right):
            show(node)
        break  # only inspect the first sentence

From the output I worked out that the tree for the first sentence looks like this:

                        6
                        |
              +---------+---------+
              |                   |
              5                   4
        +-----+-----+       +-----+-----+
        |           |       |           |
    Effective      but  too-tepid    biopic

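According to the paper, the non-terminals are phrase types, but in this representation of the tree they are indices, possibly indices into a dictionary of phrase types. My question is: where is that dictionary, and how do I convert these integers into phrases?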
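Judging from the example alone, the integers look like node ids rather than phrase-type ids: they run from 1 to 7 for a 4-word sentence, which is exactly the number of nodes in a binary tree with 4 leaves. Under that reading, what can be recovered for each id is the phrase (the span of words) that the node covers. A small sketch, where `node_phrases` is a hypothetical helper name and the same leaves-first numbering is assumed:

```python
def node_phrases(tokens_line, parents_line):
    """Map every node id to the phrase (token span) it covers."""
    tokens = tokens_line.strip().split("|")
    parents = [int(p) for p in parents_line.strip().split("|")]

    # covered[node] collects the 1-based leaf ids found under that node.
    covered = {node: [] for node in range(1, len(parents) + 1)}
    for leaf in range(1, len(tokens) + 1):
        node = leaf
        while node != 0:  # walk up the parent pointers until the root
            covered[node].append(leaf)
            node = parents[node - 1]

    return {
        node: " ".join(tokens[i - 1] for i in sorted(leaves))
        for node, leaves in covered.items()
    }


phrases = node_phrases("Effective|but|too-tepid|biopic", "6|6|5|5|7|7|0")
for node_id, phrase in sorted(phrases.items()):
    print(node_id, phrase)
```

For the example this lists the four words as nodes 1 to 4, "too-tepid biopic" as node 5, "Effective but" as node 6, and the whole sentence as node 7.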
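My solution: I'm not sure this is the right answer, but I use this function to convert a tree into the corresponding list of parent pointers: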

    # Given the list returned by ptree.treepositions('postorder') from the nltk library, i.e.
    # a list of tuples like this:
    # [(0, 0), (0,), (1, 0, 0), (1, 0), (1, 1, 0), (1, 1, 1), (1, 1), (1,), ()]
    # which describes the structure of a tree, where each index of the list is the index
    # of a node in the tree in postorder,
    # return a list of parents for each node, i.e. [2, 9, 4, 8, 7, 7, 8, 9, 0],
    # where 0 means that the node is the root.
    # The list above describes the structure of this tree:
    #               S
    #    ___________|___
    #   |               VP
    #   |      _________|___
    #   NP    V             NP
    #   |     |          ___|____
    #   I  enjoyed      my     cookie
    def make_parents_list(treepositions):
        parents = []
        for i in range(0, len(treepositions)):
            if len(treepositions[i]) == 0:
                # The empty position () is the root.
                parent = 0
                parents.append(parent)
            if len(treepositions[i]) > 0:
                # In postorder, the first later node that is one level
                # shallower is the parent; store its 1-based index.
                parent_s = [j + 1 for j in range(0, len(treepositions))
                            if (j > i) and (len(treepositions[j]) == len(treepositions[i]) - 1)]
                parents.append(parent_s[0])
        return parents
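A possible simplification of the function above: the parent of a tree position is that position with its last element dropped, so the parent index can be looked up directly instead of scanning forward. This is only a sketch; `parents_from_positions` is a name chosen here for illustration, and the input is the same hard-coded list of positions from the comment above, so nltk is not needed to run it.

```python
def parents_from_positions(treepositions):
    """Return 1-based parent indices (0 for the root) for a list of tree positions."""
    # Map each position tuple to its 1-based index in the given traversal order.
    index_of = {pos: i + 1 for i, pos in enumerate(treepositions)}
    # The parent of position p is p[:-1]; the empty tuple () is the root.
    return [index_of[pos[:-1]] if pos else 0 for pos in treepositions]


positions = [(0, 0), (0,), (1, 0, 0), (1, 0), (1, 1, 0), (1, 1, 1), (1, 1), (1,), ()]
parent_list = parents_from_positions(positions)
print(parent_list)  # [2, 9, 4, 8, 7, 7, 8, 9, 0]
```

Because it relies on a dictionary lookup rather than on ordering, this version does not depend on the positions being in postorder. Either way the nodes are numbered in traversal order, whereas in the STree example from the question the words take ids 1 to 4 first and the internal nodes come after them, so the two numberings are not directly interchangeable.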