Python: rules for finding the head of a noun phrase in NLTK and Stanford parses

In general, the head of a noun phrase is a noun, and it is the rightmost one in the NP. In the parse below, tree is the head of the parent NP:

                    ROOT
                      |
                      S
             _________|_________
             NP                |
      _______|_______          |
      |             PP         VP
      |          ___|____    __|___
      NP         |      NP   |   PRT
______|______    |      |    |    |
DT  JJ  NN  NN   IN    NNP  VBD   RP
|   |   |   |    |      |    |    |
The old oak tree from India fell down

The code I have so far gives this output:

NP: ['The', 'old', 'oak', 'tree', 'from', 'India']
NPhead: India
NP: ['The', 'old', 'oak', 'tree']
NPhead: tree
NP: ['India']
NPhead: India

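It gives the output I expect for this sentence, but I need to add a condition so that only the rightmost noun is extracted as the head. At the moment the code takes the last leaf without checking whether it is a noun (NN).

Something like this in the NP-head condition of my code (pseudocode; the list returned by leaves() has no such method):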

t.leaves().getrightmostnoun()
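The check asked for here can be sketched in a few lines. This is a minimal illustration, not the asker's code: `rightmost_noun` and `NOUN_TAGS` are names made up for the sketch, and the input is assumed to be a list of (word, tag) pairs, which is the shape that NLTK's `Tree.pos()` returns for a subtree. The pairs are written out by hand so the snippet runs without NLTK installed.

```python
# Penn Treebank noun tags; treating exactly these four as "nouns"
# is an assumption of this sketch.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}


def rightmost_noun(tagged_leaves):
    """Return the last word whose tag is a noun tag, or None if there is none.

    tagged_leaves is a list of (word, tag) pairs.
    """
    for word, tag in reversed(tagged_leaves):
        if tag in NOUN_TAGS:
            return word
    return None


# (word, tag) pairs for the inner NP of the example sentence.
inner_np = [("The", "DT"), ("old", "JJ"), ("oak", "NN"), ("tree", "NN")]
# The same pairs plus the PP, i.e. the leaves of the full subject NP.
full_np = inner_np + [("from", "IN"), ("India", "NNP")]

print(rightmost_noun(inner_np))         # tree
print(rightmost_noun(full_np))          # India
print(rightmost_noun([("the", "DT")]))  # None
```

With an NLTK subtree `t`, the call would be `rightmost_noun(t.pos())`. The second result shows the limit of this approach: on the full NP it still returns India, because it looks only at the flat list of leaves and ignores the tree structure.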

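Michael Collins's dissertation (Appendix A) includes head-finding rules for the Penn Treebank, so the head is not necessarily the rightmost noun. The condition above should therefore cover such cases as well.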

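Take this example from one of the answers:

The person that gave the talk went home

The head noun of the subject is person, but the last leaf node of the NP "the person that gave the talk" is talk.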
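A priority-list rule of the kind Collins describes for NPs can be sketched as follows. This is a simplified illustration and not code from the question or the answers. The tag lists follow the NP rule in Appendix A as I understand it and should be checked against the dissertation before being relied on. Trees are nested lists of the form `[label, child, ...]` with preterminals as `[tag, word]`, a stand-in format chosen so the sketch has no dependencies; `np_head_child` and `head_word` are hypothetical names. For constituents other than NP the sketch simply descends into the last child, which is an assumption and not part of the rule.

```python
def is_preterminal(node):
    """A preterminal is [tag, word]: exactly one child, and it is a string."""
    return len(node) == 2 and isinstance(node[1], str)


def first_match(children, labels):
    """Return the first child in `children` whose label is in `labels`, else None."""
    for child in children:
        if child[0] in labels:
            return child
    return None


def np_head_child(np):
    """Pick the head child of an NP node using a Collins-style priority list."""
    children = np[1:]
    right_to_left = children[::-1]
    # A possessive marker in final position is taken as the head.
    if children[-1][0] == "POS":
        return children[-1]
    # Each step is (search order, labels to look for); the first hit wins.
    steps = [
        (right_to_left, {"NN", "NNP", "NNPS", "NNS", "NX", "POS", "JJR"}),
        (children, {"NP"}),
        (right_to_left, {"$", "ADJP", "PRN"}),
        (right_to_left, {"CD"}),
        (right_to_left, {"JJ", "JJS", "RB", "QP"}),
    ]
    for ordered, labels in steps:
        child = first_match(ordered, labels)
        if child is not None:
            return child
    # Nothing matched: fall back to the last child.
    return children[-1]


def head_word(node):
    """Follow head children down to a word."""
    while not is_preterminal(node):
        # Assumption: non-NP constituents just use their last child.
        node = np_head_child(node) if node[0] == "NP" else node[-1]
    return node[1]


oak_np = ["NP",
          ["NP", ["DT", "The"], ["JJ", "old"], ["NN", "oak"], ["NN", "tree"]],
          ["PP", ["IN", "from"], ["NP", ["NNP", "India"]]]]
person_np = ["NP",
             ["NP", ["DT", "the"], ["NN", "person"]],
             ["SBAR", ["WHNP", ["WDT", "that"]],
              ["S", ["VP", ["VBD", "gave"],
                     ["NP", ["DT", "the"], ["NN", "talk"]]]]]]

print(head_word(oak_np))     # tree
print(head_word(person_np))  # person
```

Both subject NPs have no noun among their immediate children, so the rule falls through to the leftmost NP child and recurses into it. That is what keeps India and talk, which sit inside the PP and the relative clause, from being picked. With NLTK, `node.label()` and iteration over `node` provide the same information as `node[0]` and `node[1:]` here.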

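NLTK has a built-in way to turn a bracketed string into a Tree object: Tree.fromstring.

Note that the rightmost noun is not always the head noun of an NP, e.g.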

>>> from nltk.tree import Tree
>>> s = '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
>>> Tree.fromstring(s)
Tree('ROOT', [Tree('S', [Tree('NP', [Tree('NN', ['Carnac']), Tree('DT', ['the']), Tree('NN', ['Magnificent'])]), Tree('VP', [Tree('VBD', ['gave']), Tree('NP', [Tree('', [Tree('DT', ['a']), Tree('NN', ['talk'])])])])])])
>>> # Print the last leaf of every NP subtree.
>>> for i in Tree.fromstring(s).subtrees():
...     if i.label() == 'NP':
...         print(i.leaves()[-1])
... 
Magnificent
talk

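Arguably, Magnificent can still be the head noun. Another example is an NP that contains a relative clause:

The person that gave the talk went home

The head noun of the subject is person, but the last leaf node of the NP "the person that gave the talk" is talk.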
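
I was looking for a Python script that uses NLTK to do this task and stumbled across this post. Here is the solution I came up with. It is a bit noisy and arbitrary, and it definitely does not always pick the right answer (for compound nouns, for example). But I wanted to post it in case a solution that mostly works is helpful to others.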
#!/usr/bin/env python3

from nltk.tree import Tree

examples = [
    '(ROOT (S (NP (NP (DT The) (JJ old) (NN oak) (NN tree)) (PP (IN from) (NP (NNP India)))) (VP (VBD fell) (PRT (RP down)))))',
    "(ROOT\n  (S\n    (NP\n      (NP (DT the) (NN person))\n      (SBAR\n        (WHNP (WDT that))\n        (S\n          (VP (VBD gave)\n            (NP (DT the) (NN talk))))))\n    (VP (VBD went)\n      (NP (NN home)))))",
    '(ROOT (S (NP (NN Carnac) (DT the) (NN Magnificent)) (VP (VBD gave) (NP ((DT a) (NN talk))))))'
]

def find_noun_phrases(tree):
    ## every subtree labelled NP, in the order subtrees() yields them
    return list(tree.subtrees(lambda t: t.label() == 'NP'))

def find_head_of_np(np):
    noun_tags = ['NN', 'NNS', 'NNP', 'NNPS']
    ## immediate children that are subtrees (not bare leaf strings)
    top_level_trees = [child for child in np if isinstance(child, Tree)]
    ## search for a top-level noun
    top_level_nouns = [t for t in top_level_trees if t.label() in noun_tags]
    if len(top_level_nouns) > 0:
        ## if you find some, pick the rightmost one, just 'cause
        return top_level_nouns[-1][0]
    else:
        ## search for a top-level np
        top_level_nps = [t for t in top_level_trees if t.label() == 'NP']
        if len(top_level_nps) > 0:
            ## if you find some, pick the head of the rightmost one, just 'cause
            return find_head_of_np(top_level_nps[-1])
        else:
            ## search for any noun
            nouns = [p[0] for p in np.pos() if p[1] in noun_tags]
            if len(nouns) > 0:
                ## if you find some, pick the rightmost one, just 'cause
                return nouns[-1]
            else:
                ## return the rightmost word, just 'cause
                return np.leaves()[-1]

for example in examples:
    tree = Tree.fromstring(example)
    for np in find_noun_phrases(tree):
        print("noun phrase:", " ".join(np.leaves()))
        head = find_head_of_np(np)
        print("head:", head)


For the examples discussed in the question and the other answer, this is the output:
noun phrase: The old oak tree from India
head: tree
noun phrase: The old oak tree
head: tree
noun phrase: India
head: India
noun phrase: the person that gave the talk
head: person
noun phrase: the person
head: person
noun phrase: the talk
head: talk
noun phrase: home
head: home
noun phrase: Carnac the Magnificent
head: Magnificent
noun phrase: a talk
head: talk

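What is your question?

@barny How to find the head of an NP.

Please read the help page. In this case, show the output you actually get: "not working" is not enough for Stack Overflow. Also, try adding more print statements to your code (for example, one before the traversal of the subtrees, and another on each entry being traversed). Post the output of that execution trace, provided it does not immediately show you the problem.

I finally got this working, as shown in the code, but I still need to add a condition that checks whether the rightmost word is an NN.

So, something like this in the NP-head condition of the code above: t.leaves().getrightmostnoun()

Note that the rightmost noun is not always the head noun of an NP!

Michael Collins's dissertation (Appendix A) includes head-finding rules for the Penn Treebank, so the head is not necessarily the rightmost noun.

Ask on the NLTK GitHub issues so that someone can help with the implementation if you run into problems. Better yet, try to implement it, open a pull request with your working code, and ask for a code review. I'm sure the NLTK devs will help you get it done. Or wait for someone else to write the code =)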