Stanford Universal Dependencies with Python NLTK

Is there any way to get Universal Dependencies using Python or NLTK? I can only generate parse trees.

For example:

Input sentence:

My dog also likes eating sausage.
Output:

Universal dependencies

nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)
This is a good start, since it works with the latest CoreNLP release (3.5.2). However, it gives you raw output that you have to transform yourself. For example, assuming the wrapper is running:

>>> import json, jsonrpclib
>>> from pprint import pprint
>>>
>>> server = jsonrpclib.Server("http://localhost:8080")
>>>
>>> pprint(json.loads(server.parse('John loves Mary.')))  # doctest: +SKIP
{u'sentences': [{u'dependencies': [[u'root', u'ROOT', u'0', u'loves', u'2'],
                                   [u'nsubj',
                                    u'loves',
                                    u'2',
                                    u'John',
                                    u'1'],
                                   [u'dobj', u'loves', u'2', u'Mary', u'3'],
                                   [u'punct', u'loves', u'2', u'.', u'4']],
                 u'parsetree': [],
                 u'text': u'John loves Mary.',
                 u'words': [[u'John',
                             {u'CharacterOffsetBegin': u'0',
                              u'CharacterOffsetEnd': u'4',
                              u'Lemma': u'John',
                              u'PartOfSpeech': u'NNP'}],
                            [u'loves',
                             {u'CharacterOffsetBegin': u'5',
                              u'CharacterOffsetEnd': u'10',
                              u'Lemma': u'love',
                              u'PartOfSpeech': u'VBZ'}],
                            [u'Mary',
                             {u'CharacterOffsetBegin': u'11',
                              u'CharacterOffsetEnd': u'15',
                              u'Lemma': u'Mary',
                              u'PartOfSpeech': u'NNP'}],
                            [u'.',
                             {u'CharacterOffsetBegin': u'15',
                              u'CharacterOffsetEnd': u'16',
                              u'Lemma': u'.',
                              u'PartOfSpeech': u'.'}]]}]}
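
If all you want is the rel(head-index, dep-index) notation from the question, a small helper over the raw dependencies list is enough. A minimal sketch (format_dependencies is just a made-up name), reusing the server session from above:

>>> def format_dependencies(sentence):
...     # Each entry is [relation, head word, head index, dependent word, dependent index].
...     for rel, head, head_i, dep, dep_i in sentence['dependencies']:
...         yield '%s(%s-%s, %s-%s)' % (rel, head, head_i, dep, dep_i)
...
>>> sentence = json.loads(server.parse('John loves Mary.'))['sentences'][0]
>>> print('\n'.join(format_dependencies(sentence)))
root(ROOT-0, loves-2)
nsubj(loves-2, John-1)
dobj(loves-2, Mary-3)
punct(loves-2, .-4)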
If you want to work with the dependency parses, you can reuse NLTK's DependencyGraph with a little effort:

>>> import jsonrpclib, json
>>> from pprint import pprint
>>> from nltk.parse import DependencyGraph
>>>
>>> server = jsonrpclib.Server("http://localhost:8080")
>>> parses = json.loads(
...    server.parse(
...       'John loves Mary. '
...       'I saw a man with a telescope. '
...       'Ballmer has been vocal in the past warning that Linux is a threat to Microsoft.'
...    )
... )['sentences']
>>>
>>> def transform(sentence):
...     for rel, _, head, word, n in sentence['dependencies']:
...         n = int(n)
...
...         word_info = sentence['words'][n - 1][1]
...         tag = word_info['PartOfSpeech']
...         lemma = word_info['Lemma']
...         if rel == 'root':
...             # NLTK expects that the root relation is labelled as ROOT!
...             rel = 'ROOT'
...
...         # Hack: Return values we don't know as '_'.
...         #       Also, consider tag and ctag to be equal.
...         # n is used to sort words as they appear in the sentence.
...         yield n, '_', word, lemma, tag, tag, '_', head, rel, '_', '_'
...
>>> dgs = [
...     DependencyGraph(
...         ' '.join(items)  # NLTK expects an iterable of strings...
...         for n, *items in sorted(transform(parse))
...     )
...     for parse in parses
... ]
>>>
>>> # Play around with the information we've got.
>>>
>>> pprint(list(dgs[0].triples()))
[(('loves', 'VBZ'), 'nsubj', ('John', 'NNP')),
 (('loves', 'VBZ'), 'dobj', ('Mary', 'NNP')),
 (('loves', 'VBZ'), 'punct', ('.', '.'))]
>>>
>>> print(dgs[1].tree())
(saw I (man a (with (telescope a))) .)
>>>
>>> print(dgs[2].to_conll(4))  # doctest: +NORMALIZE_WHITESPACE
Ballmer     NNP     4       nsubj
has         VBZ     4       aux
been        VBN     4       cop
vocal       JJ      0       ROOT
in          IN      4       prep
the         DT      8       det
past        JJ      8       amod
warning     NN      5       pobj
that        WDT     13      dobj
Linux       NNP     13      nsubj
is          VBZ     13      cop
a           DT      13      det
threat      NN      8       rcmod
to          TO      13      prep
Microsoft   NNP     14      pobj
.           .       4       punct
<BLANKLINE>
Setting up CoreNLP is not that hard; check the setup instructions for more details.
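
As a side note: newer NLTK releases (3.2.3 and later) ship their own CoreNLP client, so if you run the official StanfordCoreNLPServer you can skip the JSON-RPC wrapper entirely. A minimal sketch, assuming a server listening on localhost:9000:

# Start the server first, from the CoreNLP directory:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
from nltk.parse.corenlp import CoreNLPDependencyParser

parser = CoreNLPDependencyParser(url='http://localhost:9000')
parse, = parser.raw_parse('My dog also likes eating sausage.')
print(parse.to_conll(4))  # one token per line: word, tag, head index, relation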

See PyStanfordDependencies, which can now produce Universal Dependencies.
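
A minimal sketch of its basic usage, going by the PyStanfordDependencies README (assumptions: the 'subprocess' backend, which only needs Java, and convert_tree taking a Penn Treebank parse string):

import StanfordDependencies

# get_instance() fetches a CoreNLP jar on first use if none is supplied.
sd = StanfordDependencies.get_instance(backend='subprocess')
tokens = sd.convert_tree('(S (NP (NNP John)) (VP (VBZ loves) (NP (NNP Mary))) (. .))')
for token in tokens:
    # Tokens expose CoNLL-X-style fields: index, form, pos, head, deprel, ...
    print(token.index, token.form, token.pos, token.head, token.deprel)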