Universal dependencies from Stanford CoreNLP with Python/NLTK
Is there any way to get Universal Dependencies using Python or NLTK? I can only generate parse trees. For example:

Input sentence:
My dog also likes eating sausage.
Output:
Universal dependencies
nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)
This is a good start, since it works with the latest CoreNLP release (3.5.2). However, it gives you raw output that you need to convert manually. For example, assuming the wrapper is running:
>>> import json, jsonrpclib
>>> from pprint import pprint
>>>
>>> server = jsonrpclib.Server("http://localhost:8080")
>>>
>>> pprint(json.loads(server.parse('John loves Mary.'))) # doctest: +SKIP
{u'sentences': [{u'dependencies': [[u'root', u'ROOT', u'0', u'loves', u'2'],
[u'nsubj',
u'loves',
u'2',
u'John',
u'1'],
[u'dobj', u'loves', u'2', u'Mary', u'3'],
[u'punct', u'loves', u'2', u'.', u'4']],
u'parsetree': [],
u'text': u'John loves Mary.',
u'words': [[u'John',
{u'CharacterOffsetBegin': u'0',
u'CharacterOffsetEnd': u'4',
u'Lemma': u'John',
u'PartOfSpeech': u'NNP'}],
[u'loves',
{u'CharacterOffsetBegin': u'5',
u'CharacterOffsetEnd': u'10',
u'Lemma': u'love',
u'PartOfSpeech': u'VBZ'}],
[u'Mary',
{u'CharacterOffsetBegin': u'11',
u'CharacterOffsetEnd': u'15',
u'Lemma': u'Mary',
u'PartOfSpeech': u'NNP'}],
[u'.',
{u'CharacterOffsetBegin': u'15',
u'CharacterOffsetEnd': u'16',
u'Lemma': u'.',
u'PartOfSpeech': u'.'}]]}]}
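If you only need the readable `rel(head-i, dep-j)` form shown in the question, the raw `dependencies` lists can be flattened without NLTK at all. A minimal sketch, with the server response for "John loves Mary." hardcoded so it runs standalone (each entry is `[relation, head word, head index, dependent word, dependent index]`, as in the JSON above):

```python
# Hardcoded from the wrapper output above, for illustration.
sentence = {
    'dependencies': [
        ['root', 'ROOT', '0', 'loves', '2'],
        ['nsubj', 'loves', '2', 'John', '1'],
        ['dobj', 'loves', '2', 'Mary', '3'],
        ['punct', 'loves', '2', '.', '4'],
    ],
}

def format_dependencies(sentence):
    # Render each dependency as "rel(head-i, dep-j)", the same
    # notation CoreNLP prints in its plain-text output.
    return [
        '%s(%s-%s, %s-%s)' % (rel, head, head_n, dep, dep_n)
        for rel, head, head_n, dep, dep_n in sentence['dependencies']
    ]

for line in format_dependencies(sentence):
    print(line)  # e.g. root(ROOT-0, loves-2)
```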
If you want to work with the dependency parse further, you can, with a little effort, reuse NLTK's DependencyGraph:
>>> import jsonrpclib, json
>>> from pprint import pprint
>>> from nltk.parse import DependencyGraph
>>>
>>> server = jsonrpclib.Server("http://localhost:8080")
>>> parses = json.loads(
... server.parse(
... 'John loves Mary. '
... 'I saw a man with a telescope. '
... 'Ballmer has been vocal in the past warning that Linux is a threat to Microsoft.'
... )
... )['sentences']
>>>
>>> def transform(sentence):
... for rel, _, head, word, n in sentence['dependencies']:
... n = int(n)
...
... word_info = sentence['words'][n - 1][1]
... tag = word_info['PartOfSpeech']
... lemma = word_info['Lemma']
... if rel == 'root':
... # NLTK expects that the root relation is labelled as ROOT!
... rel = 'ROOT'
...
... # Hack: Return values we don't know as '_'.
... # Also, consider tag and ctag to be equal.
... # n is used to sort words as they appear in the sentence.
... yield n, '_', word, lemma, tag, tag, '_', head, rel, '_', '_'
...
>>> dgs = [
... DependencyGraph(
... ' '.join(items) # NLTK expects an iterable of strings...
... for n, *items in sorted(transform(parse))
... )
... for parse in parses
... ]
>>>
>>> # Play around with the information we've got.
>>>
>>> pprint(list(dgs[0].triples()))
[(('loves', 'VBZ'), 'nsubj', ('John', 'NNP')),
(('loves', 'VBZ'), 'dobj', ('Mary', 'NNP')),
(('loves', 'VBZ'), 'punct', ('.', '.'))]
>>>
>>> print(dgs[1].tree())
(saw I (man a (with (telescope a))) .)
>>>
>>> print(dgs[2].to_conll(4)) # doctest: +NORMALIZE_WHITESPACE
Ballmer NNP 4 nsubj
has VBZ 4 aux
been VBN 4 cop
vocal JJ 0 ROOT
in IN 4 prep
the DT 8 det
past JJ 8 amod
warning NN 5 pobj
that WDT 13 dobj
Linux NNP 13 nsubj
is VBZ 13 cop
a DT 13 det
threat NN 8 rcmod
to TO 13 prep
Microsoft NNP 14 pobj
. . 4 punct
<BLANKLINE>
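To make explicit what `DependencyGraph.triples()` computes from that 4-column CoNLL output, here is a plain-Python sketch that recovers the same `((head, tag), rel, (dependent, tag))` triples, with the "John loves Mary." rows hardcoded (no NLTK or server required):

```python
# 4-column CoNLL-style rows: word, POS tag, head index (1-based, 0 = root), relation.
conll = """\
John   NNP  2  nsubj
loves  VBZ  0  ROOT
Mary   NNP  2  dobj
.      .    2  punct"""

def conll_triples(conll):
    rows = [line.split() for line in conll.splitlines()]
    triples = []
    for word, tag, head, rel in rows:
        head = int(head)
        if head == 0:
            continue  # the ROOT row has no lexical head
        head_word, head_tag = rows[head - 1][0], rows[head - 1][1]
        triples.append(((head_word, head_tag), rel, (word, tag)))
    return triples

for t in conll_triples(conll):
    print(t)
```

This reproduces the `dgs[0].triples()` output shown earlier, e.g. `(('loves', 'VBZ'), 'nsubj', ('John', 'NNP'))`.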
Setting up CoreNLP is not that hard; see the linked instructions for more details. Note also that PyStanfordDependencies can now produce Universal Dependencies.