Universal dependencies from Stanford CoreNLP with Python/NLTK
Is there any way to get Universal Dependencies using Python or NLTK? I can only generate parse trees. For example:

Input sentence:
My dog also likes eating sausage.
Output:
Universal dependencies
nmod:poss(dog-2, My-1)
nsubj(likes-4, dog-2)
advmod(likes-4, also-3)
root(ROOT-0, likes-4)
xcomp(likes-4, eating-5)
dobj(eating-5, sausage-6)
This is a good start, since it works with the latest CoreNLP release (3.5.2). However, it gives you raw output that you need to convert manually. For example, assuming the wrapper is running:
>>> import json, jsonrpclib
>>> from pprint import pprint
>>>
>>> server = jsonrpclib.Server("http://localhost:8080")
>>>
>>> pprint(json.loads(server.parse('John loves Mary.'))) # doctest: +SKIP
{u'sentences': [{u'dependencies': [[u'root', u'ROOT', u'0', u'loves', u'2'],
[u'nsubj',
u'loves',
u'2',
u'John',
u'1'],
[u'dobj', u'loves', u'2', u'Mary', u'3'],
[u'punct', u'loves', u'2', u'.', u'4']],
u'parsetree': [],
u'text': u'John loves Mary.',
u'words': [[u'John',
{u'CharacterOffsetBegin': u'0',
u'CharacterOffsetEnd': u'4',
u'Lemma': u'John',
u'PartOfSpeech': u'NNP'}],
[u'loves',
{u'CharacterOffsetBegin': u'5',
u'CharacterOffsetEnd': u'10',
u'Lemma': u'love',
u'PartOfSpeech': u'VBZ'}],
[u'Mary',
{u'CharacterOffsetBegin': u'11',
u'CharacterOffsetEnd': u'15',
u'Lemma': u'Mary',
u'PartOfSpeech': u'NNP'}],
[u'.',
{u'CharacterOffsetBegin': u'15',
u'CharacterOffsetEnd': u'16',
u'Lemma': u'.',
u'PartOfSpeech': u'.'}]]}]}
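If you only need the readable `rel(head-i, dep-j)` form shown in the question, the raw `dependencies` lists can be flattened without NLTK at all. A minimal sketch, with the server response for "John loves Mary." hardcoded so it runs standalone (each entry is `[relation, head word, head index, dependent word, dependent index]`, as in the JSON above):

```python
# Hardcoded from the wrapper output above, for illustration.
sentence = {
    'dependencies': [
        ['root', 'ROOT', '0', 'loves', '2'],
        ['nsubj', 'loves', '2', 'John', '1'],
        ['dobj', 'loves', '2', 'Mary', '3'],
        ['punct', 'loves', '2', '.', '4'],
    ],
}

def format_dependencies(sentence):
    # Render each dependency as "rel(head-i, dep-j)", the same
    # notation CoreNLP prints in its plain-text output.
    return [
        '%s(%s-%s, %s-%s)' % (rel, head, head_n, dep, dep_n)
        for rel, head, head_n, dep, dep_n in sentence['dependencies']
    ]

for line in format_dependencies(sentence):
    print(line)  # e.g. root(ROOT-0, loves-2)
```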
If you want to work with the dependency parse further, you can, with a little effort, reuse NLTK's DependencyGraph:
>>> import jsonrpclib, json
>>> from pprint import pprint
>>> from nltk.parse import DependencyGraph
>>>
>>> server = jsonrpclib.Server("http://localhost:8080")
>>> parses = json.loads(
... server.parse(
... 'John loves Mary. '
... 'I saw a man with a telescope. '
... 'Ballmer has been vocal in the past warning that Linux is a threat to Microsoft.'
... )
... )['sentences']
>>>
>>> def transform(sentence):
... for rel, _, head, word, n in sentence['dependencies']:
... n = int(n)
...
... word_info = sentence['words'][n - 1][1]
... tag = word_info['PartOfSpeech']
... lemma = word_info['Lemma']
... if rel == 'root':
... # NLTK expects that the root relation is labelled as ROOT!
... rel = 'ROOT'
...
... # Hack: Return values we don't know as '_'.
... # Also, consider tag and ctag to be equal.
... # n is used to sort words as they appear in the sentence.
... yield n, '_', word, lemma, tag, tag, '_', head, rel, '_', '_'
...
>>> dgs = [
... DependencyGraph(
... ' '.join(items) # NLTK expects an iterable of strings...
... for n, *items in sorted(transform(parse))
... )
... for parse in parses
... ]
>>>
>>> # Play around with the information we've got.
>>>
>>> pprint(list(dgs[0].triples()))
[(('loves', 'VBZ'), 'nsubj', ('John', 'NNP')),
(('loves', 'VBZ'), 'dobj', ('Mary', 'NNP')),
(('loves', 'VBZ'), 'punct', ('.', '.'))]
>>>
>>> print(dgs[1].tree())
(saw I (man a (with (telescope a))) .)
>>>
>>> print(dgs[2].to_conll(4)) # doctest: +NORMALIZE_WHITESPACE
Ballmer NNP 4 nsubj
has VBZ 4 aux
been VBN 4 cop
vocal JJ 0 ROOT
in IN 4 prep
the DT 8 det
past JJ 8 amod
warning NN 5 pobj
that WDT 13 dobj
Linux NNP 13 nsubj
is VBZ 13 cop
a DT 13 det
threat NN 8 rcmod
to TO 13 prep
Microsoft NNP 14 pobj
. . 4 punct
<BLANKLINE>
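To make explicit what `DependencyGraph.triples()` computes from that 4-column CoNLL output, here is a plain-Python sketch that recovers the same `((head, tag), rel, (dependent, tag))` triples, with the "John loves Mary." rows hardcoded (no NLTK or server required):

```python
# 4-column CoNLL-style rows: word, POS tag, head index (1-based, 0 = root), relation.
conll = """\
John   NNP  2  nsubj
loves  VBZ  0  ROOT
Mary   NNP  2  dobj
.      .    2  punct"""

def conll_triples(conll):
    rows = [line.split() for line in conll.splitlines()]
    triples = []
    for word, tag, head, rel in rows:
        head = int(head)
        if head == 0:
            continue  # the ROOT row has no lexical head
        head_word, head_tag = rows[head - 1][0], rows[head - 1][1]
        triples.append(((head_word, head_tag), rel, (word, tag)))
    return triples

for t in conll_triples(conll):
    print(t)
```

This reproduces the `dgs[0].triples()` output shown earlier, e.g. `(('loves', 'VBZ'), 'nsubj', ('John', 'NNP'))`.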
Setting up CoreNLP is not that hard; see the linked instructions for more details. Note also that PyStanfordDependencies can now produce Universal Dependencies.