How do I iterate over token attributes together with coreference results in CoreNLP?
I'm looking for a way to extract and merge annotation results from CoreNLP. Specifically:
import stanza
import os
from stanza.server import CoreNLPClient

corenlp_dir = '/Users/fatih/stanford-corenlp-4.2.0/'
os.environ['CORENLP_HOME'] = corenlp_dir

client = CoreNLPClient(
    annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'coref'],
    memory='4G',
    endpoint='http://localhost:9001',
    be_quiet=True)

text = "Barack Obama was born in Hawaii. He is the president. Obama was elected in 2008."
doc = client.annotate(text)

for x in doc.corefChain:
    for y in x.mention:
        print(y.animacy)
ANIMATE
ANIMATE
ANIMATE
I want to merge these results with the output of the following code:
for i, sent in enumerate(doc.sentence):
    print("[Sentence {}]".format(i + 1))
    for t in sent.token:
        print("{:12s}\t{:12s}\t{:6s}\t{}".format(t.word, t.lemma, t.pos, t.ner))
    print("")
[Sentence 1]
Barack        Barack        NNP     PERSON
Obama         Obama         NNP     PERSON
was           be            VBD     O
born          bear          VBN     O
in            in            IN      O
Hawaii        Hawaii        NNP     STATE_OR_PROVINCE
.             .             .       O

[Sentence 2]
He            he            PRP     O
is            be            VBZ     O
the           the           DT      O
president     president     NN      TITLE
.             .             .       O

[Sentence 3]
Obama         Obama         NNP     PERSON
was           be            VBD     O
elected       elect         VBN     O
in            in            IN      O
2008          2008          CD      DATE
.             .             .       O
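Because the annotations are stored in different objects, I can't iterate over both at once and get the results for the related items.
Is there a way around this?
Thanks.
Each mention in a coref chain has a sentenceIndex and a beginIndex (plus an endIndex), which should correspond to token positions within that sentence. You can use these to relate the two. Edit: a quick and dirty change to your example code: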
from collections import defaultdict
from stanza.server import CoreNLPClient

client = CoreNLPClient(
    annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'coref'],
    be_quiet=False)
text = "Barack Obama was born in Hawaii. In 2008 he became the president."
doc = client.annotate(text)

# Map sentence index -> {token index: animacy} for every token covered by a coref mention.
# Assumes sentenceIndex/beginIndex are 0-based and endIndex is exclusive,
# so they line up with enumerate() over doc.sentence and sent.token below.
animacy = defaultdict(dict)
for chain in doc.corefChain:
    for mention in chain.mention:
        for i in range(mention.beginIndex, mention.endIndex):
            animacy[mention.sentenceIndex][i] = mention.animacy

for sent_idx, sent in enumerate(doc.sentence):
    print("[Sentence {}]".format(sent_idx + 1))
    for t_idx, token in enumerate(sent.token):
        # Tokens that are not part of any coref mention get "-"
        animate = animacy[sent_idx].get(t_idx, "-")
        print("{:12s}\t{:12s}\t{:6s}\t{:20s}\t{}".format(
            token.word, token.lemma, token.pos, token.ner, animate))
    print("")
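The index-based join can also be separated from the server so it is easy to test. This is a minimal sketch in plain Python, assuming (as the `range(y.beginIndex, y.endIndex)` loop does) that sentence and begin indices are 0-based and the end index is exclusive. `Mention` and `merge_coref` are hypothetical names standing in for the protobuf mention objects, and the sample mentions are made up for illustration, not real CoreNLP output.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Mention(NamedTuple):
    # Hypothetical stand-in for a CoreNLP coref mention (assumed index semantics)
    sentence_index: int  # 0-based sentence index
    begin_index: int     # 0-based index of the first token in the mention
    end_index: int       # exclusive end token index
    animacy: str


def merge_coref(sentences: List[List[str]],
                mentions: List[Mention]) -> List[List[Tuple[str, str]]]:
    """Pair every token with the animacy of the mention covering it ('-' if none)."""
    lookup: Dict[int, Dict[int, str]] = defaultdict(dict)
    for m in mentions:
        for i in range(m.begin_index, m.end_index):
            lookup[m.sentence_index][i] = m.animacy
    merged = []
    for s_idx, words in enumerate(sentences):
        merged.append([(w, lookup[s_idx].get(t_idx, "-"))
                       for t_idx, w in enumerate(words)])
    return merged


# Illustrative input only: two sentences and two hand-written mentions
sentences = [
    ["Barack", "Obama", "was", "born", "in", "Hawaii", "."],
    ["He", "is", "the", "president", "."],
]
mentions = [Mention(0, 0, 2, "ANIMATE"), Mention(1, 0, 1, "ANIMATE")]

for sent in merge_coref(sentences, mentions):
    for word, anim in sent:
        print(f"{word:12s}\t{anim}")
    print()
```

With a real annotated document, `sentences` could be built as `[[t.word for t in s.token] for s in doc.sentence]` and each mention as `Mention(y.sentenceIndex, y.beginIndex, y.endIndex, y.animacy)`. Keeping the join in a pure function means the alignment logic can be checked without starting a CoreNLP server.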
Could you provide some example code?