How can I iterate over token attributes together with coreference results in CoreNLP?

I'm looking for a way to extract and merge annotation results from CoreNLP. Specifically:

import stanza
import os
from stanza.server import CoreNLPClient
corenlp_dir = '/Users/fatih/stanford-corenlp-4.2.0/'
os.environ['CORENLP_HOME'] = corenlp_dir

client = CoreNLPClient(
    annotators=['tokenize','ssplit', 'pos', 'lemma', 'ner', 'coref'], 
    memory='4G', 
    endpoint='http://localhost:9001',
    be_quiet=True)

text = "Barack Obama was born in Hawaii.  He is the president. Obama was elected in 2008."

doc = client.annotate(text)

for x in doc.corefChain:
    for y in x.mention:
        print(y.animacy)
        
ANIMATE
ANIMATE
ANIMATE
I'd like to merge these results with the output of the following code:

for i, sent in enumerate(doc.sentence):
    print("[Sentence {}]".format(i+1))
    for t in sent.token:
        print("{:12s}\t{:12s}\t{:6s}\t{}".format(t.word, t.lemma, t.pos, t.ner))
    print("")

[Sentence 1]
Barack          Barack          NNP     PERSON
Obama           Obama           NNP     PERSON
was             be              VBD     O
born            bear            VBN     O
in              in              IN      O
Hawaii          Hawaii          NNP     STATE_OR_PROVINCE
.               .               .       O

[Sentence 2]
He              he              PRP     O
is              be              VBZ     O
the             the             DT      O
president       president       NN      TITLE
.               .               .       O

[Sentence 3]
Obama           Obama           NNP     PERSON
was             be              VBD     O
elected         elect           VBN     O
in              in              IN      O
2008            2008            CD      DATE
.               .               .       O
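Because the annotations are stored in different objects, I can't iterate over both at once and get the results for the related items together.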

Is there a way to do this?


Thanks.

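Each mention in a coref chain has a sentenceIndex and a beginIndex (plus an endIndex), which should correspond to token positions within that sentence. You can use them to link the two results.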
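A minimal sketch of that idea, assuming sentenceIndex and beginIndex are 0-based and endIndex is exclusive (which appears to be how the protobuf document returned by stanza's CoreNLPClient encodes mentions): the tokens of a mention are then simply a slice of the matching sentence's token list. To keep it runnable without a CoreNLP server, SimpleNamespace objects stand in for the annotated document, and mention_tokens is a hypothetical helper name.

```python
from types import SimpleNamespace as NS


def mention_tokens(doc, mention):
    """Return the tokens covered by a coref mention.

    Assumes sentenceIndex/beginIndex are 0-based and endIndex is exclusive.
    """
    sent = doc.sentence[mention.sentenceIndex]
    # list() so the slice works the same on protobuf repeated fields and lists
    return list(sent.token)[mention.beginIndex:mention.endIndex]


def tok(word):
    return NS(word=word)


# Hypothetical stand-in for client.annotate(...) output, not real CoreNLP data.
doc = NS(
    sentence=[
        NS(token=[tok(w) for w in ["Barack", "Obama", "was", "born", "in", "Hawaii", "."]]),
        NS(token=[tok(w) for w in ["In", "2008", "he", "became", "the", "president", "."]]),
    ],
    corefChain=[
        NS(mention=[
            NS(sentenceIndex=0, beginIndex=0, endIndex=2, animacy="ANIMATE"),
            NS(sentenceIndex=1, beginIndex=2, endIndex=3, animacy="ANIMATE"),
        ]),
    ],
)

for chain in doc.corefChain:
    for m in chain.mention:
        words = [t.word for t in mention_tokens(doc, m)]
        print(m.sentenceIndex, " ".join(words), m.animacy)
```

With the stand-in data this prints "0 Barack Obama ANIMATE" and "1 he ANIMATE"; the same slice works from the token side by comparing the token's position in enumerate(sent.token) against each mention's range.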

Edit: a quick and dirty change to your example code:

from collections import defaultdict
from stanza.server import CoreNLPClient

text = "Barack Obama was born in Hawaii.  In 2008 he became the president."

# Using the client as a context manager stops the CoreNLP server afterwards
with CoreNLPClient(
        annotators=['tokenize', 'ssplit', 'pos', 'lemma', 'ner', 'coref'],
        be_quiet=False) as client:
    doc = client.annotate(text)

# animacy[sentence_index][token_index] -> animacy of the mention covering that token.
# Mention indices are 0-based and endIndex is exclusive, so they line up
# with enumerate(sent.token) below.
animacy = defaultdict(dict)
for chain in doc.corefChain:
    for mention in chain.mention:
        for i in range(mention.beginIndex, mention.endIndex):
            animacy[mention.sentenceIndex][i] = mention.animacy

for sent_idx, sent in enumerate(doc.sentence):
    print("[Sentence {}]".format(sent_idx + 1))
    for t_idx, token in enumerate(sent.token):
        # Tokens that are not part of any coref mention get "-"
        animate = animacy[sent_idx].get(t_idx, "-")
        print("{:12s}\t{:12s}\t{:6s}\t{:20s}\t{}".format(
            token.word, token.lemma, token.pos, token.ner, animate))
    print("")
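The quick version keeps a single value per token, so if mentions overlap (for example "the president of the United States" containing "the United States"), whichever mention is processed last overwrites the others. A sketch that keeps every mention a token belongs to, keyed by (sentence index, token index), could look like this. coref_by_token and merged_rows are hypothetical helper names, chains are identified only by their position in doc.corefChain, and the same 0-based, end-exclusive index assumption applies. The helpers only read attributes, so they should also accept the real protobuf document, but the demo below uses SimpleNamespace stand-in data rather than actual CoreNLP output.

```python
from collections import defaultdict
from types import SimpleNamespace as NS


def coref_by_token(doc):
    """Map (sentence_index, token_index) -> list of (chain_no, animacy).

    A token inside nested mentions gets one entry per mention instead of
    being overwritten. Assumes 0-based indices and an exclusive endIndex.
    """
    index = defaultdict(list)
    for chain_no, chain in enumerate(doc.corefChain):
        for m in chain.mention:
            for i in range(m.beginIndex, m.endIndex):
                index[(m.sentenceIndex, i)].append((chain_no, m.animacy))
    return index


def merged_rows(doc):
    """Yield one row per token: (sent_idx, tok_idx, word, ner, coref entries)."""
    index = coref_by_token(doc)
    for s_idx, sent in enumerate(doc.sentence):
        for t_idx, token in enumerate(sent.token):
            # .get avoids inserting empty lists into the defaultdict
            yield (s_idx, t_idx, token.word, token.ner, index.get((s_idx, t_idx), []))


# Stand-in data (not real CoreNLP output) with one mention nested in another.
words = ["He", "is", "the", "president", "of", "the", "United", "States", "."]
ners = ["O", "O", "O", "TITLE", "O", "O", "COUNTRY", "COUNTRY", "O"]
doc = NS(
    sentence=[NS(token=[NS(word=w, ner=n) for w, n in zip(words, ners)])],
    corefChain=[
        NS(mention=[
            NS(sentenceIndex=0, beginIndex=0, endIndex=1, animacy="ANIMATE"),
            NS(sentenceIndex=0, beginIndex=2, endIndex=8, animacy="ANIMATE"),
        ]),
        NS(mention=[NS(sentenceIndex=0, beginIndex=5, endIndex=8, animacy="INANIMATE")]),
    ],
)

for row in merged_rows(doc):
    print(row)
```

Returning a list per token costs a little more than a flat dict, but it makes overlapping mentions visible instead of depending on iteration order.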

Could you provide some example code?