Java CoreNLP: extracting the spans of tokens

I want to extract the character spans of the tokens in a tokenized string. Using Stanford's CoreNLP, I have:

Properties props;
props = new Properties();
props.put("annotators", "tokenize, ssplit, pos, lemma");
this.pipeline = new StanfordCoreNLP(props);

String answerText = "This is the answer";
ArrayList<IntPair> tokenSpans = new ArrayList<IntPair>();
// create an empty Annotation with just the given text
Annotation document = new Annotation(answerText);
// run all Annotators on this text
this.pipeline.annotate(document);

// Iterate over all of the sentences
List<CoreMap> sentences = document.get(SentencesAnnotation.class);
for(CoreMap sentence: sentences) {
    // Iterate over all tokens in a sentence
    for (CoreLabel fullToken: sentence.get(TokensAnnotation.class)) {
        IntPair span = fullToken.get(SpanAnnotation.class);
        tokenSpans.add(span);
    }
}
Expected output:

(0,3), (5,6), (8,10), (12,17)

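The problem is the use of SpanAnnotation, which applies to something other than token character offsets. The correct classes for this query are CharacterOffsetBeginAnnotation and CharacterOffsetEndAnnotation.

For example, they can be used like this: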

// Requires: import java.util.stream.Collectors;
List<IntPair> spans = tokenSeq.stream()
    .map(token ->
        new IntPair(
            token.get(CoreAnnotations.CharacterOffsetBeginAnnotation.class),
            token.get(CoreAnnotations.CharacterOffsetEndAnnotation.class)))
    .collect(Collectors.toList());
…pardon my indentation
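
One detail to watch when comparing against the expected output in the question: CoreNLP's CharacterOffsetEndAnnotation points one past the token's last character, while the expected output lists the index of the last character itself. The sketch below illustrates that difference without depending on CoreNLP. TokenOffsets, whitespaceSpans, toInclusiveEnd and format are hypothetical names, and the whitespace splitter is only a stand-in for a real tokenizer, so treat this as an illustration of the offset convention rather than of the library's tokenization.

```java
import java.util.ArrayList;
import java.util.List;

class TokenOffsets {

    // Stand-in for a tokenizer (an assumption, not CoreNLP): splits on
    // whitespace and records {begin, end} per token, where end points
    // one past the token's last character.
    static List<int[]> whitespaceSpans(String text) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            // Skip whitespace between tokens.
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) i++;
            int begin = i;
            // Advance to the end of the current token.
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) i++;
            if (i > begin) spans.add(new int[] {begin, i});
        }
        return spans;
    }

    // Converts end-exclusive spans into end-inclusive spans.
    static List<int[]> toInclusiveEnd(List<int[]> spans) {
        List<int[]> out = new ArrayList<>();
        for (int[] s : spans) out.add(new int[] {s[0], s[1] - 1});
        return out;
    }

    // Formats spans as "(b,e), (b,e), ...".
    static String format(List<int[]> spans) {
        StringBuilder sb = new StringBuilder();
        for (int[] s : spans) {
            if (sb.length() > 0) sb.append(", ");
            sb.append('(').append(s[0]).append(',').append(s[1]).append(')');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<int[]> exclusive = whitespaceSpans("This is the answer");
        System.out.println(format(exclusive));                 // (0,4), (5,7), (8,11), (12,18)
        System.out.println(format(toInclusiveEnd(exclusive))); // (0,3), (5,6), (8,10), (12,17)
    }
}
```

Applied to the IntPair list from the answer, the same adjustment would mean subtracting 1 from each end offset if inclusive ends are what you need.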
