Java: How do I get the original text of a sentence after CoreNLP's ssplit?

java stanford-nlp

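CoreNLP's tokenization alters the sentence text. Stitching the tokens back together with whitespace is not a true reconstruction, and things get complicated when the sentence contains parentheses and other punctuation. See the code block below: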

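import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import edu.stanford.nlp.ling.CoreAnnotations.SentencesAnnotation;
import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.CoreMap;

// Run only the tokenizer and the sentence splitter.
Properties props = new Properties();
props.setProperty("annotators", "tokenize, ssplit");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

// `paragraph` is the String holding the raw input text.
Annotation document = new Annotation(paragraph);
pipeline.annotate(document);

List<CoreMap> sentences = document.get(SentencesAnnotation.class);

List<String> sentenceList = new ArrayList<>();
for (CoreMap sentence : sentences) {
    // How to get the original text of sentence?
}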

Answering my own question: it turns out to be easy. Replace the comment in the question's code block with the following line:

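// Needs edu.stanford.nlp.ling.Sentence and CoreAnnotations.TokensAnnotation.
// In newer CoreNLP releases this utility class appears as SentenceUtils.
String sentenceString = Sentence.listToOriginalTextString(sentence.get(TokensAnnotation.class));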

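Alternatively, read the text stored on the sentence annotation itself:

for (CoreMap sentence : sentences) {
    // TextAnnotation on a sentence CoreMap holds that sentence's text.
    String sentenceStr = sentence.get(CoreAnnotations.TextAnnotation.class);
}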
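
To make the underlying problem concrete, here is a small library-free sketch of why joining tokens with spaces is lossy and how character offsets recover the exact source span. Everything in it is illustrative: `SentenceTextSketch`, `Token`, `joinWithSpaces`, and `sliceOriginal` are hypothetical names, and the `-LRB-`/`-RRB-` forms in the sample data are an assumption about PTB-style normalization that may not match your CoreNLP version or options. In CoreNLP itself the corresponding offsets should be available through `CoreAnnotations.CharacterOffsetBeginAnnotation` and `CoreAnnotations.CharacterOffsetEndAnnotation`.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only; Token is a hypothetical stand-in for a CoreNLP CoreLabel. */
final class SentenceTextSketch {

    /** Hypothetical token: normalized surface form plus offsets into the source text. */
    static final class Token {
        final String word;
        final int begin; // inclusive character offset
        final int end;   // exclusive character offset

        Token(String word, int begin, int end) {
            this.word = word;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Naive reconstruction: loses original spacing and any normalized characters. */
    static String joinWithSpaces(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(t.word);
        }
        return sb.toString();
    }

    /** Offset-based reconstruction: slice the untouched source text. */
    static String sliceOriginal(String paragraph, List<Token> tokens) {
        if (tokens.isEmpty()) {
            return "";
        }
        int begin = tokens.get(0).begin;
        int end = tokens.get(tokens.size() - 1).end;
        return paragraph.substring(begin, end);
    }

    public static void main(String[] args) {
        String paragraph = "He left (quietly). She stayed.";
        // Tokens of the first sentence, with assumed PTB-style bracket normalization.
        List<Token> first = Arrays.asList(
                new Token("He", 0, 2),
                new Token("left", 3, 7),
                new Token("-LRB-", 8, 9),
                new Token("quietly", 9, 16),
                new Token("-RRB-", 16, 17),
                new Token(".", 17, 18));

        System.out.println(joinWithSpaces(first));           // He left -LRB- quietly -RRB- .
        System.out.println(sliceOriginal(paragraph, first)); // He left (quietly).
    }
}
```

Slicing by offsets never touches the token strings, so it is unaffected by whatever normalization the tokenizer applies; the trade-off is that the original paragraph has to be kept around.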