Stanford NLP pipeline - sequential processing (Java)

java stanford-nlp

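How do I correctly use the Stanford NLP pipeline for two-phase annotation?

In the first phase I only need tokenization and sentence splitting, so I use the following code: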

private Annotation annotatedDocument = null;
private StanfordCoreNLP pipeline = null;

...

public void firstPhase() {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize, ssplit");

        pipeline = new StanfordCoreNLP(props);
        annotatedDocument = new Annotation(textDocument);
}

The second phase is optional, which is why I don't run all the annotators in the first phase. The second-phase code:

public void secondPhase() {
        // POS tagging
        POSTaggerAnnotator posTaggerAnot = new POSTaggerAnnotator();
        posTaggerAnot.annotate(annotatedDocument);

        // Lemmatization
        MorphaAnnotator morphaAnot = new MorphaAnnotator();
        morphaAnot.annotate(annotatedDocument);
}

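First question: is it correct to use "standalone" annotators in the second phase, or is there a way to reuse the existing pipeline?

Second question: I have a problem with the coref annotator. I would like to use it in the second phase, like this: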

CorefAnnotator coref = new CorefAnnotator(new Properties());

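But this constructor never seems to finish. There is no constructor without properties, right? Do some properties need to be set?

There are [at least] 3 ways to do this: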

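The way you describe. Simply calling the individual annotators and chaining them together is perfectly valid. The coref annotator should work with an empty Properties object; perhaps you need more memory? It is somewhat slow to load, and the models are not small.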

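If you want to keep using a pipeline, you can create a partial pipeline and set the property enforceRequirements=false. This chains the annotators for you but does not require their prerequisites to be satisfied; that is, if you know certain annotations already exist, you do not have to rerun the annotators that produce them.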
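
A minimal sketch of what the second-phase configuration might look like, assuming tokenize and ssplit have already been run on the document. The class name SecondPhaseConfig is made up for illustration; the annotator names pos and lemma and the enforceRequirements property are the ones CoreNLP uses.

```java
import java.util.Properties;

// Hypothetical helper, named for illustration only.
final class SecondPhaseConfig {
    // Builds the properties for a partial, second-phase pipeline.
    static Properties build() {
        Properties props = new Properties();
        // Only the annotators that have not been run yet.
        props.setProperty("annotators", "pos, lemma");
        // Skip the prerequisite check, on the assumption that
        // tokenize and ssplit already ran in the first phase.
        props.setProperty("enforceRequirements", "false");
        return props;
    }
}
```

With CoreNLP on the classpath, something like new StanfordCoreNLP(SecondPhaseConfig.build()).annotate(annotatedDocument) should then add POS tags and lemmas to the document that was annotated in the first phase.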

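This is a bigger change, but the Simple CoreNLP API performs this kind of lazy evaluation automatically. You just create a Document object, and when you request various annotations it lazily fills them in.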
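
To make the lazy-evaluation idea concrete, here is a plain-Java sketch of the pattern. It is not the CoreNLP Document class: LazyDocument, its whitespace "tokenizer" and its lowercasing "second phase" are all stand-ins invented for illustration. The point is only that each annotation is computed the first time it is requested and cached afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration; NOT the CoreNLP API.
final class LazyDocument {
    private final String text;
    private List<String> tokens;      // computed on first request, then cached
    private List<String> lowercased;  // depends on tokens; also cached
    int tokenizeRuns = 0;             // counts how often tokenization actually ran

    LazyDocument(String text) {
        this.text = text;
    }

    // "First phase": naive whitespace tokenization, run at most once.
    List<String> tokens() {
        if (tokens == null) {
            tokenizeRuns++;
            tokens = Arrays.asList(text.trim().split("\\s+"));
        }
        return tokens;
    }

    // "Second phase": computed only if requested; reuses the cached tokens.
    List<String> lowercased() {
        if (lowercased == null) {
            lowercased = new ArrayList<>();
            for (String token : tokens()) {
                lowercased.add(token.toLowerCase());
            }
        }
        return lowercased;
    }
}
```

Calling lowercased() on a fresh LazyDocument triggers tokenization once; a later call to tokens() returns the cached list without running it again, which is the behaviour the optional second phase in the question is after.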

That's right, the problem with the coref annotator was a java.lang.OutOfMemoryError exception.
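
For anyone hitting the same error, a quick way to see how much heap the JVM is allowed to use before loading the coref models is sketched below. HeapCheck is a made-up name; Runtime.maxMemory() is standard Java. The limit itself is raised with the JVM's -Xmx option (for example -Xmx4g, where the 4g figure is only an assumption, not a documented requirement).

```java
// Hypothetical helper, named for illustration only.
final class HeapCheck {
    // Maximum heap the JVM will attempt to use, in whole megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMegabytes() + " MB");
    }
}
```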