Stanford NLP: named entities spanning multiple tokens

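I am experimenting with named entity recognition using Stanford CoreNLP.

Some named entities consist of more than one token, for example Person: "Bill Smith". I can't figure out which API calls to use to determine when "Bill" and "Smith" should be treated as one entity and when they should be two separate entities.

Is there any good documentation that explains this?

Here is my current code: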

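    import java.io.BufferedInputStream;
    import java.io.InputStream;
    import java.util.List;
    import java.util.Properties;
    import java.util.zip.GZIPInputStream;

    import edu.stanford.nlp.ie.AbstractSequenceClassifier;
    import edu.stanford.nlp.ie.crf.CRFClassifier;
    import edu.stanford.nlp.ling.CoreAnnotations;
    import edu.stanford.nlp.ling.CoreLabel;

    // Load a serialized CRF model from the classpath (gzipped or plain).
    InputStream is = getClass().getResourceAsStream(MODEL_NAME);
    if (MODEL_NAME.endsWith(".gz")) {
        is = new GZIPInputStream(is);
    }
    is = new BufferedInputStream(is);

    // Note: these properties are never passed to anything below; the
    // standalone CRFClassifier does not run a pipeline of annotators.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner, parse, dcoref");

    AbstractSequenceClassifier<CoreLabel> classifier = CRFClassifier.getClassifier(is);
    is.close();

    String text = "Hello, Bill Smith, how are you?";

    // classify() returns one list of labeled tokens per sentence.
    List<List<CoreLabel>> sentences = classifier.classify(text);
    for (List<CoreLabel> sentence: sentences) {
        for (CoreLabel word: sentence) {
            String type = word.get(CoreAnnotations.AnswerAnnotation.class);
            System.out.println(word + " is of type " + type);
        }
    }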

Also, I'm not clear on why the "PERSON" annotation is returned as an AnswerAnnotation rather than as CoreAnnotations.EntityClassAnnotation, EntityTypeAnnotation, or something else.

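You should use the "entitymentions" annotator, which marks each contiguous sequence of tokens that share the same NER tag as a single entity. The list of entities for each sentence is stored under the CoreAnnotations.MentionsAnnotation.class key. Each entity mention is itself a CoreMap.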

Looking at this code should help:

Some sample code:

    import java.io.*;
    import java.util.*;
    import edu.stanford.nlp.ling.*;
    import edu.stanford.nlp.pipeline.*;
    import edu.stanford.nlp.util.*;

    public class EntityMentionsExample {

      public static void main (String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
        String text = "Joe Smith is from Florida.";
        Annotation annotation = new Annotation(text);
        pipeline.annotate(annotation);
        System.out.println("---");
        System.out.println("text: " + text);
        for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
          for (CoreMap entityMention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
            System.out.print(entityMention.get(CoreAnnotations.TextAnnotation.class));
            System.out.print("\t");
            System.out.print(
                    entityMention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
            System.out.println();
          }
        }
      }
    }
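The grouping rule described for the entitymentions annotator (merge contiguous tokens that share the same NER tag) can be illustrated in plain Java with no CoreNLP dependency. This is a simplified sketch, not the annotator's actual implementation: `TagGrouping`, `Mention` and `group` are hypothetical names, per-token tags are assumed to arrive as a list parallel to the tokens (for example the AnswerAnnotation values printed by the question's loop), "O" is assumed to mean "no entity", and tokens are joined with single spaces for display.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TagGrouping {

  /** Hypothetical helper type: a contiguous run of tokens sharing one entity tag. */
  public static final class Mention {
    public final String text;
    public final String tag;
    public final int start; // index of the first token (inclusive)
    public final int end;   // index after the last token (exclusive)

    Mention(String text, String tag, int start, int end) {
      this.text = text;
      this.tag = tag;
      this.start = start;
      this.end = end;
    }

    @Override
    public String toString() {
      return text + "\t" + tag;
    }
  }

  /** Merges consecutive tokens with the same non-"O" tag; tags.get(i) labels tokens.get(i). */
  public static List<Mention> group(List<String> tokens, List<String> tags) {
    if (tokens.size() != tags.size()) {
      throw new IllegalArgumentException("tokens and tags must have the same length");
    }
    List<Mention> mentions = new ArrayList<>();
    int i = 0;
    while (i < tokens.size()) {
      String tag = tags.get(i);
      if ("O".equals(tag)) { // assumed "outside any entity" label
        i++;
        continue;
      }
      int start = i;
      StringBuilder text = new StringBuilder(tokens.get(i));
      i++;
      // Extend the mention while the next token carries the same tag.
      while (i < tokens.size() && tag.equals(tags.get(i))) {
        text.append(' ').append(tokens.get(i));
        i++;
      }
      mentions.add(new Mention(text.toString(), tag, start, i));
    }
    return mentions;
  }

  public static void main(String[] args) {
    List<String> tokens = Arrays.asList("Hello", ",", "Bill", "Smith", ",", "how", "are", "you", "?");
    List<String> tags = Arrays.asList("O", "O", "PERSON", "PERSON", "O", "O", "O", "O", "O");
    for (Mention m : group(tokens, tags)) {
      System.out.println(m);
    }
  }
}
```

With the labels from the question's example this yields one mention, "Bill Smith" tagged PERSON. A limitation follows directly from the rule: two different people written back to back with no untagged token between them would be merged into one mention, because a bare PERSON tag does not mark where one entity ends and the next begins.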

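Thanks! Three quick questions: How does this code know which models to load? This code loads much more slowly than my original code; how can I reduce the startup time? Is there better documentation?

It's loading the default models. You can set which model to load with the "ner.model" property. In general, I would stick with my code and use the StanfordCoreNLP pipeline; the standalone classifier is an older way of running NER.

Here is the documentation site:

Here are some more specific notes on the API:

If you want to use the Stanford CoreNLP pipeline without having to reload it repeatedly, you can structure your code to simply call the Stanford CoreNLP server instead. Here is the documentation on the server:

But if you just want a simple Java class that you start up and run, there is no way around loading the models at startup.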
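The advice above amounts to paying the model-loading cost once per process rather than once per document. Below is a minimal sketch of that pattern under stated assumptions: `LoadOnce`, `nerProperties` and `Lazy` are hypothetical names, the only CoreNLP-specific parts are the `annotators` and `ner.model` property keys mentioned in this thread, and the model path is whatever model file you choose. So that the sketch compiles without CoreNLP on the classpath, the factory in `main` is a counting stand-in; in real code it would be something like `() -> new StanfordCoreNLP(nerProperties(path))`.

```java
import java.util.Properties;
import java.util.function.Supplier;

public class LoadOnce {

  /** Pipeline properties; "ner.model" selects which NER model file to load. */
  public static Properties nerProperties(String modelPath) {
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,entitymentions");
    props.setProperty("ner.model", modelPath); // path is caller-supplied, not a default
    return props;
  }

  /** Hypothetical thread-safe lazy holder: the factory runs at most once. */
  public static final class Lazy<T> {
    private final Supplier<T> factory;
    private volatile T value;

    public Lazy(Supplier<T> factory) {
      this.factory = factory;
    }

    public T get() {
      T result = value;
      if (result == null) {
        synchronized (this) {
          result = value; // re-check under the lock (double-checked locking)
          if (result == null) {
            result = factory.get();
            value = result;
          }
        }
      }
      return result;
    }
  }

  public static void main(String[] args) {
    // Stand-in for an expensive pipeline construction; counts how often it runs.
    int[] builds = {0};
    Lazy<String> pipeline = new Lazy<>(() -> {
      builds[0]++;
      return "expensive pipeline";
    });
    for (int i = 0; i < 3; i++) {
      pipeline.get(); // in real code: annotate one document per call
    }
    System.out.println("factory calls: " + builds[0]);
  }
}
```

This does not make the first load any faster; it only ensures later calls in the same JVM reuse the object that was already built. Sharing one loaded pipeline across separate processes is what the server approach is for.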