How to extract named entity + verb from text in Java

java nlp stanford-nlp

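My goal is to extract named entities (persons) from a text together with the verbs related to them. For example, given the following text:

Dumbledore turned and walked back down the street. Harry Potter rolled over inside his blankets without waking up.

Ideally, I should get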

Dumbledore turned walked; Harry Potter rolled

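I use Stanford NER to find and tag persons, then I delete every sentence that does not contain a named entity. In the end I have a "clean" text that consists only of sentences with character names. After that I run the Stanford dependency parser, which gives me output in CoNLL-U format.

This is where all my problems begin. I know the person and the verb, but I don't know how to extract them from this format. My idea is: find the NN/NNP in the table, find its "parent", and then extract all of the parent's "child" words. In theory this should work. In theory.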
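The "find the noun, find its parent, collect the children" idea can be sketched directly on CoNLL-U rows, with no NLP library involved. The sketch below assumes the ten tab-separated CoNLL-U columns (ID and FORM first, XPOS fifth, HEAD seventh, DEPREL eighth) with Penn Treebank tags in XPOS. The class `ConlluPersonVerbs`, its methods, and the sample rows are hypothetical and hand-written for illustration; they are not real parser output. It only looks at tags and relations, so an NER filter would still be needed to keep persons only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

class ConlluPersonVerbs {

  // Positions of the CoNLL-U columns this sketch reads.
  static final int ID = 0, FORM = 1, XPOS = 4, HEAD = 6, DEPREL = 7;

  /** Joins column values into one tab-separated CoNLL-U row. */
  static String row(String... cols) {
    return String.join("\t", cols);
  }

  /** Returns one entry per subject proper noun, e.g. "Dumbledore: turned walked". */
  static List<String> extract(List<String> sentenceRows) {
    List<String[]> rows = new ArrayList<>();
    for (String line : sentenceRows) {
      if (line.isEmpty() || line.startsWith("#")) continue; // blank lines and comments
      String[] cols = line.split("\t");
      if (cols.length > DEPREL) rows.add(cols);
    }
    List<String> result = new ArrayList<>();
    for (String[] noun : rows) {
      // Step 1: a proper noun that is the subject of some word.
      if (!noun[XPOS].startsWith("NNP") || !noun[DEPREL].equals("nsubj")) continue;
      // Step 2: its "parent", kept only if it is a verb.
      String[] parent = byId(rows, noun[HEAD]);
      if (parent == null || !parent[XPOS].startsWith("VB")) continue;
      // Step 3: the parent verb plus the verbs coordinated with it ("children").
      StringJoiner verbs = new StringJoiner(" ");
      verbs.add(parent[FORM]);
      for (String[] child : rows) {
        if (child[HEAD].equals(parent[ID]) && child[DEPREL].startsWith("conj")
            && child[XPOS].startsWith("VB")) {
          verbs.add(child[FORM]);
        }
      }
      result.add(fullName(rows, noun) + ": " + verbs);
    }
    return result;
  }

  /** Finds the row with the given ID, or null if there is none. */
  static String[] byId(List<String[]> rows, String id) {
    for (String[] r : rows) {
      if (r[ID].equals(id)) return r;
    }
    return null;
  }

  /** Prepends "compound" dependents so that "Harry Potter" stays together. */
  static String fullName(List<String[]> rows, String[] noun) {
    StringJoiner name = new StringJoiner(" ");
    for (String[] r : rows) {
      boolean partOfName = r[HEAD].equals(noun[ID]) && r[DEPREL].equals("compound");
      if (partOfName || r == noun) name.add(r[FORM]);
    }
    return name.toString();
  }

  public static void main(String[] args) {
    // Hand-written rows for illustration only.
    List<String> sentence = List.of(
        row("1", "Dumbledore", "Dumbledore", "PROPN", "NNP", "_", "2", "nsubj", "_", "_"),
        row("2", "turned", "turn", "VERB", "VBD", "_", "0", "root", "_", "_"),
        row("3", "and", "and", "CCONJ", "CC", "_", "4", "cc", "_", "_"),
        row("4", "walked", "walk", "VERB", "VBD", "_", "2", "conj", "_", "_"),
        row("5", "back", "back", "ADV", "RB", "_", "4", "advmod", "_", "_"),
        row("6", "down", "down", "ADP", "IN", "_", "8", "case", "_", "_"),
        row("7", "the", "the", "DET", "DT", "_", "8", "det", "_", "_"),
        row("8", "street", "street", "NOUN", "NN", "_", "4", "obl", "_", "_"),
        row("9", ".", ".", "PUNCT", ".", "_", "2", "punct", "_", "_"));
    System.out.println(extract(sentence));
  }
}
```

For the sample rows, `extract` returns a single entry, `Dumbledore: turned walked`. Restricting step 3 to `conj` children that are verbs is a design choice of this sketch; collecting every child of the parent would also pull in adverbs and punctuation.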

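So my question is: can anyone suggest another way to get a person and their actions from a text? Or is there a more sensible approach?

I would be very grateful for any help.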

Here is some sample code that should help with your problem:

import java.io.*;
import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.semgraph.*;
import edu.stanford.nlp.util.*;

public class NERAndVerbExample {

  public static void main(String[] args) throws IOException {
    // Build a pipeline that runs NER, dependency parsing and entity mention detection.
    Properties props = new Properties();
    props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,depparse,entitymentions");
    StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
    String text = "John Smith went to the store.";
    Annotation annotation = new Annotation(text);
    pipeline.annotate(annotation);
    System.out.println("---");
    System.out.println("text: " + text);
    System.out.println("");
    System.out.println("dependency edges:");
    for (CoreMap sentence : annotation.get(CoreAnnotations.SentencesAnnotation.class)) {
      // Print every dependency edge as: governor word,index,tag,ner - relation -> dependent word,index,tag,ner
      SemanticGraph sg = sentence.get(SemanticGraphCoreAnnotations.CollapsedDependenciesAnnotation.class);
      for (SemanticGraphEdge sge : sg.edgeListSorted()) {
        System.out.println(
                sge.getGovernor().word() + "," + sge.getGovernor().index() + "," + sge.getGovernor().tag() + "," +
                        sge.getGovernor().ner()
                        + " - " + sge.getRelation().getLongName()
                        + " -> "
                        + sge.getDependent().word() + "," +
                        sge.getDependent().index() + "," + sge.getDependent().tag() + "," + sge.getDependent().ner());
      }
      System.out.println();
      System.out.println("entity mentions:");
      // Print every entity mention as: text, index of its last token, NER tag
      for (CoreMap entityMention : sentence.get(CoreAnnotations.MentionsAnnotation.class)) {
        int lastTokenIndex = entityMention.get(CoreAnnotations.TokensAnnotation.class).size()-1;
        System.out.println(entityMention.get(CoreAnnotations.TextAnnotation.class) +
                "\t" +
                entityMention.get(CoreAnnotations.TokensAnnotation.class)
                        .get(lastTokenIndex).get(CoreAnnotations.IndexAnnotation.class) + "\t" +
                entityMention.get(CoreAnnotations.NamedEntityTagAnnotation.class));
      }
    }
  }
}

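I'm hoping to add some syntactic sugar in Stanford CoreNLP 3.8.0 to make working with entity mentions easier.

To explain the code a little: the entitymentions annotator basically walks through the tokens and groups together adjacent tokens that have the same NER tag. So "John Smith" gets marked as one entity mention.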
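The grouping step described above can be illustrated with a simplified, library-free sketch. It is not the annotator's actual implementation; the class `MentionGrouper` and its `group` method are hypothetical, and the tags are supplied by hand. Each result line mirrors what the example prints for a mention: text, 1-based index of the last token, and NER tag.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MentionGrouper {

  /**
   * Groups runs of adjacent tokens that share the same tag (other than "O")
   * and returns one "text<TAB>lastIndex<TAB>tag" line per group.
   */
  static List<String> group(String[] words, String[] tags) {
    List<String> mentions = new ArrayList<>();
    int start = 0;
    while (start < words.length) {
      if (tags[start].equals("O")) { // "O" marks tokens outside any entity
        start++;
        continue;
      }
      // Extend the run while the next token carries the same tag.
      int end = start;
      while (end + 1 < words.length && tags[end + 1].equals(tags[start])) {
        end++;
      }
      String text = String.join(" ", Arrays.copyOfRange(words, start, end + 1));
      mentions.add(text + "\t" + (end + 1) + "\t" + tags[start]); // token indices are 1-based
      start = end + 1;
    }
    return mentions;
  }

  public static void main(String[] args) {
    // Hand-assigned tags for illustration only.
    String[] words = {"John", "Smith", "went", "to", "the", "store", "."};
    String[] tags = {"PERSON", "PERSON", "O", "O", "O", "O", "O"};
    for (String mention : group(words, tags)) {
      System.out.println(mention);
    }
  }
}
```

With the hand-assigned tags above, the only group is "John Smith", ending at token 2 with tag PERSON.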

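If you walk through the dependency graph, you can get the index of each word.

Likewise, if you access the token list of an entity mention, you can find the index of each word in that mention.

With a little more code you can link the two together and form the entity-verb pairs you are asking for.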
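One way the linking step might look is sketched below in plain Java, so that it runs without CoreNLP or its models. The `Edge` and `Mention` classes are hypothetical stand-ins for the values the example prints (governor word, index and tag, relation, dependent index, and the token index range of a mention), and the edges are hand-written for illustration rather than real parser output. The matching rule, a verb governing a token inside a PERSON mention, is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class EntityVerbLinker {

  /** One dependency edge, reduced to the fields needed for linking. */
  static final class Edge {
    final String govWord, govTag, relation;
    final int govIndex, depIndex;

    Edge(String govWord, int govIndex, String govTag, String relation, int depIndex) {
      this.govWord = govWord;
      this.govIndex = govIndex;
      this.govTag = govTag;
      this.relation = relation;
      this.depIndex = depIndex;
    }
  }

  /** An entity mention with the 1-based indices of its first and last token. */
  static final class Mention {
    final String text, tag;
    final int firstIndex, lastIndex;

    Mention(String text, int firstIndex, int lastIndex, String tag) {
      this.text = text;
      this.firstIndex = firstIndex;
      this.lastIndex = lastIndex;
      this.tag = tag;
    }
  }

  /** Pairs each PERSON mention with every verb that governs one of its tokens. */
  static List<String> link(List<Edge> edges, List<Mention> mentions) {
    List<String> pairs = new ArrayList<>();
    for (Mention m : mentions) {
      if (!m.tag.equals("PERSON")) continue;
      for (Edge e : edges) {
        boolean dependentInside = e.depIndex >= m.firstIndex && e.depIndex <= m.lastIndex;
        boolean governorOutside = e.govIndex < m.firstIndex || e.govIndex > m.lastIndex;
        // Skip edges inside the name itself (e.g. Smith -compound-> John).
        if (dependentInside && governorOutside && e.govTag.startsWith("VB")) {
          pairs.add(m.text + " -> " + e.govWord);
        }
      }
    }
    return pairs;
  }

  public static void main(String[] args) {
    // Hand-written edges for "John Smith went to the store."
    List<Edge> edges = List.of(
        new Edge("went", 3, "VBD", "nsubj", 2),
        new Edge("Smith", 2, "NNP", "compound", 1),
        new Edge("went", 3, "VBD", "nmod", 6),
        new Edge("store", 6, "NN", "case", 4),
        new Edge("store", 6, "NN", "det", 5),
        new Edge("went", 3, "VBD", "punct", 7));
    List<Mention> mentions = List.of(new Mention("John Smith", 1, 2, "PERSON"));
    System.out.println(link(edges, mentions));
  }
}
```

For these hand-written edges the result is the single pair `John Smith -> went`. The relation name is carried along but not checked; filtering on it (for example keeping only `nsubj`) would restrict the pairs to verbs whose subject is the person.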

As you can see from the current code, accessing entity information is quite cumbersome, so I will try to improve that in 3.8.0.

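Oh, thank you so much! Just one question: I can't even compile your code to see how it works. The compiler reports "reached end of file while parsing". Maybe I'm just doing something wrong? Is there a resource where I can read about entity mentions and indices? By the way, I have read about Semgrex. It seems to me that this tool could also help with finding NE + verb pairs. Is that right? Anyway, thanks for your help!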
