Stanford NLP: extracting named entities from a list of tokens


nlp stanford-nlp


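Is there a way to feed a list of tokens to the Stanford NER library and extract the named entities (NEs)?

I have checked the API, but it is not clear on this. In most cases the input is a string or a document, and in both cases tokenization is done behind the scenes.

In my case I really need to tokenize beforehand and pass the list of tokens to the API. I noticed that I can do this:

List<HasWord> words = new ArrayList<>();

words.add(new Word("Tesco"));
..... // adding more elements to words

List<CoreLabel> labels = classifier.classifySentence(words);

Is this correct?

Many thanks.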

You can use:

Here is one way to solve this:

import java.util.*;
import edu.stanford.nlp.ling.*;
import edu.stanford.nlp.pipeline.*;
import edu.stanford.nlp.util.*;

public class NERPreToken {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("annotators",
            "tokenize, ssplit, pos, lemma, ner");
        // Split on whitespace only, so the pre-made tokens are kept as they are.
        props.setProperty("tokenize.whitespace", "true");
        StanfordCoreNLP pipeline = new StanfordCoreNLP(props);

        // The tokens produced by your own tokenizer.
        String[] tokensArray = {"Stephen", "Colbert", "hosts", "a", "show", "on", "CBS", "."};
        List<String> tokensList = Arrays.asList(tokensArray);

        // Join the tokens with single spaces and annotate the resulting string.
        String docString = String.join(" ", tokensList);
        Annotation annotation = new Annotation(docString);
        pipeline.annotate(annotation);

        // Print each token followed by its named entity tag.
        List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            List<CoreLabel> tokens = sentence.get(CoreAnnotations.TokensAnnotation.class);
            for (CoreLabel token : tokens) {
                System.out.println(token.word() + " " + token.get(CoreAnnotations.NamedEntityTagAnnotation.class));
            }
        }
    }
}

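The key here is to start from your list of tokens and set the pipeline's tokenization property so that it tokenizes on whitespace only. Then submit a single string made of your tokens joined by spaces.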

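I have been trying for hours without any luck... I cannot work out how you create the list of CoreMap objects to classify. Could anyone post a code example?

Just be careful: if you do it this way, you can never have a token that contains whitespace, otherwise it will be split apart.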
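One way to guard against the whitespace problem mentioned in the comment is to validate the tokens before joining them. The following is only a sketch in plain Java: the class name `TokenJoiner` and the method `joinForWhitespaceTokenizer` are made up for illustration and are not part of Stanford CoreNLP, and `Character.isWhitespace` is used as an approximation of what a whitespace tokenizer would treat as a separator.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Stanford CoreNLP.
final class TokenJoiner {

    // Joins tokens with single spaces, rejecting any token that a
    // whitespace-only tokenizer would split apart or drop.
    static String joinForWhitespaceTokenizer(List<String> tokens) {
        for (int i = 0; i < tokens.size(); i++) {
            String token = tokens.get(i);
            if (token == null || token.isEmpty()) {
                throw new IllegalArgumentException("empty token at index " + i);
            }
            // Assumption: Character.isWhitespace approximates the tokenizer's separators.
            if (token.chars().anyMatch(Character::isWhitespace)) {
                throw new IllegalArgumentException(
                    "token at index " + i + " contains whitespace: \"" + token + "\"");
            }
        }
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        List<String> tokens =
            Arrays.asList("Stephen", "Colbert", "hosts", "a", "show", "on", "CBS", ".");
        System.out.println(joinForWhitespaceTokenizer(tokens));

        try {
            // "Los Angeles" as a single token would come back as two tokens.
            joinForWhitespaceTokenizer(Arrays.asList("Los Angeles", "is", "big"));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The string returned by this helper could then be passed to `new Annotation(...)` as in the answer. Failing early seems preferable to silently getting back more tokens than were sent in, since the output would then no longer line up with the original token list.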

[Value=John Text=John Position=0 Answer=PERSON Shape=Xxxx DistSim=463]
[Value=met Text=met Position=1 Answer=O Shape=xxxk DistSim=476]
[Value=Amy Text=Amy Position=2 Answer=PERSON Shape=Xxx DistSim=396]
[Value=in Text=in Position=3 Answer=O Shape=xxk DistSim=510]
[Value=Los Text=Los Position=4 Answer=LOCATION Shape=Xxx DistSim=449]
[Value=Angeles Text=Angeles Position=5 Answer=LOCATION Shape=Xxxxx DistSim=199]
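The listing above tags each token separately, so "Los" and "Angeles" appear as two LOCATION tokens. If whole entities are wanted, adjacent tokens with the same tag can be merged afterwards. The sketch below works on plain lists of words and tags that are assumed to have been read from the labels already. The class `EntityGrouper` is hypothetical, and the merging rule is a simplification: two different entities of the same type standing next to each other would be merged into one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Stanford CoreNLP.
final class EntityGrouper {

    // Merges runs of adjacent tokens that share the same non-"O" tag
    // into strings of the form "TAG: token token ...".
    static List<String> group(List<String> words, List<String> tags) {
        if (words.size() != tags.size()) {
            throw new IllegalArgumentException("words and tags must have the same length");
        }
        List<String> entities = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String currentTag = "O";
        for (int i = 0; i < words.size(); i++) {
            String tag = tags.get(i);
            if (!tag.equals(currentTag)) {
                // The tag changed: close the entity collected so far, if any.
                if (!currentTag.equals("O")) {
                    entities.add(currentTag + ": " + current);
                }
                current.setLength(0);
                currentTag = tag;
            }
            if (!tag.equals("O")) {
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(words.get(i));
            }
        }
        // Close an entity that runs to the end of the sentence.
        if (!currentTag.equals("O")) {
            entities.add(currentTag + ": " + current);
        }
        return entities;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("John", "met", "Amy", "in", "Los", "Angeles");
        List<String> tags = Arrays.asList("PERSON", "O", "PERSON", "O", "LOCATION", "LOCATION");
        // prints [PERSON: John, PERSON: Amy, LOCATION: Los Angeles]
        System.out.println(group(words, tags));
    }
}
```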
