Stanford NLP: named entity recognition for numbers

nlp stanford-nlp


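I am trying to use Stanford NLP to recognize number named entities in text. For example, if the text contains "20 million", the result comes back as "Number": ["20-5", "million-6"]. How can I refine the result so that "20 million" stays together as one entity? And how can I get rid of the index numbers (5, 6) shown in the example above? I am using Java.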

    public void extractNumbers(String text) throws IOException {
        number = new HashMap<String, ArrayList<String>>();
        n = new ArrayList<String>();
        edu.stanford.nlp.pipeline.Annotation document = new edu.stanford.nlp.pipeline.Annotation(text);
        pipeline.annotate(document);
        List<CoreMap> sentences = document.get(CoreAnnotations.SentencesAnnotation.class);
        for (CoreMap sentence : sentences) {
            for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
                // Keep only tokens that the NER annotator tagged as NUMBER.
                if (token.get(CoreAnnotations.NamedEntityTagAnnotation.class).equals("NUMBER")) {
                    // toString() yields the word plus its token index, e.g. "20-5".
                    n.add(token.toString());
                    number.put("Number", n);
                }
            }
        }
    }


To get the exact text from any object of the CoreLabel class, simply use token.originalText() instead of token.toString(). If you need anything else from these tokens, take a look at the CoreLabel class.
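The answer above only covers the index suffix. For the other half of the question, keeping "20 million" together, one possible approach is to join adjacent tokens that share the NUMBER tag. The following is a minimal sketch of that idea, not something taken from the answer. It has no CoreNLP dependency: Token and NumberGrouper are hypothetical names used for illustration, and in the real loop the text would presumably come from token.originalText() and the tag from NamedEntityTagAnnotation. It also assumes it is called once per sentence, so that a run of numbers never spans a sentence boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class NumberGrouper {

    /** Hypothetical stand-in for a CoreNLP token: its surface text and its NER tag. */
    static final class Token {
        final String text;
        final String nerTag;

        Token(String text, String nerTag) {
            this.text = text;
            this.nerTag = nerTag;
        }
    }

    /** Joins each run of adjacent NUMBER tokens into one space-separated string. */
    static List<String> groupNumbers(List<Token> tokens) {
        List<String> numbers = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (Token token : tokens) {
            if ("NUMBER".equals(token.nerTag)) {
                // Extend the current run, separating words with a single space.
                if (current.length() > 0) {
                    current.append(' ');
                }
                current.append(token.text);
            } else if (current.length() > 0) {
                // A non-number token ends the run: store it and start over.
                numbers.add(current.toString());
                current.setLength(0);
            }
        }
        // Store a run that reaches the end of the token list.
        if (current.length() > 0) {
            numbers.add(current.toString());
        }
        return numbers;
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
                new Token("About", "O"),
                new Token("20", "NUMBER"),
                new Token("million", "NUMBER"),
                new Token("people", "O"),
                new Token("and", "O"),
                new Token("3", "NUMBER"),
                new Token("dogs", "O"));
        System.out.println(groupNumbers(tokens)); // prints [20 million, 3]
    }
}
```

Grouping on the tag alone is deliberately simple. Two unrelated numbers that happen to sit next to each other would be merged, so whether this is good enough depends on the text being processed.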


You may need to expand on this a little. Which NER model are you using? Which language are you using? A code snippet showing us exactly what you did would also help.

@entrophy I have edited the question :)

Which class is pipeline an object of here? As in, which Stanford pipeline are you using?

@entrophy StanfordCoreNLP pipeline; Annotation annotation; Properties props = new Properties(); props.setProperty("annotators", "tokenize, ssplit, pos, lemma, ner"); pipeline = new StanfordCoreNLP(props);

This worked for my second question, thank you very much.