Stanford NLP: entity mention detection does not work correctly with TokensRegex


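The whole approach does not seem to work. I followed a similar approach to the one mentioned here and added entitymentions as one of the annotators.

Input: "Here is your 24 USD"
I have a TokensRegex rule: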
{ ruleType: "tokens", pattern: ([{ner:"NUMBER"}] + [{word:"USD"}]), action: Annotate($0, ner, "NEW_MONEY"), result: "NEW_MONEY_RESULT" }

Initializing the pipeline:
props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner,tokensregex,entitymentions");
props.setProperty("tokensregex.rules", "basic_ner.rules");

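I still get 2 entity mentions instead of 1. Both tokens have the same value for edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation, namely NEW_MONEY.

But they have different edu.stanford.nlp.ling.CoreAnnotations$EntityMentionIndexAnnotation values: 24 and USD each get their own index (the one for USD is 1).

Since both tokens carry the same entity tag annotation, how can I merge them into a single mention?

I am using version 3.9.2 of the Stanford library.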

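The problem is that numbers also carry a normalized named entity tag. Your rule overwrites only the ner annotation, so 24 keeps its normalized value while USD has none, and the two tokens end up in separate mentions.

Here is a rules file that works: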

# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
normNER = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NormalizedNamedEntityTagAnnotation" }
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }

# rule for tagging a number followed by "USD" as NEW_MONEY (sets both ner and normNER)
{ ruleType: "tokens", pattern: ([{ner:"NUMBER"}] [{word:"USD"}]), action: (Annotate($0, ner, "NEW_MONEY"), Annotate($0, normNER, "NEW_MONEY")), result: "NEW_MONEY" }
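
To illustrate why the rule sets normNER as well as ner, here is a minimal sketch of the merging behaviour described above. It is a simplified model, not CoreNLP's actual code: the Token class, the mentions method, and the "24.0" normalized value are all assumptions made for the example. Adjacent tokens are grouped into one mention only when both their NER tag and their normalized tag agree.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class MentionMergeSketch {
    /** Hypothetical stand-in for a token: word, NER tag, normalized NER tag (may be null). */
    static final class Token {
        final String word, ner, normNer;
        Token(String word, String ner, String normNer) {
            this.word = word;
            this.ner = ner;
            this.normNer = normNer;
        }
    }

    /** Groups adjacent non-"O" tokens into one mention only when both tags agree. */
    static List<List<String>> mentions(List<Token> tokens) {
        List<List<String>> result = new ArrayList<>();
        Token prev = null;
        for (Token t : tokens) {
            if ("O".equals(t.ner)) { prev = null; continue; }
            boolean sameMention = prev != null
                    && t.ner.equals(prev.ner)
                    && Objects.equals(t.normNer, prev.normNer);
            if (!sameMention) result.add(new ArrayList<>()); // start a new mention
            result.get(result.size() - 1).add(t.word);
            prev = t;
        }
        return result;
    }

    /** Tokens after a rule that overwrites only ner (the "24.0" value is an assumption). */
    static List<Token> nerOnly() {
        return Arrays.asList(new Token("your", "O", null),
                new Token("24", "NEW_MONEY", "24.0"),
                new Token("USD", "NEW_MONEY", null));
    }

    /** Tokens after a rule that overwrites both ner and normNER. */
    static List<Token> nerAndNorm() {
        return Arrays.asList(new Token("your", "O", null),
                new Token("24", "NEW_MONEY", "NEW_MONEY"),
                new Token("USD", "NEW_MONEY", "NEW_MONEY"));
    }

    public static void main(String[] args) {
        System.out.println(mentions(nerOnly()));    // two separate mentions
        System.out.println(mentions(nerAndNorm())); // one merged mention
    }
}
```

In this model, overwriting only ner leaves the two tokens with different normalized tags, which is the two-mention situation from the question; setting both annotations to the same value lets them merge.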

You should not add an extra tokensregex annotator and entitymentions annotator at the end. The ner annotator runs these as sub-annotators.

Here is an example command:

java -Xmx10g edu.stanford.nlp.pipeline.StanfordCoreNLP -annotators tokenize,ssplit,pos,lemma,ner -ner.additional.tokensregex.rules new_money.rules -file new_money_example.txt -outputFormat text
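
If you build the pipeline in code rather than on the command line, the same settings can presumably be passed as properties. The sketch below only assembles the java.util.Properties object; the class name NewMoneyPipelineProps and the build helper are hypothetical, and the property key is assumed to mirror the -ner.additional.tokensregex.rules flag from the command above.

```java
import java.util.Properties;

public class NewMoneyPipelineProps {
    /** Builds pipeline properties that mirror the command-line flags above. */
    static Properties build(String rulesFile) {
        Properties props = new Properties();
        // No trailing tokensregex or entitymentions: ner runs them as sub-annotators.
        props.setProperty("annotators", "tokenize,ssplit,pos,lemma,ner");
        // Assumed to be the property form of the -ner.additional.tokensregex.rules flag.
        props.setProperty("ner.additional.tokensregex.rules", rulesFile);
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("new_money.rules");
        props.forEach((key, value) -> System.out.println(key + " = " + value));
        // With CoreNLP on the classpath, these would be passed to
        // new StanfordCoreNLP(props) to create the pipeline.
    }
}
```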

More documentation is available here: