Stanford NLP named entity recognition: adding more columns of information to RegexNER


stanford-nlp


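Is there a way to do the following?

I would like to add another column to the regexner.mapping file that describes some aspect of the named entity, for example: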

Bachelor of Engineering	DEGREE	2.0	some_data_info1
Lalor	LOCATION	PERSON	2.0	some_data_info2
Labor	ORGANIZATION	2.0	some_data_info3

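The idea is that this information can be accessed when an entity mention is detected. For example, some_data_info could be a key into another database, or anything else.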

List<CoreMap> entityMentions = document.get(MentionsAnnotation.class);

for (CoreMap entityMention : entityMentions) {
  //get the information in the description column...
  entityMention.get( ... );
}


Can this be done?

RegexNER does not currently support this kind of functionality. You can write TokensRegex rules to achieve it:

# make all patterns case-sensitive
ENV.defaultStringMatchFlags = 0
ENV.defaultStringPatternFlags = 0

# these Java classes will be used by the rules
ner = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NamedEntityTagAnnotation" }
nerInfo = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$NERInfo" }
tokens = { type: "CLASS", value: "edu.stanford.nlp.ling.CoreAnnotations$TokensAnnotation" }

# define some regexes over tokens
$COMPANY_BEGINNING = "/[A-Z][A-Za-z]+/"
$COMPANY_ENDING = "/(Corp|Inc)\.?/"

# rule for recognizing company names
{ ruleType: "tokens", pattern: ([{word:$COMPANY_BEGINNING} & {tag:"NNP"}]+ [{word:$COMPANY_ENDING}]), action: (Annotate($0, ner, "COMPANY"), Annotate($0, nerInfo, "COMPANY_INFO")), result: "COMPANY_RESULT" }


Note: replace "edu.stanford.nlp.ling.CoreAnnotations$NERInfo" with an annotation class that you define yourself. That class does not exist; it is listed here only as an example.

For more details on using TokensRegex, see the TokensRegex documentation.
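If the extra column only needs to act as a lookup key, another possible workaround is to keep it outside RegexNER altogether. The sketch below is an illustration under several assumptions, not a CoreNLP feature: the class `ExtendedMapping` and its members are hypothetical, the extended file is assumed to be tab-separated with the extra information always in the last column, and the lookup only works when the first column is a literal string rather than a real regular expression. It splits each line into a standard RegexNER line (to be written to regexner.mapping) and a side table keyed by the pattern text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical helper (not part of CoreNLP): splits an "extended" mapping
 * file into standard RegexNER lines plus a side table for the extra column.
 */
final class ExtendedMapping {
    /** Lines with the last column removed, ready to be saved as regexner.mapping. */
    final List<String> regexnerLines = new ArrayList<>();
    /** Pattern text (first column) mapped to the value of the extra last column. */
    final Map<String, String> infoByPattern = new HashMap<>();

    /** Assumes tab-separated columns with the extra info always in the last one. */
    static ExtendedMapping parse(List<String> lines) {
        ExtendedMapping result = new ExtendedMapping();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = line.split("\t");
            if (cols.length < 3) {
                // Need at least: pattern, entity type, extra info.
                throw new IllegalArgumentException("Too few columns: " + line);
            }
            String[] standardCols = Arrays.copyOf(cols, cols.length - 1);
            result.regexnerLines.add(String.join("\t", standardCols));
            result.infoByPattern.put(cols[0], cols[cols.length - 1].trim());
        }
        return result;
    }

    /** Returns the extra column for a detected mention, or null if there is none. */
    String infoFor(String mentionText) {
        return infoByPattern.get(mentionText);
    }
}

class ExtendedMappingDemo {
    public static void main(String[] args) {
        ExtendedMapping mapping = ExtendedMapping.parse(Arrays.asList(
                "Bachelor of Engineering\tDEGREE\t2.0\tsome_data_info1",
                "Labor\tORGANIZATION\t2.0\tsome_data_info3"));
        // In a pipeline, the mention text would come from each entity mention,
        // e.g. entityMention.get(CoreAnnotations.TextAnnotation.class).
        System.out.println(mapping.infoFor("Labor")); // prints "some_data_info3"
    }
}
```

Inside the loop over entity mentions, the mention text would be passed to `infoFor` to retrieve the key. Patterns that are genuine regular expressions would need a different lookup strategy, for which the TokensRegex approach is probably the better fit.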

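Hi, thank you very much for your answer. I will try it and see whether it is enough for my needs. I am also testing other tools, such as GATE, to do the same thing. I am benchmarking several text mining frameworks.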