Stanford NLP: code snippet for running Stanford CoreNLP Chinese NER


I am running CoreNLP from a Java program, using the Maven dependency. I need to run NER on raw Chinese text. Can anyone provide a code snippet for doing this?

I found instructions saying: "... you will first need to run the Stanford segmenter or some other Chinese word segmenter, and then run NER on the output of that!" But I don't know how to do that. Do you somehow add a Chinese segment annotator to the English annotators? Do you need ChineseDocumentToSentenceProcessor before that? Can all of this be done with StanfordCoreNLP and the right set of properties, or is something else needed? I have the Chinese models.


Thanks.

You can run the whole pipeline on Chinese text. The key difference is that you use the segment annotator instead of the tokenize annotator.

Here are the properties you would use for the entire Chinese pipeline. You can remove any annotators you don't need, so in your case you can stop at ner and drop the properties for parse, mention, and coref (a trimmed-down version is sketched after the full listing below).

# Pipeline options - lemma is no-op for Chinese but currently needed because coref demands it (bad old requirements system)
annotators = segment, ssplit, pos, lemma, ner, parse, mention, coref

# segment
customAnnotatorClass.segment = edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator

segment.model = edu/stanford/nlp/models/segmenter/chinese/ctb.gz
segment.sighanCorporaDict = edu/stanford/nlp/models/segmenter/chinese
segment.serDictionary = edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
segment.sighanPostProcessing = true

# sentence split
ssplit.boundaryTokenRegex = [.]|[!?]+|[。]|[!?]+

# pos
pos.model = edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger

# ner
ner.model = edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz
ner.applyNumericClassifiers = false
ner.useSUTime = false

# parse
parse.model = edu/stanford/nlp/models/lexparser/chineseFactored.ser.gz

# coref and mention
coref.sieves = ChineseHeadMatch, ExactStringMatch, PreciseConstructs, StrictHeadMatch1, StrictHeadMatch2, StrictHeadMatch3, StrictHeadMatch4, PronounMatch
coref.input.type = raw
coref.postprocessing = true
coref.calculateFeatureImportance = false
coref.useConstituencyTree = true
coref.useSemantics = false
coref.md.type = RULE
coref.mode = hybrid
coref.path.word2vec =
coref.language = zh
coref.print.md.log = false
coref.defaultPronounAgreement = true
coref.zh.dict = edu/stanford/nlp/models/dcoref/zh-attributes.txt.gz
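
For example, if you only go up to NER, the trimmed property file is simply the listing above with the parse, mention, and coref settings dropped (untested, but it is just a subset of the properties already shown):

annotators = segment, ssplit, pos, lemma, ner

# segment
customAnnotatorClass.segment = edu.stanford.nlp.pipeline.ChineseSegmenterAnnotator
segment.model = edu/stanford/nlp/models/segmenter/chinese/ctb.gz
segment.sighanCorporaDict = edu/stanford/nlp/models/segmenter/chinese
segment.serDictionary = edu/stanford/nlp/models/segmenter/chinese/dict-chris6.ser.gz
segment.sighanPostProcessing = true

# sentence split
ssplit.boundaryTokenRegex = [.]|[!?]+|[。]|[!?]+

# pos
pos.model = edu/stanford/nlp/models/pos-tagger/chinese-distsim/chinese-distsim.tagger

# ner
ner.model = edu/stanford/nlp/models/ner/chinese.misc.distsim.crf.ser.gz
ner.applyNumericClassifiers = false
ner.useSUTime = false
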
If I get a chance, I'll try to write a full demo class for you with the proper imports, but a short snippet that runs the pipeline on Chinese text follows below. Make sure the Chinese models jar is on your classpath; you can look up how to add the Chinese models jar with Maven.
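
Something like the following pom.xml fragment should do it; the Chinese models are published as the models-chinese classifier of the stanford-corenlp artifact, and the version below is only a placeholder that should match whichever CoreNLP version you already depend on:

<!-- Chinese models jar; keep the version in sync with your stanford-corenlp dependency -->
<dependency>
  <groupId>edu.stanford.nlp</groupId>
  <artifactId>stanford-corenlp</artifactId>
  <version>3.6.0</version>
  <classifier>models-chinese</classifier>
</dependency>

With that on the classpath, here is the code that runs the pipeline: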

import java.util.Properties;

import edu.stanford.nlp.pipeline.Annotation;
import edu.stanford.nlp.pipeline.StanfordCoreNLP;
import edu.stanford.nlp.util.StringUtils;

// load the Chinese pipeline configuration shown above
Properties props = StringUtils.propFileToProperties("StanfordCoreNLP-chinese.properties");
// that properties file will run the entire pipeline
// if you uncomment the following line it will just go up to ner
//props.setProperty("annotators", "segment,ssplit,pos,lemma,ner");
StanfordCoreNLP pipeline = new StanfordCoreNLP(props);
Annotation annotation = new Annotation("whatever your Chinese text is");
pipeline.annotate(annotation);
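
Once the pipeline has run, the NER labels can be read off the individual tokens of the Annotation. Here is a minimal, untested sketch using the standard CoreAnnotations keys; it assumes the annotation variable from the snippet above:

import java.util.List;

import edu.stanford.nlp.ling.CoreAnnotations;
import edu.stanford.nlp.ling.CoreLabel;
import edu.stanford.nlp.util.CoreMap;

// walk the sentences and tokens, printing each word with its NER label
List<CoreMap> sentences = annotation.get(CoreAnnotations.SentencesAnnotation.class);
for (CoreMap sentence : sentences) {
    for (CoreLabel token : sentence.get(CoreAnnotations.TokensAnnotation.class)) {
        String word = token.get(CoreAnnotations.TextAnnotation.class);
        String nerTag = token.get(CoreAnnotations.NamedEntityTagAnnotation.class);
        System.out.println(word + "\t" + nerTag);
    }
}

Tokens that are not part of any entity come back with the tag O.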


OK, I give up: why does this deserve a -1?

Very useful!! Thank you so much for your help!! Much appreciated.